Resampling data — resample • rOstluft

Aggregate data by different time periods. The function follows these simple steps:

- split the data into series
  - pad each series (needed to calculate the capture threshold and to detect gaps)
  - group the series by the new interval with lubridate::floor_date()
  - apply a statistical method or a user-provided function (the user can provide a list per parameter)
- combine the resampled series
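
The steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the toy data frame, its column names and the helper `resample_series()` are hypothetical, and `trunc()` stands in for lubridate::floor_date() so that the sketch runs without extra packages.

```r
# Toy data: one parameter at 30-minute resolution. The column names mimic
# the rOstluft long format but are assumptions made for this sketch.
times <- seq(as.POSIXct("2013-01-01 00:00:00", tz = "UTC"),
             by = "30 min", length.out = 8)
data <- data.frame(starttime = times, parameter = "O3",
                   value = c(10, 20, 30, 40, 50, 60, 70, 80))
data <- data[-4, ]  # drop the 01:30 record to create a gap

# Hypothetical helper: pad, group and aggregate a single series.
resample_series <- function(series, fun = mean) {
  # Pad: merge with the complete 30-minute grid so the gap becomes NA.
  grid <- data.frame(starttime = seq(min(series$starttime),
                                     max(series$starttime), by = "30 min"))
  padded <- merge(grid, series, by = "starttime", all.x = TRUE)

  # Group: floor every timestamp to the hour (stand-in for floor_date()).
  hour <- as.POSIXct(trunc(padded$starttime, units = "hours"))

  # Apply: one value per group; NA records are ignored by the statistic.
  out <- aggregate(padded["value"], by = list(starttime = hour),
                   FUN = fun, na.rm = TRUE)
  out$parameter <- series$parameter[1]
  out
}

# Split into series, resample each one, then combine the results.
series_list <- split(data, data$parameter)
resampled <- do.call(rbind, lapply(series_list, resample_series))
print(resampled)
```

Because the series is padded before grouping, the hour that lost a record still appears in the result, and the number of NA records per group stays available for threshold and gap checks.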

It is possible to supply different methods for different parameters. The argument statistic can be a named list, where each name stands for a parameter. The value can be a function to apply, the name of a method, or a list of method names. Some methods rename the parameter and change the unit. A list of method names can contain only one non-renaming method.

Usage

resample(
  data,
  statistic = "mean",
  new_interval,
  data_thresh = NULL,
  max_gap = NULL,
  rename_parameter = TRUE,
  percentile = 0.95,
  skip_padding = FALSE,
  start_date = NULL,
  end_date = NULL,
  drop_last = FALSE
)

Arguments

data: A tibble in rOstluft long format
statistic: Statistical method(s) to apply when aggregating the data. Can be a simple string with the name of the method or a function with one argument. Or a list with the parameter as name and the statistical method as value (function or name of method). Or a list with the parameter as name and a list of statistical methods as value. All methods must support renaming the parameter. A default statistic for all parameters not in the list can be defined with the name "default_statistic". See the section Statistical methods and the examples
new_interval: New interval. Must be longer than the actual interval (not checked)
data_thresh: optional minimum data capture threshold to use (the examples below pass a fraction such as 0.8)
max_gap: optional maximum number of consecutive NA values
rename_parameter: optional; whether to rename the parameter. Default TRUE
percentile: The percentile level used when statistic = "percentile". The default is 0.95
skip_padding: don't pad the data before applying statistics. Default FALSE
start_date: optional start date for padding. Default min date in series floored to the new interval
end_date: optional end date for padding. Default max date in series rounded up to the new interval
drop_last: optional; drop the last time point added by padding. Default FALSE; TRUE if no end_date is provided and the max date differs from the max date rounded up to the new interval.
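
To illustrate what data_thresh and max_gap guard against, the following sketch checks one padded group of values before aggregating it. It is an assumption-laden illustration: the helpers `capture_fraction()`, `longest_na_run()` and `guarded_mean()` are hypothetical, and the exact comparisons the package uses (strict or not) are not stated on this page.

```r
# One padded group: NA marks records that were missing in the input.
x <- c(1, 2, NA, NA, NA, 6, 7, 8, 9, 10)

# Hypothetical helper: share of valid (non-NA) records in the group.
capture_fraction <- function(x) mean(!is.na(x))

# Hypothetical helper: length of the longest run of consecutive NA values.
longest_na_run <- function(x) {
  runs <- rle(is.na(x))
  na_runs <- runs$lengths[runs$values]
  if (length(na_runs) == 0) 0L else max(na_runs)
}

# Hypothetical helper: return NA when a check fails, else the mean.
# Assumption: capture below the threshold or a gap longer than max_gap fails.
guarded_mean <- function(x, data_thresh = NULL, max_gap = NULL) {
  if (!is.null(data_thresh) && capture_fraction(x) < data_thresh) {
    return(NA_real_)
  }
  if (!is.null(max_gap) && longest_na_run(x) > max_gap) {
    return(NA_real_)
  }
  mean(x, na.rm = TRUE)
}

guarded_mean(x, data_thresh = 0.8)               # capture is 0.7, so NA
guarded_mean(x, data_thresh = 0.5, max_gap = 2)  # longest gap is 3, so NA
guarded_mean(x, data_thresh = 0.5, max_gap = 3)  # both checks pass
```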

Value

tibble with resampled data

Statistical methods

A statistical method is a function that takes a numeric vector as its argument and returns a single value.

"mean" average value
"median" median value
"sd" standard deviation of values
"sum" sum over all values
"max" maximum value
"min" minimum value
"n" number of valid records, renames parameter, changes unit
"coverage" percentage of valid records, renames parameter, changes unit
"percentile" calculates the percentile. Use the argument percentile to specify the level, renames parameter
"perc95" 95% percentile, renames parameter
"perc98" 98% percentile, renames parameter
"n>5" number of values > 5 (WHO PM2.5 y1 limit), renames parameter, changes unit
"n>8" number of values > 8 (CO d1 limit), renames parameter, changes unit
"n>10" number of values > 10 (PM2.5 y1 limit), renames parameter, changes unit
"n>15" number of values > 15 (WHO PM10 limit), renames parameter, changes unit
"n>25" number of values > 25 (WHO NO2 d1 limit), renames parameter, changes unit
"n>30" number of values > 30 (NO2, SO2 y1 limit), renames parameter, changes unit
"n>40" number of values > 40 (WHO SO2 d1 limit), renames parameter, changes unit
"n>45" number of values > 45 (WHO PM10 d1 limit), renames parameter, changes unit
"n>50" number of values > 50 (PM10 d1 limit), renames parameter, changes unit
"n>60" number of values > 60 (y1 limit), renames parameter, changes unit
"n>65" number of values > 65 (O3 d1 indicator), renames parameter, changes unit
"n>80" number of values > 80 (NO2 d1 limit), renames parameter, changes unit
"n>100" number of values > 100 (SO2 d1 limit), renames parameter, changes unit
"n>120" number of values > 120 (O3 h1 limit), renames parameter, changes unit
"n>160" number of values > 160 (O3 h1 indicator), renames parameter, changes unit
"n>180" number of values > 180 (O3 h1 indicator), renames parameter, changes unit
"n>200" number of values > 200 (O3 h1 indicator), renames parameter, changes unit
"n>240" number of values > 240 (O3 h1 indicator), renames parameter, changes unit
"drop" drops the parameter from the result, useful for persons too lazy to filter the input data

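The contract described above (a numeric vector in, a single value out) can be sketched with a few plain R functions. These are hypothetical stand-ins written for illustration only; how the built-in methods treat NA values or which quantile type they use is an assumption here, not something this page states.

```r
# Example values for one group, including one missing record.
x <- c(30, 125, NA, 180, 90, 121)

# Count of valid records, in the spirit of "n".
n_valid <- function(x) sum(!is.na(x))

# Share of valid records in percent, in the spirit of "coverage".
coverage <- function(x) 100 * mean(!is.na(x))

# Factory for exceedance counters, in the spirit of "n>120" and friends.
n_above <- function(limit) {
  function(x) sum(x > limit, na.rm = TRUE)
}
n_gt_120 <- n_above(120)

# Percentile with a configurable level, in the spirit of "percentile".
perc <- function(x, p = 0.95) {
  unname(stats::quantile(x, probs = p, na.rm = TRUE))
}

n_valid(x)    # 5 valid records
n_gt_120(x)   # 3 values exceed 120 (125, 180, 121)
coverage(x)
perc(x, 0.95)
```

Any function of this shape can be passed as a value in the statistic list, since the argument accepts a function with one argument.
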
Wind

Wind is a special case. For vector averaging the methods need two inputs (direction and speed). To resample wind data it is necessary to specify three parameters with the methods "wind.direction", "wind.speed_vector" and "wind.speed_scalar". This holds even if the scalar or the vector speed isn't present: the missing parameter is substituted by the other.

Important: wind calculations are standalone. Calculating multiple methods via a list is possible only for non-wind parameters.
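
The reason vector averaging needs both direction and speed can be seen in a small sketch. It uses the common meteorological convention of decomposing wind into u and v components; the variable names are hypothetical and the code is an illustration under that assumption, not the package's implementation.

```r
# Two records with wind from just west and just east of north.
wd <- c(350, 10)  # direction in degrees (where the wind comes from)
ws <- c(2, 2)     # speed

# Decompose into components (meteorological convention).
rad <- wd * pi / 180
u <- -ws * sin(rad)
v <- -ws * cos(rad)

# Average the components, then convert back to direction and speed.
u_mean <- mean(u)
v_mean <- mean(v)
wd_vector <- (atan2(-u_mean, -v_mean) * 180 / pi) %% 360
ws_vector <- sqrt(u_mean^2 + v_mean^2)

# The scalar speed is the plain average of the speeds.
ws_scalar <- mean(ws)

# The vector direction is north (0 or, after rounding error, just below 360),
# whereas the arithmetic mean of 350 and 10 would point south (180).
c(wd_vector = wd_vector, ws_vector = ws_vector, ws_scalar = ws_scalar)
```

The vector speed is never larger than the scalar speed, because opposing components partly cancel when averaged.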

TODO

AOT40 statistic?
some from https://github.com/davidcarslaw/openair/blob/master/R/aqStats.R?

Examples

min30 <- system.file("extdata", "Zch_Stampfenbachstrasse_min30_2013_Jan.csv",
                     package = "rOstluft.data", mustWork = TRUE)

airmo_min30 <- read_airmo_csv(min30)

# filter out volume concentrations, keep only mass concentrations
airmo_min30 <- dplyr::filter(airmo_min30, !(.data$unit == "ppb" | .data$unit == "ppm"))

d1_statistics <- list(
  "default_statistic" = "drop",
  "Hr" = "mean",
  "RainDur" = "sum",
  "O3" = list("mean", "max", "min", "n")
)
resample(airmo_min30, d1_statistics, "d1", data_thresh = 0.8)
#> # A tibble: 186 × 6
#>    starttime           site                    parameter interval unit  value
#>    <dttm>              <fct>                   <fct>     <fct>    <fct> <dbl>
#>  1 2013-01-01 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    76.0
#>  2 2013-01-02 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    83.5
#>  3 2013-01-03 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    78.0
#>  4 2013-01-04 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    83.5
#>  5 2013-01-05 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    83.8
#>  6 2013-01-06 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    82.3
#>  7 2013-01-07 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    84.9
#>  8 2013-01-08 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    79.0
#>  9 2013-01-09 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    89.4
#> 10 2013-01-10 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    89.6
#> # ℹ 176 more rows

# Note: wind parameters don't support multiple methods via list!
h1_statistics <- list(
  "default_statistic" = "drop",
  "WD" = "wind.direction",
  "WVs" = "wind.speed_scalar",
  "WVv" = "wind.speed_vector",
  "RainDur" = "sum",
  "NO" = list("coverage", "mean")
)
resample(airmo_min30, h1_statistics, "h1", data_thresh = 0.8)
#> # A tibble: 4,464 × 6
#>    starttime           site                    parameter    interval unit  value
#>    <dttm>              <fct>                   <fct>        <fct>    <fct> <dbl>
#>  1 2013-01-01 00:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  2 2013-01-01 01:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  3 2013-01-01 02:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  4 2013-01-01 03:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  5 2013-01-01 04:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  6 2013-01-01 05:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  7 2013-01-01 06:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  8 2013-01-01 07:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  9 2013-01-01 08:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#> 10 2013-01-01 09:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#> # ℹ 4,454 more rows

# Note: all resulting values should be NA -> gap is too big (480 * min30 = 10 days)
y1_statistics <- list(
  "default_statistic" = "drop",
  "O3" = list("mean", "perc98", "n", "max", "min")
)
resample(airmo_min30, y1_statistics, "y1", max_gap = 480)
#> # A tibble: 5 × 6
#>   starttime           site                    parameter    interval unit  value
#>   <dttm>              <fct>                   <fct>        <fct>    <fct> <dbl>
#> 1 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3           y1       µg/m3    NA
#> 2 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3_98%_min30 y1       µg/m3    NA
#> 3 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3_nb_min30  y1       1        NA
#> 4 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3_max_min30 y1       µg/m3    NA
#> 5 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3_min_min30 y1       µg/m3    NA