Aggregates data to different time periods. The function follows these simple steps:
split the data into series
pad each data series (needed for calculating the capture threshold and detecting gaps)
group each series by the new interval with lubridate::floor_date()
apply the statistical method or a user-provided function (the user can provide a list per parameter)
combine the resampled series
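The steps above can be sketched for a single series as follows. This is a minimal illustration, not the package's implementation: the half-hourly data, the column names and the use of base R merge() and aggregate() are assumptions made for the example. Only lubridate::floor_date() is taken from the description.

```r
library(lubridate)

# Hypothetical half-hourly series (not rOstluft data) with two missing records
times <- seq(ymd_hm("2013-01-01 00:00", tz = "UTC"), by = "30 min", length.out = 8)
series <- data.frame(starttime = times, value = c(1, 2, 3, 4, 5, 6, 7, 8))
series <- series[-c(3, 4), ]

# Pad: restore the missing time points as NA so gaps become visible
full <- data.frame(
  starttime = seq(min(series$starttime), max(series$starttime), by = "30 min")
)
padded <- merge(full, series, by = "starttime", all.x = TRUE)

# Group: assign every record to the start of its new (here 2 hour) interval
padded$bin <- floor_date(padded$starttime, "2 hours")

# Apply: one statistic per bin; na.pass keeps the padded NA rows in the groups
resampled <- aggregate(value ~ bin, data = padded,
                       FUN = function(x) mean(x, na.rm = TRUE),
                       na.action = na.pass)

# Data capture per bin, which is only meaningful because the series was padded
capture <- aggregate(value ~ bin, data = padded,
                     FUN = function(x) mean(!is.na(x)),
                     na.action = na.pass)
```

Without the padding step the first bin would appear complete, because the two missing records would simply be absent instead of NA.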
It is possible to supply different methods for different parameters. The argument statistic can be a named list. The name stands for the parameter. The value can be a function to apply, the name of a method, or a list of method names. Some methods rename the parameter and change the unit. A list of method names can contain only one non-renaming method.
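How a named list maps parameters to methods can be sketched with a small lookup helper. The helper pick_statistic() is hypothetical and written only for this illustration; it is not part of rOstluft, and the fallback to "default_statistic" follows the description of the statistic argument.

```r
# A named list in the form described above: parameter name -> method(s)
statistic <- list(
  "default_statistic" = "drop",
  "Hr" = "mean",
  "O3" = list("mean", "max")
)

# Hypothetical helper: find the method(s) that apply to one parameter
pick_statistic <- function(statistic, parameter) {
  # A plain string or function applies to every parameter
  if (!is.list(statistic)) {
    return(statistic)
  }
  # An explicit entry for the parameter wins
  if (parameter %in% names(statistic)) {
    return(statistic[[parameter]])
  }
  # Otherwise fall back to the default entry (NULL if none is defined)
  statistic[["default_statistic"]]
}

pick_statistic(statistic, "Hr")    # "mean"
pick_statistic(statistic, "O3")    # list("mean", "max")
pick_statistic(statistic, "NO2")   # "drop"
pick_statistic("median", "NO2")    # "median"
```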
Usage
resample(
data,
statistic = "mean",
new_interval,
data_thresh = NULL,
max_gap = NULL,
rename_parameter = TRUE,
percentile = 0.95,
skip_padding = FALSE,
start_date = NULL,
end_date = NULL,
drop_last = FALSE
)
Arguments
- data
A tibble in rOstluft long format
- statistic
Statistical method(s) to apply when aggregating the data. Can be a simple string with the name of the method or a function with one argument. Or a list with the parameter as name and the statistical method as value (function or name of method). Or a list with the parameter as name and a list of statistical methods as value; in that case all methods must support renaming the parameter. A default statistic for all parameters not in the list can be defined with the name "default_statistic". See the section Statistical methods and the examples
- new_interval
New interval. Must be longer than actual interval (not checked)
- data_thresh
optional minimum data capture threshold to use (the examples pass it as a fraction, e.g. 0.8)
- max_gap
optional maximum number of consecutive NA values
- rename_parameter
optional, whether to rename the parameter. Default TRUE
- percentile
The percentile level used when statistic = "percentile". The default is 0.95
- skip_padding
don't pad the data before applying statistics. Default FALSE
- start_date
optional start date for padding. Default min date in series floored to the new interval
- end_date
optional end date for padding. Default max date in series ceiled to the new interval
- drop_last
optional, drop the last time point added by padding. Default FALSE; TRUE if no end_date is provided and max date != ceiled max date.
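The two quality criteria, data_thresh and max_gap, can be illustrated on the padded values of one new interval. This is a sketch in base R under the assumption that capture is the share of non-NA values and that a gap is a run of consecutive NA values; how resample() computes them internally is not shown in this page.

```r
# Hypothetical padded values falling into one new interval
x <- c(1, NA, NA, 4, 5, NA, 7, 8)

# Data capture: share of valid (non-NA) records, here 5 of 8
capture <- mean(!is.na(x))

# Longest gap: longest run of consecutive NA values, here 2
runs <- rle(is.na(x))
longest_gap <- max(c(0, runs$lengths[runs$values]))

# Assumed rule for this sketch: the aggregated value is only kept
# if both criteria are met, otherwise it becomes NA
data_thresh <- 0.8
max_gap <- 3
value <- if (capture >= data_thresh && longest_gap <= max_gap) {
  mean(x, na.rm = TRUE)
} else {
  NA_real_
}
```

Here the capture of 0.625 is below the threshold of 0.8, so the interval is set to NA even though the longest gap is acceptable.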
Statistical methods
A statistical method is a function that takes a numeric vector as argument and returns a single value.
- "mean"
average value
- "median"
median value
- "sd"
standard deviation of values
- "sum"
sum over all values
- "max"
maximum value
- "min"
minimum value
- "n"
number of valid records, renames parameter, changes unit
- "coverage"
percentage of valid records, renames parameter, changes unit
- "percentile"
calculates the percentile. Use the argument percentile to specify the level, renames parameter
- "perc95"
95% percentile, renames parameter
- "perc98"
98% percentile, renames parameter
- "n>5"
number of values > 5 (WHO PM2.5 y1 limit), renames parameter, changes unit
- "n>8"
number of values > 8 (CO d1 limit), renames parameter, changes unit
- "n>10"
number of values > 10 (PM2.5 y1 limit), renames parameter, changes unit
- "n>15"
number of values > 15 (WHO PM10 limit), renames parameter, changes unit
- "n>25"
number of values > 25 (WHO NO2 d1 limit), renames parameter, changes unit
- "n>30"
number of values > 30 (NO2, SO2 y1 limit), renames parameter, changes unit
- "n>40"
number of values > 40 (WHO SO2 d1 limit), renames parameter, changes unit
- "n>45"
number of values > 45 (WHO PM10 d1 limit), renames parameter, changes unit
- "n>50"
number of values > 50 (PM10 d1 limit), renames parameter, changes unit
- "n>60"
number of values > 60 (y1 limit), renames parameter, changes unit
- "n>65"
number of values > 65 (O3 d1 indicator), renames parameter, changes unit
- "n>80"
number of values > 80 (NO2 d1 limit), renames parameter, changes unit
- "n>100"
number of values > 100 (SO2 d1 limit), renames parameter, changes unit
- "n>120"
number of values > 120 (O3 h1 limit), renames parameter, changes unit
- "n>160"
number of values > 160 (O3 h1 indicator), renames parameter, changes unit
- "n>180"
number of values > 180 (O3 h1 indicator), renames parameter, changes unit
- "n>200"
number of values > 200 (O3 h1 indicator), renames parameter, changes unit
- "n>240"
number of values > 240 (O3 h1 indicator), renames parameter, changes unit
- "drop"
drops the parameter from the result, useful for persons too lazy to filter the input data
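Besides the named methods, a user-provided function can be supplied as long as it takes a numeric vector and returns a single value. The functions below are hypothetical examples written for this illustration. They handle NA values themselves, because this page does not state whether resample() removes them before calling a custom function.

```r
# Factory for "number of values above a limit" functions,
# analogous in spirit to the built-in "n>..." methods
n_above <- function(limit) {
  function(x) sum(x > limit, na.rm = TRUE)
}
n_above_120 <- n_above(120)

# Geometric mean of the valid values
geometric_mean <- function(x) {
  x <- x[!is.na(x)]
  exp(mean(log(x)))
}

o3 <- c(80, 130, NA, 125, 60)
n_above_120(o3)             # 2
geometric_mean(c(1, 100))   # 10

# Such a function would then be passed per parameter, for example:
# resample(data, list("O3" = n_above_120), "d1")
```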
Wind
Wind is a special case. Vector averaging needs two inputs (direction and speed). To resample wind data it is necessary to specify three parameters with the methods "wind.direction", "wind.speed_vector" and "wind.speed_scalar", even if the scalar or the vector speed isn't present. The missing parameter will be substituted by the other.
Important: Wind calculations are standalone, so a wind parameter cannot be given a list of methods. Calculating multiple methods is only possible for non-wind parameters.
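Why vector averaging needs both direction and speed can be seen in a short base R sketch. It uses the common decomposition into two components; the variable names and the exact convention are assumptions for this illustration and not necessarily what rOstluft does internally.

```r
# Hypothetical wind records within one new interval
wd <- c(80, 100)   # direction in degrees
ws <- c(2, 2)      # speed in m/s

# Decompose every record into two components
rad <- wd * pi / 180
u <- ws * sin(rad)
v <- ws * cos(rad)

# Vector average: average the components, then convert back
mean_u <- mean(u)
mean_v <- mean(v)
direction    <- (atan2(mean_u, mean_v) * 180 / pi) %% 360
speed_vector <- sqrt(mean_u^2 + mean_v^2)

# Scalar average: plain mean of the speeds, ignoring direction
speed_scalar <- mean(ws)
```

For these two records the averaged direction is 90 degrees, the scalar speed is 2 m/s and the vector speed is slightly lower (about 1.97 m/s), because the two wind vectors do not point in exactly the same direction.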
Examples
min30 <- system.file("extdata", "Zch_Stampfenbachstrasse_min30_2013_Jan.csv",
                     package = "rOstluft.data", mustWork = TRUE)
airmo_min30 <- read_airmo_csv(min30)
# filter out volume concentrations, only use mass concentrations
airmo_min30 <- dplyr::filter(airmo_min30, !(.data$unit == "ppb" | .data$unit == "ppm"))
d1_statistics <- list(
  "default_statistic" = "drop",
  "Hr" = "mean",
  "RainDur" = "sum",
  "O3" = list("mean", "max", "min", "n")
)
resample(airmo_min30, d1_statistics, "d1", data_thresh = 0.8)
#> # A tibble: 186 × 6
#> starttime site parameter interval unit value
#> <dttm> <fct> <fct> <fct> <fct> <dbl>
#> 1 2013-01-01 00:00:00 Zch_Stampfenbachstrasse Hr d1 %Hr 76.0
#> 2 2013-01-02 00:00:00 Zch_Stampfenbachstrasse Hr d1 %Hr 83.5
#> 3 2013-01-03 00:00:00 Zch_Stampfenbachstrasse Hr d1 %Hr 78.0
#> 4 2013-01-04 00:00:00 Zch_Stampfenbachstrasse Hr d1 %Hr 83.5
#> 5 2013-01-05 00:00:00 Zch_Stampfenbachstrasse Hr d1 %Hr 83.8
#> 6 2013-01-06 00:00:00 Zch_Stampfenbachstrasse Hr d1 %Hr 82.3
#> 7 2013-01-07 00:00:00 Zch_Stampfenbachstrasse Hr d1 %Hr 84.9
#> 8 2013-01-08 00:00:00 Zch_Stampfenbachstrasse Hr d1 %Hr 79.0
#> 9 2013-01-09 00:00:00 Zch_Stampfenbachstrasse Hr d1 %Hr 89.4
#> 10 2013-01-10 00:00:00 Zch_Stampfenbachstrasse Hr d1 %Hr 89.6
#> # ℹ 176 more rows
# Note: wind parameters don't support multiple methods via list!
h1_statistics <- list(
  "default_statistic" = "drop",
  "WD" = "wind.direction",
  "WVs" = "wind.speed_scalar",
  "WVv" = "wind.speed_vector",
  "RainDur" = "sum",
  "NO" = list("coverage", "mean")
)
resample(airmo_min30, h1_statistics, "h1", data_thresh = 0.8)
#> # A tibble: 4,464 × 6
#> starttime site parameter interval unit value
#> <dttm> <fct> <fct> <fct> <fct> <dbl>
#> 1 2013-01-01 00:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1 % 100
#> 2 2013-01-01 01:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1 % 100
#> 3 2013-01-01 02:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1 % 100
#> 4 2013-01-01 03:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1 % 100
#> 5 2013-01-01 04:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1 % 100
#> 6 2013-01-01 05:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1 % 100
#> 7 2013-01-01 06:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1 % 100
#> 8 2013-01-01 07:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1 % 100
#> 9 2013-01-01 08:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1 % 100
#> 10 2013-01-01 09:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1 % 100
#> # ℹ 4,454 more rows
# Note: all resulting values should be NA -> gap is too big (480 * min30 = 10 days)
y1_statistics <- list(
  "default_statistic" = "drop",
  "O3" = list("mean", "perc98", "n", "max", "min")
)
resample(airmo_min30, y1_statistics, "y1", max_gap = 480)
#> # A tibble: 5 × 6
#> starttime site parameter interval unit value
#> <dttm> <fct> <fct> <fct> <fct> <dbl>
#> 1 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3 y1 µg/m3 NA
#> 2 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3_98%_min30 y1 µg/m3 NA
#> 3 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3_nb_min30 y1 1 NA
#> 4 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3_max_min30 y1 µg/m3 NA
#> 5 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3_min_min30 y1 µg/m3 NA