Aggregate data by different time periods, following these simple steps:

  • split data into series

    • pad each data series (needed for the calculation of the capture threshold and the detection of gaps)

    • group the series by the new interval with lubridate::floor_date()

    • apply a statistical method or a user-provided function (the user can provide a list per parameter)

  • combine resampled series
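The steps above can be sketched for a single series in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the data are made up, trunc() stands in for lubridate::floor_date(), and all variable names are hypothetical.

```r
# Hypothetical 30-minute series; the record at 01:30 is missing.
start <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC")
serie <- data.frame(
  starttime = start + c(0, 1, 2, 4, 5) * 1800,
  value = c(10, 20, 30, 50, 60)
)

# Pad: look the values up on a complete 30-minute grid, so the gap
# becomes an explicit NA.
grid <- seq(min(serie$starttime), max(serie$starttime), by = 1800)
padded <- data.frame(
  starttime = grid,
  value = serie$value[match(as.numeric(grid), as.numeric(serie$starttime))]
)

# Group: floor each timestamp to the start of its hour
# (base R stand-in for lubridate::floor_date(x, "hour")).
padded$hour <- as.POSIXct(trunc(padded$starttime, units = "hours"))

# Apply the statistic per group, here the mean of the valid values.
hourly <- tapply(
  padded$value,
  format(padded$hour, "%Y-%m-%d %H:%M"),
  mean,
  na.rm = TRUE
)
hourly  # 00:00 -> 15, 01:00 -> 30 (one of two records valid), 02:00 -> 55
```

Because the series is padded first, the 01:00 group knows that it holds two slots of which only one is valid, which is the information a capture threshold or gap check needs.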

Different methods can be supplied for different parameters. The argument statistic can be a named list in which each name stands for a parameter. The value can be a function to apply, the name of a method, or a list of method names. Some methods rename the parameter and change the unit. A list of method names can contain only one non-renaming method.
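To make the lookup concrete, here is a small sketch of how a statistic could be resolved for one parameter. The helper resolve_statistic() is hypothetical and not part of rOstluft; it only mirrors the rules described above, including the "default_statistic" entry documented for the statistic argument.

```r
# Hypothetical helper, not part of rOstluft: pick the statistic for one parameter.
resolve_statistic <- function(statistic, parameter) {
  # A plain string or a function applies to every parameter.
  if (!is.list(statistic)) {
    return(statistic)
  }
  # Use the named entry for this parameter, if there is one.
  if (parameter %in% names(statistic)) {
    return(statistic[[parameter]])
  }
  # Otherwise fall back to the default entry (NULL if none is defined).
  statistic[["default_statistic"]]
}

statistic <- list(
  "default_statistic" = "drop",
  "Hr" = "mean",
  "O3" = list("mean", "max", "min", "n")
)

resolve_statistic(statistic, "Hr")          # "mean"
resolve_statistic(statistic, "NO2")         # "drop", taken from default_statistic
length(resolve_statistic(statistic, "O3"))  # 4 methods for O3
resolve_statistic("median", "Hr")           # "median"
```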

Usage

resample(
  data,
  statistic = "mean",
  new_interval,
  data_thresh = NULL,
  max_gap = NULL,
  rename_parameter = TRUE,
  percentile = 0.95,
  skip_padding = FALSE,
  start_date = NULL,
  end_date = NULL,
  drop_last = FALSE
)

Arguments

data

A tibble in rOstluft long format

statistic

Statistical method(s) to apply when aggregating the data. Can be a single string with the name of a method or a function with one argument. It can also be a list with the parameter as name and the statistical method as value (a function or the name of a method), or a list with the parameter as name and a list of statistical methods as value. All methods must support renaming the parameter. A default statistic for all parameters not in the list can be defined with the name "default_statistic". See the sections Statistical methods and Examples.

new_interval

New interval. Must be longer than the actual interval (not checked)

data_thresh

optional minimum data capture threshold to use (the examples pass a fraction such as 0.8)

max_gap

optional maximum number of consecutive NA values

rename_parameter

optional rename parameter

percentile

The percentile level used when statistic = "percentile". The default is 0.95

skip_padding

if TRUE, the data is not padded before applying statistics. Default FALSE

start_date

optional start date for padding. Default min date in series floored to the new interval

end_date

optional end date for padding. Default max date in series ceiled to the new interval

drop_last

optional, drop the last time point added by padding. Default FALSE; TRUE if no end_date is provided and max date != ceiled max date.
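The roles of data_thresh and max_gap can be pictured with a few lines of base R. This is a sketch under assumptions: the helper checked_stat() is hypothetical, and whether the package compares with strict or inclusive inequalities is not stated in this documentation.

```r
values <- c(1, NA, NA, 4, 5, NA, 7, 8)

# Data capture: share of valid (non-NA) records in the padded interval.
capture <- mean(!is.na(values))

# Longest run of consecutive NA values, via run-length encoding.
runs <- rle(is.na(values))
longest_gap <- max(c(0L, runs$lengths[runs$values]))

# Hypothetical wrapper: return NA when a check fails, else apply the statistic.
# The direction of the comparisons (< and >) is an assumption.
checked_stat <- function(values, fun, data_thresh = NULL, max_gap = NULL) {
  runs <- rle(is.na(values))
  gap <- max(c(0L, runs$lengths[runs$values]))
  if (!is.null(data_thresh) && mean(!is.na(values)) < data_thresh) {
    return(NA_real_)
  }
  if (!is.null(max_gap) && gap > max_gap) {
    return(NA_real_)
  }
  fun(values[!is.na(values)])
}

capture                                                   # 0.625
longest_gap                                               # 2
checked_stat(values, mean, data_thresh = 0.8)             # NA, capture too low
checked_stat(values, mean, data_thresh = 0.5, max_gap = 2)  # 5
```

Both checks rely on the gaps being visible as NA, which is why padding happens before the statistics are applied.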

Value

tibble with resampled data

Statistical methods

A statistical method is a function that takes a numeric vector as argument and returns a single value.

  • "mean" average value

  • "median" median value

  • "sd" standard deviation of values

  • "sum" sum over all values

  • "max" maximum value

  • "min" minimum value

  • "n" number of valid records, renames parameter, changes unit

  • "coverage" percentage of valid records, renames parameter, changes unit

  • "percentile" calculates the percentile. Use the argument percentile to specify the level, renames parameter

  • "perc95" 95% percentile, renames parameter

  • "perc98" 98% percentile, renames parameter

  • "n>5" number of values > 5 (WHO PM2.5 y1 limit), renames parameter, changes unit

  • "n>8" number of values > 8 (CO d1 limit), renames parameter, changes unit

  • "n>10" number of values > 10 (PM2.5 y1 limit), renames parameter, changes unit

  • "n>15" number of values > 15 (WHO PM10 limit), renames parameter, changes unit

  • "n>25" number of values > 25 (WHO NO2 d1 limit), renames parameter, changes unit

  • "n>30" number of values > 30 (NO2, SO2 y1 limit), renames parameter, changes unit

  • "n>40" number of values > 40 (WHO SO2 d1 limit), renames parameter, changes unit

  • "n>45" number of values > 45 (WHO PM10 d1 limit), renames parameter, changes unit

  • "n>50" number of values > 50 (PM10 d1 limit), renames parameter, changes unit

  • "n>60" number of values > 60 (y1 limit), renames parameter, changes unit

  • "n>65" number of values > 65 (O3 d1 indicator), renames parameter, changes unit

  • "n>80" number of values > 80 (NO2 d1 limit), renames parameter, changes unit

  • "n>100" number of values > 100 (SO2 d1 limit), renames parameter, changes unit

  • "n>120" number of values > 120 (O3 h1 limit), renames parameter, changes unit

  • "n>160" number of values > 160 (O3 h1 indicator), renames parameter, changes unit

  • "n>180" number of values > 180 (O3 h1 indicator), renames parameter, changes unit

  • "n>200" number of values > 200 (O3 h1 indicator), renames parameter, changes unit

  • "n>240" number of values > 240 (O3 h1 indicator), renames parameter, changes unit

  • "drop" drops the parameter from the result, useful for persons too lazy to filter the input data
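Several of the named methods above are easy to picture as plain R functions of one numeric vector. The definitions below are illustrative stand-ins with hypothetical names, not the package's internals; in particular, the use of stats::quantile() with its default interpolation is an assumption.

```r
# "n>X": count the values above a limit (closure over a hypothetical `limit`).
n_greater <- function(limit) {
  function(x) sum(x > limit, na.rm = TRUE)
}

# "coverage": percentage of valid records.
coverage <- function(x) 100 * mean(!is.na(x))

# "percentile": stats::quantile() with its default interpolation (an assumption).
percentile_of <- function(level) {
  function(x) unname(stats::quantile(x, probs = level, na.rm = TRUE))
}

x <- c(3, 6, NA, 12)
n_greater(5)(x)        # 2 values exceed 5
coverage(x)            # 75
percentile_of(0.5)(x)  # 6, the median of the valid values
```

Each of these takes a numeric vector and returns a single value, which is the shape described for a user-provided statistic.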

Wind

Wind is a special case. For vector averaging, the method needs two inputs (direction and speed). To resample wind data, it is necessary to specify three parameters with the methods "wind.direction", "wind.speed_vector" and "wind.speed_scalar", even if the scalar or vector speed isn't present. The missing parameter will be substituted by the other.

Important: Wind calculations are standalone. Multiple methods per parameter can be calculated only for non-wind parameters.
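To show why wind needs both direction and speed, here is a sketch of the common textbook form of vector averaging. The helper wind_average() is hypothetical and not the rOstluft implementation; it assumes directions in degrees and decomposes each record into two components before averaging.

```r
# Hypothetical helper, not the rOstluft implementation.
wind_average <- function(wd, ws) {
  rad <- wd * pi / 180
  # Decompose each record into east-west (u) and north-south (v) components.
  u <- mean(ws * sin(rad))
  v <- mean(ws * cos(rad))
  list(
    direction = (atan2(u, v) * 180 / pi) %% 360,  # back to degrees, 0 to 360
    speed_vector = sqrt(u^2 + v^2),               # length of the mean vector
    speed_scalar = mean(ws)                       # plain average of the speeds
  )
}

res <- wind_average(wd = c(80, 100), ws = c(2, 2))
res$direction     # 90 (up to floating point error)
res$speed_vector  # about 1.97, smaller than the scalar speed
res$speed_scalar  # 2
```

Averaging the directions directly would fail near north: the arithmetic mean of 350 and 10 degrees is 180, while the vector average points north. The vector speed is never larger than the scalar speed, because opposing components cancel.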

TODO

  • AOT40 statistic?

  • some from https://github.com/davidcarslaw/openair/blob/master/R/aqStats.R?

Examples

min30 <- system.file("extdata", "Zch_Stampfenbachstrasse_min30_2013_Jan.csv",
                     package = "rOstluft.data", mustWork = TRUE)

airmo_min30 <- read_airmo_csv(min30)

# filter out volume concentrations, only use mass concentrations
airmo_min30 <- dplyr::filter(airmo_min30, !(.data$unit == "ppb" | .data$unit == "ppm"))

d1_statistics <- list(
  "default_statistic" = "drop",
  "Hr" = "mean",
  "RainDur" = "sum",
  "O3" = list("mean", "max", "min", "n")
)
resample(airmo_min30, d1_statistics, "d1", data_thresh = 0.8)
#> # A tibble: 186 × 6
#>    starttime           site                    parameter interval unit  value
#>    <dttm>              <fct>                   <fct>     <fct>    <fct> <dbl>
#>  1 2013-01-01 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    76.0
#>  2 2013-01-02 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    83.5
#>  3 2013-01-03 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    78.0
#>  4 2013-01-04 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    83.5
#>  5 2013-01-05 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    83.8
#>  6 2013-01-06 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    82.3
#>  7 2013-01-07 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    84.9
#>  8 2013-01-08 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    79.0
#>  9 2013-01-09 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    89.4
#> 10 2013-01-10 00:00:00 Zch_Stampfenbachstrasse Hr        d1       %Hr    89.6
#> # ℹ 176 more rows

# Note: wind parameters don't support multiple methods via list!
h1_statistics <- list(
  "default_statistic" = "drop",
  "WD" = "wind.direction",
  "WVs" = "wind.speed_scalar",
  "WVv" = "wind.speed_vector",
  "RainDur" = "sum",
  "NO" = list("coverage", "mean")
)
resample(airmo_min30, h1_statistics, "h1", data_thresh = 0.8)
#> # A tibble: 4,464 × 6
#>    starttime           site                    parameter    interval unit  value
#>    <dttm>              <fct>                   <fct>        <fct>    <fct> <dbl>
#>  1 2013-01-01 00:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  2 2013-01-01 01:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  3 2013-01-01 02:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  4 2013-01-01 03:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  5 2013-01-01 04:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  6 2013-01-01 05:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  7 2013-01-01 06:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  8 2013-01-01 07:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#>  9 2013-01-01 08:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#> 10 2013-01-01 09:00:00 Zch_Stampfenbachstrasse NO_valid%_m… h1       %       100
#> # ℹ 4,454 more rows

# Note: all resulting values should be NA -> gap is too big (480 * min30 = 10 days)
y1_statistics <- list(
  "default_statistic" = "drop",
  "O3" = list("mean", "perc98", "n", "max", "min")
)
resample(airmo_min30, y1_statistics, "y1", max_gap = 480)
#> # A tibble: 5 × 6
#>   starttime           site                    parameter    interval unit  value
#>   <dttm>              <fct>                   <fct>        <fct>    <fct> <dbl>
#> 1 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3           y1       µg/m3    NA
#> 2 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3_98%_min30 y1       µg/m3    NA
#> 3 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3_nb_min30  y1       1        NA
#> 4 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3_max_min30 y1       µg/m3    NA
#> 5 2013-01-01 00:00:00 Zch_Stampfenbachstrasse O3_min_min30 y1       µg/m3    NA