Aggregate raw sensor data to a chosen level

aggregate_sensor(
  sensor_data,
  config,
  interval_length,
  replace_impossible = TRUE,
  interpolate_missing = FALSE,
  occupancy_pct_threshold = 0.002
)

Arguments

sensor_data: data frame for a single sensor, as returned by pull_sensor()
config: data.table, a configuration file for the given sensor
interval_length: numeric, the interval length in hours. NA indicates no aggregation (30-second data); 0.25 indicates 15 minutes. Default is 1.
replace_impossible: logical, whether to replace impossible values with NA. Default is TRUE and highly recommended.
interpolate_missing: logical, whether to interpolate missing volume and occupancy values at the raw data level. Only applies if replace_impossible is TRUE. Default is FALSE. Note that this option increases the function runtime.
occupancy_pct_threshold: numeric, the lowest possible occupancy percentage to use when calculating speed. Default is 0.002, or 0.2%. Increasing the threshold results in more stable speed values, while lowering it may increase speed variability. A higher occupancy threshold is recommended for shorter interval lengths.

Value

a data.table with values for volume, occupancy, and speed

date IDate, the given date
interval_bin numeric, the observation's interval bin
{measure}.pct_null numeric, the percentage of observations with null values for the given measure
{measure}.sum numeric, the measure's total over the given interval
{measure}.mean numeric, the measure's mean over the given interval
speed numeric, the mean traffic speed over the given interval

Details

Calculating speed

There are 60 scans per second, which means there are 60 * 30 = 1,800 scans per
30-second interval. The occupancy value in the 30-second interval data
represents the number of scans, out of the 1,800 scans in that interval,
that were occupied.

With 60 scans per second and 60 seconds per minute, there are 3,600 scans per minute.
With 3,600 scans per minute and 60 minutes per hour, there are 216,000 scans per hour.
To find the number of scans in 15 minutes, multiply 0.25 * 216,000 = 54,000 scans.
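
The scan arithmetic above can be sketched in a few lines of R. The helper name `scans_per_interval` is hypothetical and not part of the package; it assumes only the 60 scans per second stated above.

```r
# Hypothetical helper (not part of tc.sensors): number of scans in an
# interval of the given length in hours, assuming 60 scans per second.
scans_per_interval <- function(interval_length_hours, scans_per_second = 60) {
  scans_per_second * 60 * 60 * interval_length_hours
}

scans_per_interval(30 / 3600) # one 30-second interval
scans_per_interval(0.25)      # 15 minutes
scans_per_interval(1)         # one hour
```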

Speed, in miles per hour, is calculated by multiplying the number of
vehicles per hour by the field length in miles, then dividing by the
occupancy for the given interval.
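
As a rough illustration of this formula, the sketch below computes a speed from interval totals. It is not the package's implementation: the function name, argument names, and the 22-foot field length are assumptions made for the example, and occupancy is assumed to enter the formula as the fraction of scans that were occupied.

```r
# Hypothetical helper illustrating the speed formula described above.
#   volume_sum:      vehicles counted in the interval
#   occupancy_sum:   occupied scans in the interval
#   field_length_ft: assumed effective field length, in feet
estimate_speed <- function(volume_sum, occupancy_sum, interval_length_hours,
                           field_length_ft, scans_per_second = 60) {
  # Total scans possible in the interval (216,000 per hour)
  total_scans <- scans_per_second * 3600 * interval_length_hours
  # Assumption: occupancy is expressed as a fraction of occupied scans
  occupancy_fraction <- occupancy_sum / total_scans
  vehicles_per_hour <- volume_sum / interval_length_hours
  field_length_miles <- field_length_ft / 5280
  (vehicles_per_hour * field_length_miles) / occupancy_fraction
}

estimate_speed(
  volume_sum = 1200,
  occupancy_sum = 21600,
  interval_length_hours = 1,
  field_length_ft = 22
)
```

With these made-up inputs, 10% of the hour's scans are occupied and the result works out to about 50 miles per hour.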

Impossible values

Any observation with a volume that exceeds 20 vehicles or an occupancy that exceeds 1,800 scans
will be replaced with `NA`. It is impossible for more than twenty vehicles to pass over a sensor
in only 30 seconds, and the maximum number of scans in 30 seconds is 1,800
(60 scans/second * 30 seconds).
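
A minimal sketch of this rule, using made-up 30-second observations, might look like the following. It assumes each measure is checked and replaced independently; whether the package blanks only the offending value or the whole observation is not stated here.

```r
# Illustrative 30-second observations; the values are made up.
raw <- data.frame(
  volume    = c(3, 25, 7, NA),
  occupancy = c(240, 300, 2000, 150)
)

# Replace values above the limits described above with NA.
# The is.na() guard keeps already-missing values from producing NA indices.
raw$volume[!is.na(raw$volume) & raw$volume > 20] <- NA
raw$occupancy[!is.na(raw$occupancy) & raw$occupancy > 1800] <- NA

raw
```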

Interpolating missing values

`interpolate_missing` indicates whether to interpolate missing volume and occupancy values
at the raw data level. The interpolated value for a given observation is the mean of
the two observations on either side of the observation. This method preserves the variable's
overall distribution.
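
One way to picture this is the sketch below, which fills a missing value with the mean of the observations immediately before and after it. The helper `interpolate_neighbors` is hypothetical and reflects one reading of the description above, not necessarily the package's exact method; in this sketch, a missing value whose neighbor is also missing stays `NA`.

```r
# Hypothetical helper: fill each NA with the mean of its immediate neighbors.
interpolate_neighbors <- function(x) {
  n <- length(x)
  if (n < 3) {
    return(x) # nothing has a neighbor on both sides
  }
  prev_val <- c(NA, x[-n]) # value before each position
  next_val <- c(x[-1], NA) # value after each position
  fill <- (prev_val + next_val) / 2
  missing <- is.na(x)
  x[missing] <- fill[missing]
  x
}

interpolate_neighbors(c(4, NA, 8, NA, NA, 6))
```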

Examples

if (FALSE) {

library(tc.sensors)
library(dplyr)
config <- pull_configuration()

# Sample one detector that has not been abandoned
config_sample <- dplyr::filter(config, detector_abandoned == "f") %>%
  dplyr::sample_n(1)
# Pull data from one year ago
last_year <- Sys.Date() - 365

sensor_results <- pull_sensor(
  sensor = config_sample$detector_name[[1]],
  pull_date = last_year
)

aggregate_sensor(sensor_results,
  interval_length = 1,
  config = config_sample
)
}