Expansion and Weighting
Introduction
This section summarizes data weighting for Travel Behavior Inventory datasets.
The Travel Behavior Inventory is tasked with recording a representative snapshot of regional travel behavior. Two mechanisms help the TBI team accomplish this task: the first is randomized sampling of addresses in the region, and the second is weighting to account for non-response bias.
Non-response bias occurs when certain demographic groups (for example, young people or large households) are harder to survey than others. RSG corrects for a portion of non-response bias in the sampling phase, by increasing the number of randomly selected addresses in geographic areas where hard-to-recruit groups tend to live, but some bias in the final survey sample will always persist.
Weighting corrects for non-response bias by expanding the representation of the survey from only the sample itself to the population from which it is sampled along multiple key dimensions (sourced from Census data). RSG has historically prioritized adjustments for representation across the following dimensions:
- Household size, income, vehicles, workers, and presence of children
- Person sex, age, worker status, typical work mode, education, race, ethnicity
- Aggregated geography (e.g., PUMA or groups of PUMAs)
Near the conclusion of the six-year project, RSG decided to re-weight both the 2019 and 2021 datasets while weighting the 2023 dataset, so that weighting methodologies are applied uniformly across all three waves of data collection (see Re-Weighting Memo). As a result, weighting procedures were nearly identical across all three waves, with the exception that the study segmentation changed from 2019 to 2021. Namely, the core urban segment was sub-divided into five sub-segments, while the hard-to-reach segment was absorbed into other segments (urban, rural, rural ring). For details on the sample segmentation, see the Survey Sampling section of this report.
The weighting process generates four types of weights:
- A household-level weight. The sum of these weights reflects the total households in the survey region.
- A person-level weight. The sum of these weights reflects the total persons in the survey region.
- A day-level weight. The sum of these weights also reflects the total persons in the survey region (and matches the sum of the person-level weights). The person weights are spread evenly across the number of complete weekdays, so the table represents the sum of one average weekday for each person in the study.
- A trip-level weight. The sum of these weights represents the total number of trips all persons residing in the region make on a typical weekday (i.e., Tuesday, Wednesday, and Thursday). This weight should be used for trip-level analyses.
Note: this differs from the number of trips made in the survey region on a typical day, given that some residents make trips outside the region.
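The relationship between person-level and day-level weights can be illustrated with a minimal sketch. The records and field layout below are hypothetical and are not drawn from the TBI data; the sketch only assumes, as described above, that each person weight is spread evenly across that person's complete weekdays.

```python
# Hypothetical person records: (person_id, person_weight, complete weekdays).
persons = [
    ("p1", 120.0, 1),
    ("p2", 80.0, 5),
    ("p3", 200.0, 2),
]

# Spread each person's weight evenly across that person's complete weekdays.
day_weights = []
for person_id, person_weight, n_days in persons:
    for day_num in range(1, n_days + 1):
        day_weights.append((person_id, day_num, person_weight / n_days))

# The day weights sum to the same total as the person weights.
total_person_weight = sum(weight for _, weight, _ in persons)
total_day_weight = sum(weight for _, _, weight in day_weights)
print(total_person_weight, total_day_weight)
```

In this sketch a person with five complete weekdays contributes five day records, each carrying one fifth of the person weight, so multi-day respondents do not count more than single-day respondents in day-level totals.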
Weighting Process
The survey weighting process includes five primary steps:
- Initial expansion: Calculating an initial weight based on the probability of selection in the sample design. This step essentially reverses the sample plan, providing higher initial weights to areas where less sampling occurs. Each household is assigned an initial weight based on its probability of being invited to the survey. For example, if 5 households respond in a geography with 100 total households, each of the 5 households will receive an initial weight of 20 (100 / 5 = 20).
- Target-optimized weighting: After the initial expansion, household weights are adjusted to simultaneously fit selected household- and person-level targets. For example, if 20% of households in the state are one-person households, but 25% of the sample households are one-person households, RSG adjusts the weights to better match the household size distribution of the population. This process leverages an open-source application, PopulationSim, which optimizes the household weights against all target control variables simultaneously (e.g., household size, vehicles, age, race, ethnicity, gender, employment, education). This step is performed twice in the weighting routine: once after the initial expansion (step 1) and again using additional targets estimated from the day-pattern (step 3).
- Re-weighting with day-pattern adjustment to account for multi-day survey data: Some survey respondents will provide data for one travel day while others will provide data for two or more days. To ensure multi-day respondents are not over-represented in the final data, RSG estimates a simple day-pattern model using the initially weighted data, normalized by the number of complete weekdays per person. The normalized model then estimates the frequency of each daily-activity type (mandatory trips, non-mandatory trips, no trips) to account for the multi-day bias. The aggregated weighted frequencies are then included as additional control target variables, and the data are re-weighted in PopulationSim a second time so that households reporting more travel days are not over-represented relative to households reporting a single day.
- Calculating person, day, and trip weights: The household weight is assigned to subsequent person, day, and trip weights.
- Adjusting for non-response bias in day-pattern and trip rates: During this final step, RSG adjusts the trip-level weights to account for survey biases based on the method respondents use to report their travel. For example, if respondents who report their travel over the phone report fewer non-home-based-work trips compared to respondents who report their travel in the smartphone app (after correcting for differences in demographics), the day- and trip-level weights for respondents who report their travel over the phone are adjusted to align with the smartphone app respondents more closely. Travel reported by respondents who use the smartphone app is considered more accurate because respondents are not required to recall their travel and are therefore less likely to under-report trips.
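The first two steps above can be illustrated with a minimal sketch that reuses the numbers from the examples (100 households and 5 respondents; a 20% target share versus a 25% sample share of one-person households). The household records are hypothetical. The adjustment shown is a simple single-variable ratio adjustment used only for illustration; it is not the PopulationSim procedure, which fits all targets simultaneously.

```python
from collections import defaultdict

# Step 1 (initial expansion): 5 responding households in a geography with
# 100 total households each receive an initial weight of 100 / 5 = 20.
total_households = 100
respondents = ["r1", "r2", "r3", "r4", "r5"]
initial_weight = total_households / len(respondents)

# Step 2 (simplified): adjust weights toward ONE household-size target.
# Hypothetical sample: household id -> (size category, initial weight).
sample = {
    "h1": ("1-person", 20.0), "h2": ("1-person", 20.0),
    "h3": ("2+ person", 20.0), "h4": ("2+ person", 20.0),
    "h5": ("2+ person", 20.0), "h6": ("2+ person", 20.0),
    "h7": ("2+ person", 20.0), "h8": ("2+ person", 20.0),
}
target_share = {"1-person": 0.20, "2+ person": 0.80}  # illustrative targets

# Weighted share of each category in the sample (25% one-person here).
weight_by_category = defaultdict(float)
for category, weight in sample.values():
    weight_by_category[category] += weight
total_weight = sum(weight_by_category.values())

# Scale each weight by (target share / weighted sample share).
adjusted = {
    hh_id: weight * target_share[category]
    / (weight_by_category[category] / total_weight)
    for hh_id, (category, weight) in sample.items()
}

one_person_share = (adjusted["h1"] + adjusted["h2"]) / sum(adjusted.values())
print(initial_weight, round(adjusted["h1"], 3), round(one_person_share, 3))
```

With a single target this ratio adjustment matches the target exactly; with many overlapping household- and person-level targets, an optimizer such as the one described above is needed instead.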
Weighting Updates
Over the course of the six-year TBI program contract between the Council and RSG, the survey team made a number of substantial improvements to weighting. We expect weighting to be under continuous development, both as new challenges arise (e.g., incorporating new types of samples, like panel samples) and as new methods are developed to correct for bias (e.g., developing new statistical models to account for survey bias).
To ensure compatibility of weighted data over time, the survey team has re-weighted data when new methods are developed. These re-weighting instances are listed below.
2021: Transition from Iterative Proportional Fitting (IPF) algorithm to PopulationSim Entropy Maximization (EM) algorithm. RSG implemented an improved method that matches sample (survey) data to target (Census) data during the second wave of the Travel Behavior Inventory. This method resulted in more reliable weighting results and was less prone to the influence of small sample size or outlier data. To bring Wave 1 (2019) data into line with the new methods, RSG re-weighted Wave 1.
2024: Use of a transit trip target. After examining weighted data for all three waves of the survey (2019 – 2021 – 2023), the team decided to incorporate a new type of target in weighting: transit boardings. Using published data on boardings from the NTD and Metro Transit, RSG developed a method to tie the final, household-level transit trip rates to a known number. This allowed the team to correct for a bias towards transit users in the sample, which was not sufficiently addressed with Census demographic targets.
2024: Improvements to handling of non-related persons. In 2021 and 2023 (the second and third waves of the survey program), the survey team stopped requiring full travel diaries from roommates and household help. This required careful use of Census target data fed to weighting, because the Census defines a household as all those living under one roof. In 2021, RSG used a method whereby raw Census PUMS data was manipulated to account for the missing travel diaries of those non-related householders. In 2024, RSG moved away from this method, using a correction during final weight-setting instead. This preserved the integrity of PUMS data and increased the transparency of methods. Along the way, RSG also updated its methods for imputing household income to include the unreported household income of those non-related householders.
Using the Weights
For a detailed explanation of how to use the weights with examples in R, consult the Data User’s Guide (R) in the Appendix.
Analyses designed to draw conclusions about travel behavior in the region (as opposed to just the survey respondents) should use weighted data.
When applied, the weights make the dataset representative of travel for residents within the study region for the time period studied.
Note that the dataset does not include:
- Commercial vehicle travel
- Travel for persons residing in group quarters outside of the address-based sample frame (e.g., college dorms, institutional housing)
- Travel from non-residents (i.e., visitors to the region)
- Seasonal/holiday travel outside of the survey fielding period.
Data users should also keep in mind the following when creating weighted statistics and summaries from HTS data:
- Filter to the data relevant to your analysis. Note that not all people are asked every question, so understanding the missing value codes and survey logic in the data dictionary is important.
- Remember the survey design when using and interpreting weighted values. For example, the Travel Behavior Inventory included both one-day online and call center participation and seven-day smartphone participation. Therefore, it is best to avoid filtering by day of week since not all participants traveled on all days.
Calculating Weighted Summaries
In general, household weights should be used for household- and vehicle-level analyses, person weights for person-level analyses, day weights for day-level analyses, and trip weights for trip-level analyses. To calculate weighted summaries or descriptive statistics, sum the weights for that table.
In the case of an analysis that requires variables from two or more tables, the weight from the lowest level of the data hierarchy should be used (trip weights, then day weights, person weights, and finally household weights). For example, a weighted summary of race (a person-level variable) by household income should use person-level weights.
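As a minimal sketch of the race-by-income example, the snippet below joins a household-level variable onto person records and sums person-level weights within each category. The table layouts, field names, and category labels are hypothetical and do not reflect the TBI data dictionary.

```python
from collections import defaultdict

# Hypothetical household table: household id -> household-level variables.
households = {
    "hh1": {"income": "lower income"},
    "hh2": {"income": "higher income"},
}

# Hypothetical person table with person-level weights.
persons = [
    {"person_id": "p1", "hh_id": "hh1", "race": "Group A", "person_weight": 150.0},
    {"person_id": "p2", "hh_id": "hh1", "race": "Group B", "person_weight": 150.0},
    {"person_id": "p3", "hh_id": "hh2", "race": "Group A", "person_weight": 200.0},
]

# Race is a person-level variable, so the summary uses person weights
# (the lowest level of the hierarchy involved in the analysis).
weighted_counts = defaultdict(float)
for person in persons:
    income = households[person["hh_id"]]["income"]
    weighted_counts[(person["race"], income)] += person["person_weight"]

total = sum(weighted_counts.values())
weighted_shares = {key: value / total for key, value in weighted_counts.items()}
for key, share in sorted(weighted_shares.items()):
    print(key, round(share, 3))
```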
Calculating Trip Rates
Special considerations apply when generating weighted trip rates.
To calculate a weighted trip rate (the number of trips per day), data users must divide the number of weighted trips by the number of weighted travel days.
For example:
- If there are 300,000 weighted person-trips across 75,000 person-days, then the average person-trip rate is 4.0 per day.
- If there are 225,000 person-trips by car across 75,000 person-days, then the person-trip rate for car trips is 3.0.
Note: Data users should always calculate the number of weighted travel days using the day table rather than the trip table given that persons with zero-trip travel days do not have any records in the trip tables for those days.
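The calculation can be sketched as follows. The day and trip records, field names, and weights are hypothetical; the sketch only illustrates why the denominator should come from the day table, which includes zero-trip days.

```python
# Hypothetical day table, including one person-day with zero trips (p2).
days = [
    {"person_id": "p1", "day_weight": 100.0},
    {"person_id": "p2", "day_weight": 50.0},  # zero-trip day: no trip records
    {"person_id": "p3", "day_weight": 50.0},
]

# Hypothetical trip table: p1 makes 5 trips and p3 makes 6 trips.
trips = (
    [{"person_id": "p1", "mode": "car", "trip_weight": 100.0}] * 4
    + [{"person_id": "p1", "mode": "walk", "trip_weight": 100.0}] * 1
    + [{"person_id": "p3", "mode": "car", "trip_weight": 50.0}] * 4
    + [{"person_id": "p3", "mode": "walk", "trip_weight": 50.0}] * 2
)

# Denominator: weighted travel days from the DAY table.
weighted_days = sum(day["day_weight"] for day in days)

# Numerators: weighted trips, overall and for car trips only.
weighted_trips = sum(trip["trip_weight"] for trip in trips)
weighted_car_trips = sum(
    trip["trip_weight"] for trip in trips if trip["mode"] == "car"
)

trip_rate = weighted_trips / weighted_days
car_trip_rate = weighted_car_trips / weighted_days
print(trip_rate, car_trip_rate)
```

If the denominator were instead built from persons appearing in the trip table, the zero-trip day for p2 would be dropped and the trip rate would be overstated.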
Variance Estimation
There are two common approaches for variance estimation using weights. The first, replicate weights, requires the data user to leverage multiple sets of weights for the same set of observations. Analyses are repeated with each set of weights, and the results are combined to create variance estimates. This procedure has major drawbacks, because the replicates increase dataset complexity and require complex weighting procedures.
The preferred approach is an approximation using Taylor-series linearization. This procedure approximates the variance with a formula that is simpler to implement, which simplifies both the dataset and weight generation. It is readily implemented with the {survey} package in R or the {samplics} package in Python, both of which are well-established packages widely used by survey researchers.
Throughout this report, standard error bars are estimated using the Taylor-series linearization approach in the R package {survey}.
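To give a sense of what linearization involves, the sketch below computes a linearized standard error for a weighted mean in plain Python. It assumes a deliberately simplified design (single-stage, with-replacement sampling, no strata or clusters), which is not the stratified design of this survey, and it is not the code used for this report. The data values are hypothetical. For real analyses, a dedicated package that accepts the full design specification is the better choice.

```python
import math


def linearized_se_of_mean(values, weights):
    """Weighted mean and its Taylor-linearized standard error.

    Assumes a simplified design: single-stage, with-replacement sampling,
    no stratification or clustering.
    """
    n = len(values)
    total_weight = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / total_weight
    # Linearized values of the ratio estimator sum(w*y) / sum(w);
    # these sum to zero by construction.
    z = [w * (y - mean) / total_weight for w, y in zip(weights, values)]
    variance = n / (n - 1) * sum(z_i ** 2 for z_i in z)
    return mean, math.sqrt(variance)


# Hypothetical person-days: trips made and day-level weights.
trips_per_day = [2, 4, 0, 6, 3]
day_weights = [100.0, 150.0, 80.0, 120.0, 50.0]
mean_trips, se_trips = linearized_se_of_mean(trips_per_day, day_weights)
print(round(mean_trips, 3), round(se_trips, 3))
```

With equal weights, this expression reduces to the familiar standard error of a sample mean (the sample standard deviation divided by the square root of n), which is a useful sanity check.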
Combining Weights
To align with the ACS multi-year weighting approach[1], RSG has delivered single-year weights and recommends calculating multi-year weights by taking a simple average of weights. For example, to combine the 2019, 2021 and 2023 data into a single sample, the weights would be divided by three, resulting in a sum of weights (e.g., for households) that is an average of the population between 2019 and 2023. Note, however, that the effects of the COVID-19 pandemic on travel in 2021 could make such a combined dataset less useful in some regards.
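A minimal sketch of the simple-average approach follows. The household identifiers and weight values are hypothetical; the only assumption taken from the text is that each single-year weight is divided by the number of waves being combined.

```python
# Hypothetical single-year household weights keyed by (survey_year, hh_id).
hh_weights = {
    (2019, "a"): 300.0, (2019, "b"): 700.0,
    (2021, "c"): 400.0, (2021, "d"): 650.0,
    (2023, "e"): 500.0, (2023, "f"): 600.0,
}

# Divide every weight by the number of waves being combined (three here).
n_waves = len({year for year, _ in hh_weights})
combined_weights = {key: weight / n_waves for key, weight in hh_weights.items()}

# The combined total equals the average of the single-year totals.
yearly_totals = {}
for (year, _), weight in hh_weights.items():
    yearly_totals[year] = yearly_totals.get(year, 0.0) + weight
average_yearly_total = sum(yearly_totals.values()) / n_waves
combined_total = sum(combined_weights.values())
print(round(combined_total, 6), round(average_yearly_total, 6))
```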
[1] American Community Survey 2010, page 11-16: https://www.census.gov/content/dam/Census/library/publications/2010/acs/Chapter_11_RevisedDec2010.pdf
