Iss454 #457

awunderground · 2025-02-04T18:39:11Z

This PR addresses "Update transportation trips metric" (#454), but it only successfully updates the scripts for counties.
A description of the content in this pull request.

Changes:
- Consolidates years and metrics (transit ridership and transportation cost) so each calculation happens one-quarter as often as with the past data.
- Adds clearer documentation and diagnostics.
- Switches the race data to race-ethnicity data. Note: the footnote in the UMI dashboard is incorrect.
- Switches the data quality flag to be based on unweighted sample size instead of the weighted number of households.
- Combines output into 4 files instead of 8 files.
- Removes percentile ranking for transit trips.
- Updates the metric name to be more accurate.
Please focus on the county-level data. Someone else will need to fix the joins in the place data.

Detail on any issues or flags that the metric reviewer/data-team should be aware of.

I think the data quality for the transit trips is poor based on how much the values change between 2015 and 2019.

Deckart2

Hi @awunderground,

Thank you very much for the hard and quick work to get this code processed. I left some comments below that I think would be helpful to resolve, but they will not lead to changes in the county-level data.

Per your guidance when requesting the PR review, I did not review the city data in detail, though I left two quick comments. Because I haven't reviewed that data, I don't want to merge the city-level code yet. Perhaps we keep this branch open, and when JP makes his changes (also to this branch), we can re-review that code and then merge it all in.

Also, just flagging that I have not reviewed code for the 2022 data because it has yet to be created, but I can be available for that too.

Deckart2 · 2025-02-04T22:10:24Z

06_neighborhoods/Transportation/transportation_county.qmd

+full_join(transportation_county, counties, by = c("year", "state", "county"))
+
+anti_join(transportation_county, counties, by = c("year", "state", "county"))
+


I find the commentary helpful, but stopifnot() statements are even more convincing. Maybe:

fully_joined <- full_join(transportation_county, counties, by = c("year", "state", "county"))
stopifnot(nrow(fully_joined) == nrow(counties))

Great suggestion. I added this.
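A minimal base R sketch of both checks (the full join row count and the anti join), written so it runs without dplyr. The toy counties and transportation_county data frames are hypothetical stand-ins; merge(..., all = TRUE) plays the role of full_join(), and a key-matching filter plays the role of anti_join().

```r
# Toy stand-ins for the real objects (hypothetical values)
counties <- data.frame(
  year = c(2015, 2015, 2019, 2019),
  state = "01",
  county = c("001", "003", "001", "003")
)
transportation_county <- data.frame(
  year = c(2015, 2015, 2019, 2019),
  state = "01",
  county = c("001", "003", "001", "003"),
  count_transit_trips = c(12, 40, 15, 38)
)

keys <- c("year", "state", "county")

# Like dplyr::full_join(): keep every row from both tables
fully_joined <- merge(transportation_county, counties, by = keys, all = TRUE)

# Like dplyr::anti_join(): metric rows whose keys have no match in counties
key_string <- function(df) do.call(paste, df[keys])
unmatched <- transportation_county[
  !key_string(transportation_county) %in% key_string(counties),
]

# Stop if unmatched or duplicated keys changed the row count
stopifnot(nrow(fully_joined) == nrow(counties))
stopifnot(nrow(unmatched) == 0)
```

The row-count check catches both failure modes at once: unmatched keys add rows to a full join, and duplicated keys multiply them.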

Deckart2 · 2025-02-04T22:14:45Z

06_neighborhoods/Transportation/transportation_county.qmd

+  geom_histogram(binwidth = 5) + 
+  facet_wrap(~ year, nrow = 2) +
+  labs(
+    x = "Annual Transit Trips for the Regional Moderate Income Household", 


I had been using "moderate income" because that was H+T language too, but Claudia suggested we use "low income" in the blog post.

Deckart2 · 2025-02-04T22:22:29Z

06_neighborhoods/Transportation/transportation_county.qmd

+
+```
+
+Makes sense for most counties to fall in really low transit trip numbers


This isn't a legitimate comment/analysis for the transportation cost data. The key insight I see is that the distribution looks relatively similar between 2015 and 2019.

It would also be helpful to note that the x-axis is the share of total household costs spent on transportation.

Deckart2 · 2025-02-04T22:26:05Z

06_neighborhoods/Transportation/transportation_county.qmd

+transportation_county <- transportation_county |>
+  rename(
+    count_transit_trips = transit_trips_80ami,
+    index_transportation_cost = t_80ami


This may go above my level of control, but I don't think index_transportation_cost is a good variable name because I don't think it is really an index. Rather, it seems like it is a share (i.e., the share of annual household income spent on transportation).

I would suggest share_hh_transportation_spending instead.

Deckart2 · 2025-02-05T14:35:02Z

06_neighborhoods/Transportation/transportation_city.qmd

+
+### Read data
+
+The data from HUD cannot be easily read directly into this program.


I think this comment is confusing. We don't get data directly or indirectly from HUD.

Deckart2 · 2025-02-05T14:40:25Z

06_neighborhoods/Transportation/transportation_city.qmd

+
+```{r}
+transportation_tracts <- transportation_tracts |>
+  rename (GEOID = tract) |>


Extra space here: rename (GEOID = tract) should be rename(GEOID = tract).

Deckart2 · 2025-02-05T15:26:45Z

06_neighborhoods/Transportation/transportation_county_subgroups.qmd

+acs_tracts <- acs_tracts |>  
+  rename(
+    total_population = B03002_001E,
+    non_hispanic = B03002_002E,


I don't think you actually use this, but that is not a problem.

non_hispanic? I think you are correct. I just pulled that because I was running some checks about the overlapping/non-overlapping hierarchy in the categories in the data.

jwalsh28

The new method is looking good Aaron!

Two big requests:

1. Move the final evaluation form to the correct folder (link in comments) and update the file path in the evaluation form.
2. Consider how the change in the trip count variable between years can be baked into the quality variable. I lay out a potential method in the comments. The large variations between years concern me.

jwalsh28 · 2025-03-04T17:21:20Z

06_neighborhoods/Transportation/final/final_data_evaluation_form.csv

@@ -0,0 +1,9 @@
+,This form to be filled in for the data in the subgroup files. If the metric has multiple variables please include input for each variable in the file.,,,,,


We updated the destination for these files to https://github.com/UI-Research/mobility-from-poverty/tree/version2025/10a_final-evaluation. Can you move this file there and update the path in your evaluation functions?

jwalsh28 · 2025-03-04T17:31:35Z

06_neighborhoods/Transportation/transportation_county.qmd

+relevant variables and years
+
+```{r}
+transport_county_2015 <- read_csv(


Not a priority for this round, but a note for a future update: we should make this read_csv a function so that the list of years is the only item updated each round.

Good note. Added!
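A minimal sketch of that refactor in base R. The file naming pattern transportation_county_<year>.csv and the helper name read_transport_year() are hypothetical; the demo writes toy files to a temporary directory so the example runs on its own.

```r
# Hypothetical helper: read one year's county file and tag rows with the year
read_transport_year <- function(year, dir) {
  path <- file.path(dir, paste0("transportation_county_", year, ".csv"))
  # Read every column as character so FIPS codes keep their leading zeros
  df <- read.csv(path, colClasses = "character")
  df$year <- year
  df
}

# Only this vector should need updating each round
years <- c(2015, 2019, 2022)

# Demo only: write small toy files to a temporary directory
tmp <- tempfile("transport_")
dir.create(tmp)
for (y in years) {
  write.csv(
    data.frame(county = c("01001", "01003"), transit_trips_80ami = c("10", "20")),
    file.path(tmp, paste0("transportation_county_", y, ".csv")),
    row.names = FALSE
  )
}

# Read all years and stack them into one data frame
transport_county <- do.call(rbind, lapply(years, read_transport_year, dir = tmp))
```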

jwalsh28 · 2025-03-04T17:40:10Z

06_neighborhoods/Transportation/transportation_county.qmd

+
+```
+
+We transform transportation cost from an unlabeled percentage to a proportion. 


Perhaps add a note on what t_80ami is? It could help users better understand the transformation.
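A minimal sketch of the conversion, assuming (from the variable name and the H+T source) that t_80ami stores transportation cost as a percentage, so 25 means 25 percent. The values are hypothetical.

```r
# Hypothetical t_80ami values, assumed to be percentages (25 means 25 percent)
transportation_county <- data.frame(
  county = c("001", "003", "005"),
  t_80ami = c(18.5, 25, 31.2)
)

# Convert the percentage to a proportion
transportation_county$index_transportation_cost <- transportation_county$t_80ami / 100

# A proportion must fall between 0 and 1
stopifnot(all(transportation_county$index_transportation_cost >= 0 &
              transportation_county$index_transportation_cost <= 1))
```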

jwalsh28 · 2025-03-04T17:40:59Z

06_neighborhoods/Transportation/transportation_county.qmd

+
+```
+
+The transit trips index is very noisy and tough to interpret. We topcode values at 1,095 (365 *3), divide the range into 100 bins, and assign values to those bins (using a linear transformation and rounding). Percentile ranking did not work well because there were many ties and it obfuscated the distribution of the variable.  


A visual of the raw distribution would be helpful to have.

Can you add a bit of explanation of how the topcode value was chosen?

Added and added!
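A small illustration of why percentile ranking was dropped, using hypothetical counts with many ties near zero. The percentile formula mirrors dplyr::percent_rank(), (min rank - 1) / (n - 1); the score is the topcode-and-scale approach described in the diff above.

```r
# Hypothetical transit-trip counts with many ties at low values
trips <- c(0, 0, 0, 0, 2, 2, 5, 40, 300, 1500)

# Percentile ranking: ties share a rank, and gaps between raw values disappear
pct_rank <- round(100 * (rank(trips, ties.method = "min") - 1) / (length(trips) - 1))

# Linear scaling after topcoding at 1,095 (365 * 3) preserves the distance between values
topcode <- 365 * 3
score <- round(pmin(trips, topcode) / topcode * 100)
```

Here a county with 40 trips gets a percentile of 78 but a score of 4, so the percentile hides how concentrated the distribution is near zero.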

jwalsh28 · 2025-03-04T17:48:51Z

06_neighborhoods/Transportation/transportation_county.qmd

+
+```{r}
+transportation_county <- transportation_county |>
+  mutate(count_transit_trips = pmin(1065, count_transit_trips)) |>


Above you say you are topcoding values at 1,095 (which is the product of 365 and 3 per your note) but this code topcodes at 1,065.

Whoops! Fixed!

jwalsh28 · 2025-03-04T17:50:09Z

06_neighborhoods/Transportation/transportation_county.qmd

+```{r}
+transportation_county <- transportation_county |>
+  mutate(count_transit_trips = pmin(1065, count_transit_trips)) |>
+  mutate(score_transit_trips = round((count_transit_trips / 1065) * 100))


Same here (1,065 instead of 1,095)

Whoops! Fixed!
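A sketch of the corrected step that stores the threshold in one object, so the topcode and the scaling divisor cannot drift apart again (the 1,065 versus 1,095 mismatch). The data are hypothetical.

```r
# Topcode threshold: three transit trips per day for a year (365 * 3 = 1,095)
trip_topcode <- 365 * 3

transportation_county <- data.frame(
  county = c("001", "003", "005"),
  count_transit_trips = c(10, 500, 1200)  # hypothetical values
)

# Cap counts at the topcode, then scale linearly to 0-100 and round to whole bins
transportation_county$count_transit_trips <-
  pmin(trip_topcode, transportation_county$count_transit_trips)
transportation_county$score_transit_trips <-
  round(transportation_county$count_transit_trips / trip_topcode * 100)
```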

jwalsh28 · 2025-03-04T18:02:05Z

06_neighborhoods/Transportation/transportation_county.qmd

+  geom_point(alpha = 0.1) +
+  facet_wrap(~ size) +
+  coord_equal() +
+  labs(subtitle = "Large counties have at least 200,000 households")


Great viz. I am not sure what to make of these large counties that show such drastic changes in transit trips between 2015 and 2022. Looking at one of the outlier cases, the data for DC show 22 trips for transit_trips_80ami in 2015, 1,150 in 2019, and then back to 301 in 2022.

jwalsh28 · 2025-03-04T18:35:37Z

06_neighborhoods/Transportation/transportation_county.qmd

+Run the evaluation function. 
+
+```{r}
+evaluate_final_data(


See note from above: update the file path after moving the evaluation form to the 10a_final-evaluation folder.

jwalsh28 · 2025-03-04T18:46:35Z

06_neighborhoods/Transportation/transportation_county.qmd

+
+```
+
+## Data Quality Marker


I worry that the quality variable does not account for extreme variations in the transit trip counts reported for certain counties. A loose concept I have for baking this into quality is as follows:

1. Take the product of the count_transit_trips and households variables to create a new variable, estimated_total_trips.
2. Calculate the delta in this variable from the year prior (for the oldest year it will be the year following).
3. If the change exceeds X percent, give a quality of 3 for that year (not sure how to select the X criterion).

I want to include population because smaller counties may be more prone to large shifts in the average but this should be smaller for estimated total trips. Happy to brainstorm this further with you.
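A minimal base R sketch of the proposed check. For the oldest year (2015 here), it compares against the following year, as the proposal describes. The 50 percent cutoff, the toy data, and the relative_change() helper are hypothetical placeholders, not settled choices; given the broad 2019 to 2022 drop in commuting, a cutoff might need to vary by year pair.

```r
# Hypothetical panel: two counties observed in three years
panel <- data.frame(
  county = rep(c("A", "B"), each = 3),
  year = rep(c(2015, 2019, 2022), times = 2),
  count_transit_trips = c(20, 22, 15, 100, 400, 90),
  households = c(1000, 1050, 1100, 500, 510, 520),
  data_quality = c(1, 1, 1, 1, 1, 1)
)

# Hypothetical cutoff: flag changes larger than 50 percent
max_change <- 0.5

panel$estimated_total_trips <- panel$count_transit_trips * panel$households
panel <- panel[order(panel$county, panel$year), ]

# Proportional change versus the prior year; the oldest year uses the following year
relative_change <- function(x) {
  if (length(x) < 2) return(rep(NA_real_, length(x)))
  comparison <- c(x[2], x[-length(x)])
  (x - comparison) / comparison
}

# Apply within each county (rows are already sorted by year)
panel$change <- ave(panel$estimated_total_trips, panel$county, FUN = relative_change)

# which() skips NA changes (counties with a single year)
panel$data_quality[which(abs(panel$change) > max_change)] <- 3
```

County B's estimated total trips jump from 50,000 to 204,000 and then fall to 46,800, so all three of its years are flagged; county A's changes stay under 30 percent.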

Thank you for your thoughtfulness. I share all of your concerns. This is a really interesting idea.

The values can change a lot for natural reasons. For example, there is a big drop everywhere between 2019 and 2022 because of changes in commuting patterns.

"Calculate the delta in this variable from the year prior (for the oldest year it will be the year following)" -- Can you unpack this? What do I do for 2015?

awunderground added 5 commits February 1, 2025 13:45

Combine trips and costs at the county level. Remove percentiles from …

4214b21

…transit trips.

Combine transportation cost and transit trips for city. Drop percenti…

9b96112

…les for transit trips.

Rewrite subgroups code for counties, combine output into one file.

3ed8d45

Combine and rename files. Add evaluation check for 2/4 scripts.

bc8563d

Set up county files for review. Place files are a mess.

65a08fe

awunderground requested a review from Deckart2 February 4, 2025 18:39

cdsolari assigned Deckart2 Feb 5, 2025

Deckart2 reviewed Feb 5, 2025

awunderground added 2 commits February 26, 2025 11:44

Merge branch 'version2025' of github.com:UI-Research/mobility-from-po…

e5fba14

…verty into iss454

Use new tracts-places crosswalk for transportation metrics.

66a9a99

awunderground requested review from jwalsh28 and Deckart2 and removed request for Deckart2 March 4, 2025 15:38

Transform and finalize county dat afor reviews

cdb4d43

jwalsh28 requested changes Mar 4, 2025

Clean up documentation, update city files, and respond to JP's early …

cfc3c91

…feedback.

awunderground requested a review from jwalsh28 March 5, 2025 15:52
