Iss417 #440
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,7 +14,7 @@ editor_options: | |
chunk_output_type: console | ||
--- | ||
|
||
*2023-2024 Mobility Metrics update* | ||
*2024-2025 Mobility Metrics update* | ||
|
||
SUMMARY-LEVEL VALUES | ||
|
||
|
@@ -46,6 +46,7 @@ repository folder | |
|
||
- htaindex2015_data_counties.csv | ||
- htaindex2019_data_counties.csv | ||
- htaindex2020_data_counties.csv | ||
tinatinc marked this conversation as resolved.
|
||
|
||
Import all the files (and/or combine into one file) with only the | ||
relevant variables and years | ||
|
@@ -110,6 +111,36 @@ transportation_cost_county_2019 <- transport_county_2019 %>% | |
select(state, county, blkgrps, population, households, t_80ami) | ||
``` | ||
|
||
### 2020 | ||
|
||
```{r} | ||
transport_county_2020 <- read_csv(here::here("06_neighborhoods", | ||
"Transportation", | ||
"data", | ||
"htaindex2020_data_counties.csv")) | ||
|
||
|
||
transport_county_2020 <- transport_county_2020 %>% | ||
select(county, blkgrps, population, households, t_80ami) | ||
``` | ||
|
||
Create correct FIPS columns | ||
|
||
```{r} | ||
transport_county_2020 <- transport_county_2020 %>% | ||
mutate( | ||
state = substr(county, start = 2, stop = 3), | ||
county = substr(county, start = 4, stop = 6) | ||
) | ||
``` | ||
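To illustrate the character positions used above, here is a minimal sketch in base R. It assumes the raw `county` field wraps the five-digit FIPS code in literal double quotes, which is what starting at position 2 implies; the sample values and the names `raw_county`, `state_fips`, and `county_fips` are made up for illustration.

```r
# Hypothetical raw values: a literal quote, then 2-digit state + 3-digit county FIPS
raw_county <- c('"01001"', '"48301"')

# Positions 2-3 hold the state code, positions 4-6 hold the county code
state_fips  <- substr(raw_county, start = 2, stop = 3)
county_fips <- substr(raw_county, start = 4, stop = 6)

# Both pieces should be fixed-width, zero-padded strings
stopifnot(all(nchar(state_fips) == 2), all(nchar(county_fips) == 3))

data.frame(state = state_fips, county = county_fips)
```

If the raw field turned out not to carry the leading quote, the positions would need to shift to 1-2 and 3-5, so a width check like the `stopifnot` above is a cheap guard.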
|
||
Keep only variables of interest | ||
|
||
```{r} | ||
transportation_cost_county_2020 <- transport_county_2020 %>% | ||
select(state, county, blkgrps, population, households, t_80ami) | ||
``` | ||
|
||
|
||
Compare to our official county file to make sure we have all counties accounted for | ||
|
||
|
@@ -125,9 +156,20 @@ counties_2015 <- counties %>% | |
|
||
counties_2019 <- counties %>% | ||
filter(year == 2019) | ||
|
||
counties_2020 <- counties %>% | ||
filter(year == 2020) | ||
``` | ||
|
||
The 2015 and 2019 files have the same number of observations (3134, down from 3142 due to removing the 8 CT counties). The 2020 file has 3,143 due to the Alaska county split. Checking that's the case below: | ||
Review comment: In the above code chunk can you add a …
Reply: Done!
||
|
||
```{r} | ||
unique_to_2020 <- counties_2020 %>% | ||
anti_join(counties_2015, by = c("county_name", "state")) | ||
``` | ||
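The check above can be made more explicit by printing row counts and running the anti-join in both directions. The following is a minimal sketch on toy data, not the project's county file; the tables `counties_a` and `counties_b` and their contents are hypothetical.

```r
library(dplyr)

# Toy stand-ins for two years of a county crosswalk; values are illustrative only
counties_a <- tibble(state = c("01", "01"), county_name = c("Alpha", "Beta"))
counties_b <- tibble(state = c("01", "01", "01"),
                     county_name = c("Alpha", "Beta", "Gamma"))

# Print the row counts so the expected totals are visible in the rendered output
c(a = nrow(counties_a), b = nrow(counties_b))

# Rows present in b but absent from a
only_in_b <- anti_join(counties_b, counties_a, by = c("county_name", "state"))
only_in_b

# Check the other direction as well: rows in a that b lacks
only_in_a <- anti_join(counties_a, counties_b, by = c("county_name", "state"))
nrow(only_in_a)
```

Running the join both ways matters because a one-directional anti-join cannot reveal counties that were dropped or renamed between years.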
|
||
All files have same number of observations (3142) so no merging needed to account for missings! | ||
But no data is MISSING, these represent accurate expectations based on each year, so no merging needed to account for missings. | ||
|
||
|
||
## QC Checks | ||
|
||
|
@@ -181,6 +223,7 @@ if (length(missing_indices) > 0) { | |
|
||
1 missing value: Loving County, TX (48301 FIPS). | ||
|
||
|
||
County-Level Transportation Cost 2019 | ||
|
||
```{r} | ||
|
@@ -225,7 +268,54 @@ if (length(missing_indices) > 0) { | |
} | ||
``` | ||
|
||
No missing values for 2019. | ||
No missing values for 2020. | ||
|
||
County-Level Transportation Cost 2020 | ||
|
||
```{r} | ||
ggplot(transportation_cost_county_2020, aes(x=t_80ami)) + geom_histogram(binwidth=10) + labs(y="number of counties", x="Annual Transit Cost for the Regional Moderate Income Household, 2020") | ||
``` | ||
|
||
Look at summary stats | ||
```{r} | ||
summary(transportation_cost_county_2020$t_80ami) | ||
``` | ||
|
||
Examine outliers | ||
```{r} | ||
transportation_cost_county_2020_outliers <- transportation_cost_county_2020 %>% | ||
filter(t_80ami>100) | ||
``` | ||
|
||
No weird outliers | ||
|
||
Use stopifnot to check if all values in "transportation_cost_county_2020" are non-negative | ||
|
||
```{r} | ||
stopifnot(min(transportation_cost_county_2020$t_80ami, na.rm = TRUE) >= 0) | ||
``` | ||
|
||
Good to go. | ||
|
||
Find indices of missing values for the "transit_cost_80ami" variable | ||
|
||
```{r} | ||
missing_indices <- which(is.na(transportation_cost_county_2020$t_80ami)) | ||
``` | ||
|
||
Print observations with missing values | ||
|
||
```{r} | ||
if (length(missing_indices) > 0) { | ||
cat("Observations with missing values for transit_cost_80ami:\n") | ||
print(transportation_cost_county_2020[missing_indices, , drop = FALSE]) | ||
} else { | ||
cat("No missing values for transportation_cost_county_2020\n") | ||
} | ||
``` | ||
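An equivalent way to surface the missing rows is to filter on `is.na()` directly, which avoids tracking indices. This is a sketch on invented data; `toy_costs` and `missing_rows` are hypothetical names, and the FIPS codes and costs are not real values.

```r
library(dplyr)

# Toy data; the FIPS codes and costs are invented for illustration
toy_costs <- tibble(
  state   = c("01", "01", "02"),
  county  = c("001", "003", "013"),
  t_80ami = c(21.5, NA, 18.2)
)

# Keep only the rows where the metric is missing
missing_rows <- toy_costs %>%
  filter(is.na(t_80ami))

# Report the count, then the rows themselves
cat("Rows with missing t_80ami:", nrow(missing_rows), "\n")
missing_rows
```

Printing the count alongside the rows makes it harder for a written summary to drift out of sync with what the chunk actually returns.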
|
||
No missing values for 2020. | ||
Review comment: When I ran this I am seeing there is one missing value: State 48, County 243
Reply: Yes! Thank you for catching - 1 missing value for 2020
||
|
||
|
||
## Data quality marker | ||
|
||
|
@@ -234,6 +324,7 @@ Determine data quality cutoffs based on number of observations (all at the HH le | |
```{r} | ||
summary(transportation_cost_county_2015$households) | ||
summary(transportation_cost_county_2019$households) | ||
summary(transportation_cost_county_2020$households) | ||
``` | ||
|
||
We use a 30 HH cutoff for Data Quality 3 for the ACS variables. For consistency, since none of these counties has fewer than 30 HHs (all minimum values are at least 30), Data Quality can be 1 for all these observations. Also rename the metric variables to the names we used before (transit_trips & transit_cost) so the quality variable can be named appropriately. | ||
|
@@ -245,6 +336,9 @@ transportation_cost_county_2015 <- transportation_cost_county_2015 %>% | |
transportation_cost_county_2019 <- transportation_cost_county_2019 %>% | ||
rename(transit_cost = t_80ami) %>% | ||
mutate(transit_cost_quality = 1) | ||
transportation_cost_county_2020 <- transportation_cost_county_2020 %>% | ||
rename(transit_cost = t_80ami) %>% | ||
mutate(transit_cost_quality = 1) | ||
``` | ||
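The flag above is hard-coded to 1 because no county falls under the cutoff. As an alternative, the flag could be derived from `households` so it stays correct if a future file does include small counties. This sketch assumes a two-level mapping (under 30 households gets 3, everything else gets 1); that mapping, the name `hh_cutoff`, and the toy data are assumptions for illustration, not the project's rule.

```r
library(dplyr)

hh_cutoff <- 30  # household cutoff named in the text above

# Toy data; household counts and costs are invented
toy <- tibble(
  county       = c("001", "003", "005"),
  households   = c(12, 30, 4500),
  transit_cost = c(20.1, 22.4, 19.8)
)

# Assumed mapping: below the cutoff -> quality 3, otherwise quality 1
toy <- toy %>%
  mutate(transit_cost_quality = if_else(households < hh_cutoff, 3, 1))

toy
```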
|
||
## Export files | ||
|
@@ -267,12 +361,21 @@ transportation_cost_county_2019 <- transportation_cost_county_2019 %>% | |
) | ||
``` | ||
|
||
Combine the two years into one overall files for both variables | ||
```{r} | ||
transportation_cost_county_2020 <- transportation_cost_county_2020 %>% | ||
mutate( | ||
year = 2020, | ||
transit_cost = transit_cost/100 | ||
) | ||
``` | ||
|
||
Combine the three years into one overall file for both variables | ||
|
||
```{r} | ||
transit_cost_county <- rbind(transportation_cost_county_2015, transportation_cost_county_2019) | ||
transit_cost_county <- rbind(transportation_cost_county_2015, transportation_cost_county_2019, transportation_cost_county_2020) | ||
``` | ||
|
||
Combined file has 9427 observations, which is correct (3142+3142+3143) | ||
Review comment: Similar to above, I would recommend adding a count argument so the number of observations is printed
Reply: Done
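One way to print the number of observations per year after combining, in the spirit of the reviewer's suggestion, is `dplyr::count()` plus an assertion on the total. This is a sketch on toy data; the frames `y2015`, `y2019`, and `y2020` and their values are hypothetical.

```r
library(dplyr)

# Toy yearly files with the same columns; values are illustrative only
y2015 <- tibble(year = 2015, county = c("001", "003"), transit_cost = c(0.21, 0.22))
y2019 <- tibble(year = 2019, county = c("001", "003"), transit_cost = c(0.23, 0.24))
y2020 <- tibble(year = 2020, county = c("001", "003", "005"),
                transit_cost = c(0.19, 0.20, 0.18))

combined <- rbind(y2015, y2019, y2020)

# One row per year with its number of observations
rows_by_year <- combined %>% count(year)
rows_by_year

# Fail loudly if the combined total is not the sum of the parts
stopifnot(nrow(combined) == nrow(y2015) + nrow(y2019) + nrow(y2020))
```

Asserting on the sum of the inputs, rather than on a typed-in total, keeps the check valid when the county count changes between releases.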
||
Keep variables of interest, order them appropriately, and rename them to the correct variable names
Review comment: It would be helpful to see the distribution of transit costs by county for all three years visualized together to observe similarity or movements.
Reply: Added! Added commentary as well -- TLDR, the distributions are comparable, but costs increased from 2015 to 2019, and then decreased a lot in 2020 to below 2015 levels (which tracks, given this was the COVID year)
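A faceted histogram is one way to show the three years together, as the reviewer requested. The sketch below uses simulated values, not the real metric, so its shapes say nothing about the actual data; the data frame `toy` and the plot object `p` are hypothetical, and the chunk actually added in the PR may differ.

```r
library(ggplot2)

set.seed(1)
# Simulated values for illustration only; not the real transit cost data
toy <- data.frame(
  year = rep(c(2015, 2019, 2020), each = 50),
  transit_cost = c(rnorm(50, 0.22, 0.03),
                   rnorm(50, 0.24, 0.03),
                   rnorm(50, 0.20, 0.03))
)

# One panel per year, stacked vertically so the x-axes line up
p <- ggplot(toy, aes(x = transit_cost)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ year, ncol = 1) +
  labs(x = "transit_cost", y = "number of counties")

p
```

Stacking the panels in one column keeps a shared x-axis, which makes shifts between years easier to see than side-by-side panels would.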
||
|
||
```{r} | ||
|
Review comment: Noting that the "data" folder does not exist inside the Transportation folder, either add the data folder or instruct reviewers to create it
Reply: Added instruction for users/reviewers to create this folder to use it