Iss417 #440

Open
wants to merge 26 commits into
base: main
Changes from 8 commits
a378739
Update README.md
cdsolari Oct 17, 2024
98c0c50
Resolve merge conflict by accepting ReadMe suggestions
jwalsh28 Oct 29, 2024
e5c982a
transit cost - county, nonsubgroup update
tinatinc Dec 30, 2024
757928d
Transit Trips County + City nonsubgroup update
tinatinc Dec 30, 2024
5de2b7d
Transit trips county nonsubgroup update
tinatinc Dec 30, 2024
d1e5c36
changing the year of the update at the top of the code files
tinatinc Dec 30, 2024
0b4c803
County - Transit Trips Subgroups + Transit Cost Subgroups update
tinatinc Dec 30, 2024
0a58a38
City - Transit Cost & Trips Subgroups update adding 2020
tinatinc Dec 30, 2024
0db2a88
Add folder for final forms
awunderground Jan 2, 2025
d6b67a1
Merge pull request #442 from UI-Research/forms_folder
ridhi96 Jan 2, 2025
f699e67
homeless and ela county files
ekgutierrez1 Jan 6, 2025
ef80eaa
updating place-populations crosswalk to add 2014 PEP data
tinatinc Jan 8, 2025
ad4b566
Update create-place-populations.qmd
tinatinc Jan 8, 2025
f431c24
Adding 2014 PEP population data and re-adding the 8 CT counties throu…
tinatinc Jan 8, 2025
1028be4
Updates to the README
tinatinc Jan 8, 2025
6ceac83
Merge pull request #443 from UI-Research/Iss425
tinatinc Jan 8, 2025
2bee287
Merge branch 'version2025' of https://github.com/UI-Research/mobility…
tinatinc Jan 10, 2025
fcf6878
transit_cost_county code and output update
tinatinc Jan 14, 2025
bc3e183
transit_trips and transit_cost code and files updated for CITY
tinatinc Jan 14, 2025
819b87d
transit_trips_county code and output files updated
tinatinc Jan 14, 2025
46b4021
transit_trips_cost_county_subgroups code and files update
tinatinc Jan 14, 2025
d1ada2f
transit_cost_all_subgroups_city code and files update
tinatinc Jan 14, 2025
e0366ad
Evaluation form for transit metrics added
tinatinc Jan 14, 2025
7515a79
removed final evaluation form due to confusion
tinatinc Jan 14, 2025
6d0c9c0
evaluation forms
tinatinc Jan 14, 2025
ad631d1
Adding final evaluation to 2 of the code files (all, county)
tinatinc Jan 14, 2025
486 changes: 486 additions & 0 deletions 06_neighborhoods/Transportation/final/transit_cost_all_city.csv
3,143 changes: 3,143 additions & 0 deletions 06_neighborhoods/Transportation/final/transit_cost_all_county.csv
1,944 changes: 1,944 additions & 0 deletions 06_neighborhoods/Transportation/final/transit_cost_all_subgroups_city.csv
12,576 changes: 12,576 additions & 0 deletions 06_neighborhoods/Transportation/final/transit_cost_all_subgroups_county.csv
486 changes: 486 additions & 0 deletions 06_neighborhoods/Transportation/final/transit_trips_all_city.csv
9,427 changes: 6,285 additions & 3,142 deletions 06_neighborhoods/Transportation/final/transit_trips_all_county.csv
1,944 changes: 1,944 additions & 0 deletions 06_neighborhoods/Transportation/final/transit_trips_all_subgroups_city.csv
12,572 changes: 12,572 additions & 0 deletions 06_neighborhoods/Transportation/final/transit_trips_all_subgroups_county.csv

Large diffs are not rendered by default.

113 changes: 108 additions & 5 deletions 06_neighborhoods/Transportation/transit_cost_county.qmd
@@ -14,7 +14,7 @@ editor_options:
chunk_output_type: console
---

- *2023-2024 Mobility Metrics update*
*2024-2025 Mobility Metrics update*

SUMMARY-LEVEL VALUES

@@ -46,6 +46,7 @@ repository folder

Collaborator

Note that the "data" folder does not exist inside the Transportation folder; either add the folder or instruct reviewers to create it.

Contributor Author

Added instruction for users/reviewers to create this folder to use it
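
For readers following this thread, a minimal sketch of how the missing folder could be created from code rather than by hand. The helper name `ensure_data_dir` and its `project_root` argument are hypothetical and not part of the PR; the folder path is the one the `.qmd` reads from. The demo runs against a temporary directory so it does not touch a real checkout.

```r
# Hypothetical helper (not in the PR): create the data folder if it is missing
# and return its path.
ensure_data_dir <- function(project_root) {
  data_dir <- file.path(project_root, "06_neighborhoods", "Transportation", "data")
  if (!dir.exists(data_dir)) {
    # recursive = TRUE also creates any missing parent folders
    dir.create(data_dir, recursive = TRUE)
  }
  data_dir
}

# Demo against a temporary directory (assumption: in the repository the root
# would come from here::here() instead).
demo_root <- file.path(tempdir(), "demo_project")
created_dir <- ensure_data_dir(demo_root)
print(dir.exists(created_dir))
```

Calling the helper a second time is harmless, because the folder is only created when it does not already exist.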

- htaindex2015_data_counties.csv
- htaindex2019_data_counties.csv
- htaindex2020_data_counties.csv

Import all the files (and/or combine into one file) with only the
relevant variables and years
@@ -110,6 +111,36 @@ transportation_cost_county_2019 <- transport_county_2019 %>%
  select(state, county, blkgrps, population, households, t_80ami)
```

### 2020

```{r}
transport_county_2020 <- read_csv(here::here("06_neighborhoods",
                                             "Transportation",
                                             "data",
                                             "htaindex2020_data_counties.csv"))


transport_county_2020 <- transport_county_2020 %>%
  select(county, blkgrps, population, households, t_80ami)
```

Create correct FIPS columns

```{r}
transport_county_2020 <- transport_county_2020 %>%
  mutate(
    state = substr(county, start = 2, stop = 3),
    county = substr(county, start = 4, stop = 6)
  )
```

Keep only variables of interest

```{r}
transportation_cost_county_2020 <- transport_county_2020 %>%
  select(state, county, blkgrps, population, households, t_80ami)
```


Compare to our official county file to make sure we have all counties accounted for

@@ -125,9 +156,20 @@ counties_2015 <- counties %>%

counties_2019 <- counties %>%
  filter(year == 2019)

counties_2020 <- counties %>%
  filter(year == 2020)
```

The 2015 and 2019 files have the same number of observations (3134, down from 3142 due to removing the 8 CT counties). The 2020 file has 3,143 due to the Alaska county split. Checking that this is the case below:
Collaborator

In the above code chunk can you add a count argument so that the count of the dataframes prints? This would make it easier to see what you are stating here in the text.

Contributor Author

Done!
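
One way the requested count could be printed, sketched in base R on an invented stand-in for the `counties` data frame. The rows below are made up for illustration; only the column names `year`, `state`, and `county_name` come from the diff.

```r
# Toy stand-in for the `counties` crosswalk (invented rows, not real data).
counties <- data.frame(
  year        = c(2015, 2015, 2019, 2019, 2020, 2020, 2020),
  state       = c("01", "02", "01", "02", "01", "02", "02"),
  county_name = c("A", "B", "A", "B", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Split by year and print the number of rows in each yearly subset, so the
# counts quoted in the prose can be checked against the output.
counties_by_year <- split(counties, counties$year)
row_counts <- sapply(counties_by_year, nrow)
print(row_counts)
```

With dplyr loaded, as in the `.qmd`, `counties %>% count(year)` produces the same counts as a data frame.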


```{r}
unique_to_2020 <- counties_2020 %>%
  anti_join(counties_2015, by = c("county_name", "state"))
```

- All files have same number of observations (3142) so no merging needed to account for missings!
But no data are missing; these counts match what is expected for each year, so no merging is needed to account for missing values.


## QC Checks

@@ -181,6 +223,7 @@ if (length(missing_indices) > 0) {

1 missing value: Loving County, TX (48301 FIPS).


County-Level Transportation Cost 2019

```{r}
@@ -225,7 +268,54 @@ if (length(missing_indices) > 0) {
}
```

- No missing values for 2019.
No missing values for 2020.

County-Level Transportation Cost 2020

```{r}
ggplot(transportation_cost_county_2020, aes(x = t_80ami)) +
  geom_histogram(binwidth = 10) +
  labs(y = "number of counties",
       x = "Annual Transit Cost for the Regional Moderate Income Household, 2020")
```

Look at summary stats
```{r}
summary(transportation_cost_county_2020$t_80ami)
```

Examine outliers
```{r}
transportation_cost_county_2020_outliers <- transportation_cost_county_2020 %>%
  filter(t_80ami > 100)
```

No weird outliers

Use stopifnot to check if all values in "transportation_cost_county_2020" are non-negative

```{r}
stopifnot(min(transportation_cost_county_2020$t_80ami, na.rm = TRUE) >= 0)
```

Good to go.

Find indices of missing values for the "transit_cost_80ami" variable

```{r}
missing_indices <- which(is.na(transportation_cost_county_2020$t_80ami))
```

Print observations with missing values

```{r}
if (length(missing_indices) > 0) {
  cat("Observations with missing values for transit_cost_80ami:\n")
  print(transportation_cost_county_2020[missing_indices, , drop = FALSE])
} else {
  cat("No missing values for transportation_cost_county_2020\n")
}
```

No missing values for 2020.
Collaborator

When I ran this, I saw one missing value: State 48, County 243

Contributor Author

Yes! Thank you for catching - 1 missing value for 2020
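
A discrepancy like this is easier to catch if the chunk asserts the expected number of missing values instead of relying on the prose below it. A minimal sketch on invented data follows; the column names come from the diff, the values are made up, and `expected_missing` is a hypothetical name for the count the author documents.

```r
# Toy stand-in for transportation_cost_county_2020 (invented values).
transit_cost_toy <- data.frame(
  state   = c("48", "48", "01"),
  county  = c("243", "301", "001"),
  t_80ami = c(NA, 12.5, 8.0),
  stringsAsFactors = FALSE
)

# Collect the rows with a missing t_80ami and report how many there are.
missing_rows <- transit_cost_toy[is.na(transit_cost_toy$t_80ami), , drop = FALSE]
n_missing <- nrow(missing_rows)
cat("Rows with missing t_80ami:", n_missing, "\n")
print(missing_rows)

# Assumption: the author records the expected count here, so the render stops
# if the data and the written conclusion ever disagree.
expected_missing <- 1
stopifnot(n_missing == expected_missing)
```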



## Data quality marker

@@ -234,6 +324,7 @@ Determine data quality cutoffs based on number of observations (all at the HH le
```{r}
summary(transportation_cost_county_2015$households)
summary(transportation_cost_county_2019$households)
summary(transportation_cost_county_2020$households)
```

We use a 30 HH cutoff for Data Quality 3 for the ACS variables. For consistency, since no county here has fewer than 30 households (every minimum value is at least 30 HHs), Data Quality can be 1 for all of these observations. Also rename the metric variables to the names we used before (transit_trips & transit_cost) so that the quality variable can be named appropriately.
@@ -245,6 +336,9 @@ transportation_cost_county_2015 <- transportation_cost_county_2015 %>%
transportation_cost_county_2019 <- transportation_cost_county_2019 %>%
  rename(transit_cost = t_80ami) %>%
  mutate(transit_cost_quality = 1)
transportation_cost_county_2020 <- transportation_cost_county_2020 %>%
  rename(transit_cost = t_80ami) %>%
  mutate(transit_cost_quality = 1)
```

## Export files
@@ -267,12 +361,21 @@ transportation_cost_county_2019 <- transportation_cost_county_2019 %>%
)
```

- Combine the two years into one overall files for both variables
```{r}
transportation_cost_county_2020 <- transportation_cost_county_2020 %>%
  mutate(
    year = 2020,
    transit_cost = transit_cost / 100
  )
```

Combine the three years into one overall file for both variables

```{r}
- transit_cost_county <- rbind(transportation_cost_county_2015, transportation_cost_county_2019)
transit_cost_county <- rbind(transportation_cost_county_2015, transportation_cost_county_2019, transportation_cost_county_2020)
```

Combined file has 9427 observations, which is correct (3142+3142+3143)
Collaborator

Similar to above, I would recommend adding a count argument so the number of observations is printed

Contributor Author

Done
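
A sketch of how the combined row count could be printed and checked against the sum of its parts, using invented yearly frames in place of the real ones. The names `part_2015`, `part_2019`, `part_2020`, and `parts` are hypothetical.

```r
# Toy stand-ins for the three yearly data frames (invented values).
part_2015 <- data.frame(year = 2015, transit_cost = c(1.0, 2.0))
part_2019 <- data.frame(year = 2019, transit_cost = c(1.5, 2.5))
part_2020 <- data.frame(year = 2020, transit_cost = c(0.5, 1.0, 1.5))

parts <- list(part_2015, part_2019, part_2020)
combined <- do.call(rbind, parts)

# Print the per-year counts and the total, then fail loudly if rows were
# lost or duplicated while combining.
expected_rows <- sum(sapply(parts, nrow))
cat("Rows per year:", sapply(parts, nrow), "\n")
cat("Combined rows:", nrow(combined), "expected:", expected_rows, "\n")
stopifnot(nrow(combined) == expected_rows)
```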

Keep variables of interest, order them appropriately, and rename them to the correct variable names
Collaborator

It would be helpful to see the distribution of transit costs by county for all three years visualized together to observe similarity or movements.

Contributor Author

Added! Added commentary as well -- TLDR, the distributions are comparable, but costs increased from 2015 to 2019, and then decreased a lot in 2020 to below 2015 levels (which tracks, given this was the COVID year)
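
The plot the author added is not visible in this view. As an illustration only, overlaid densities by year are one way such a comparison could be drawn with ggplot2, which the `.qmd` already uses. The data below are simulated random numbers with no relation to the HTA values, and `transit_cost_toy` is a hypothetical name.

```r
library(ggplot2)

# Simulated stand-in for the combined three-year file (random numbers only).
set.seed(1)
transit_cost_toy <- data.frame(
  year = rep(c(2015, 2019, 2020), each = 50),
  transit_cost = c(rnorm(50, mean = 10, sd = 2),
                   rnorm(50, mean = 12, sd = 2),
                   rnorm(50, mean = 8, sd = 2))
)

# One semi-transparent density curve per year on a shared axis, so shifts
# between years are visible at a glance.
p <- ggplot(transit_cost_toy, aes(x = transit_cost, fill = factor(year))) +
  geom_density(alpha = 0.4) +
  labs(x = "transit cost (simulated)", y = "density", fill = "year")
print(p)
```

Faceting with `facet_wrap(~ year)` is an alternative when the overlaid curves are hard to tell apart.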


```{r}