Updated place-level arrest file and prepared for first review #466
base: version2025
Hi @vpancini! I have added my review comments. Please let me know if you have any questions or concerns. Thank you!
ethnicity_of_arrestee
) |>
mutate(
age_of_arrestee = as.numeric(age_of_arrestee)
This introduces around 2769 new NA values. Do we know why? `any(arrests_a$age_of_arrestee %in% c("NA", "na", "n/a", "N/A"))` returns FALSE, so I'm not sure where the NAs are coming from.
ethnicity_of_arrestee
) |>
mutate(
age_of_arrestee = as.numeric(age_of_arrestee)
This introduces around 3832 new NA values. Do we know why? `any(arrests_b$age_of_arrestee %in% c("NA", "na", "n/a", "N/A"))` returns FALSE, so I'm not sure where the NAs are coming from.
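One way to trace NAs like these is to compare the column before and after coercion and tabulate the strings that fail to convert. The sketch below uses a toy vector in place of `age_of_arrestee`; the example strings are hypothetical and are not taken from the actual data.

```r
# Toy stand-in for the raw age column (values are hypothetical, not from the real file)
age_raw <- c("17", "15", "over 98 years old", "", "unknown", NA, "16")

# Coerce as the reviewed code does; warnings are suppressed only for this check
age_num <- suppressWarnings(as.numeric(age_raw))

# TRUE where a value was present before coercion but is NA afterwards
failed <- !is.na(age_raw) & is.na(age_num)

# How many NAs the coercion itself introduced
sum(failed)

# Which strings caused them, most frequent first
sort(table(age_raw[failed]), decreasing = TRUE)
```

Running the same three steps on `arrests_a` and `arrests_b` should show whether the new NAs come from text labels, empty strings, or something else.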
Note that this metric includes the following subgroups by race, sex, and age subgroups, so those subgroups are also aggregated in this section:

* Race: white, Black, Hispanic, Asian/other
Can we please add a check which shows the unique race-ethnicity and sex values that exist in the original data before grouping?
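A check along these lines could be as small as two `count()` calls run on the data before any grouping. This is a minimal sketch on toy data: `sex_of_arrestee` is an assumed column name, and the values shown are illustrative only.

```r
library(dplyr)

# Toy stand-in for the arrest data before grouping (values are illustrative)
arrests_toy <- tibble(
  race_of_arrestee = c("white", "white", "unknown", "black", NA),
  ethnicity_of_arrestee = c(
    "not hispanic or latino", "unknown", "hispanic or latino", "unknown", NA
  ),
  sex_of_arrestee = c("male", "female", "male", "male", "female") # assumed name
)

# Every race-by-ethnicity combination present in the raw data, with counts
race_eth_values <- arrests_toy |>
  count(race_of_arrestee, ethnicity_of_arrestee, name = "n_arrests")

# Every sex value present in the raw data, with counts
sex_values <- arrests_toy |>
  count(sex_of_arrestee, name = "n_arrests")

race_eth_values
sex_values
```

`count()` keeps `NA` as its own row, so missing values show up alongside the named categories.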
group_by(year, ori) |>
summarize(
arr_total_juv = n(),
# Race subgroups
I don't feel very confident about the race subgroups because the race and ethnicity values in the data are sometimes not aligned. For example, race is "white" but ethnicity is "unknown" instead of "not hispanic or latino", or race is "unknown" but ethnicity is "hispanic or latino". It would make sense to explicitly define these categories, and how we want to code them for this metric, outside of the code block before coding the race subgroups. It may also make sense to use both the "race_of_arrestee" and "ethnicity_of_arrestee" variables to create the race-ethnicity subgroups.
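As a starting point for that discussion, the sketch below derives explicit flags from both variables, so that misaligned rows (for example, race "unknown" with ethnicity "hispanic or latino") are visible rather than silently dropped. The flag names and the value "black" are assumptions for illustration; the actual coding rules are for the authors to define.

```r
library(dplyr)

# Toy rows covering the misaligned cases described above (values are illustrative)
arrests_toy <- tibble(
  race_of_arrestee = c("white", "white", "unknown", "black"),
  ethnicity_of_arrestee = c(
    "not hispanic or latino", "unknown", "hispanic or latino", "hispanic or latino"
  )
)

# Hypothetical flags: one possible coding, not the project's agreed definition
arrests_coded <- arrests_toy |>
  mutate(
    is_hispanic = ethnicity_of_arrestee %in% "hispanic or latino",
    is_white = race_of_arrestee %in% "white",
    is_black = race_of_arrestee %in% "black",
    race_unknown = is.na(race_of_arrestee) | race_of_arrestee == "unknown",
    ethnicity_unknown = is.na(ethnicity_of_arrestee) |
      ethnicity_of_arrestee == "unknown"
  )

# Cross-tabulate the flags to see how often race and ethnicity disagree
arrests_coded |>
  count(is_white, is_black, is_hispanic, race_unknown, ethnicity_unknown)
```

Because the flags are not mutually exclusive, a row can be counted as both Black and Hispanic, which is consistent with the overlapping categories described for the population denominators.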
## 4.1 Instructions to manually download NIBRS Batch Header Segment
REVIEWER - the storage of these files has changed on Harvard Dataverse since I downloaded them and wrote this section, but there has been an error, and the Batch Header File section now lists the Group B Arrest Report Files. These instructions will need to be updated next year once Jacob Kaplan fixes the file storage.
Please update the instructions for this segment!
### 5.1a Check ACS variable names and identify those we need
Load ACS variables for our first and last years. Manually explore each file and spot check several observations. The naming conventions of ACS variables do not seem to change during our time period. Note that if they did we would need to split up the code that reads in the years below.
```{r check-acs-variables, eval = FALSE} |
Please briefly explain the ACS variables used.
* Sex: male, female
* Age: ages 10-14, ages 15-17

Note that there is a Non-Hispanic white category, but not a Non-Hispanic category for other races (Black, Asian, etc.). This means that the categories will have a small but non-zero overlap. The race/ethnicity categories actually used in the `juvenile-arrests` metric are white, Black, Asian/other, and Hispanic. These are not mutually exclusive (e.g., someone could be counted as both Black and Hispanic), which matches the population denominators created here (this is because ACS doesn't have counts of non-Hispanic Black or other non-Hispanic races other than white).
Can we add some test for this?
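One possible test, sketched below on toy numbers, compares the sum of the race/ethnicity denominators against the total population aged 10-17 and asserts that the overlap is non-negative and below a chosen bound. The subgroup column names (`pop_white`, `pop_black`, `pop_asian_other`, `pop_hispanic`) and the 25% bound are hypothetical; only `pop_1017` appears in the reviewed code.

```r
# Toy place-level denominators (numbers are made up for illustration)
pop_toy <- data.frame(
  pop_1017 = c(1000, 500),
  pop_white = c(600, 200),        # hypothetical column name
  pop_black = c(250, 150),        # hypothetical column name
  pop_asian_other = c(100, 100),  # hypothetical column name
  pop_hispanic = c(80, 70)        # hypothetical column name
)

# Sum of the overlapping subgroups for each place
subgroup_sum <- with(
  pop_toy,
  pop_white + pop_black + pop_asian_other + pop_hispanic
)

# Share of the total population that is counted more than once
overlap_share <- (subgroup_sum - pop_toy$pop_1017) / pop_toy$pop_1017

# Fail loudly if subgroups undercount the total or the overlap is implausibly large
stopifnot(
  all(overlap_share >= 0),
  all(overlap_share < 0.25) # arbitrary bound; choose one that fits the data
)
```

The bound would need to be set after looking at the real distribution, since places with large Hispanic populations may legitimately show more overlap.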
mutate(
# Total age 10-17
pop_1017 = age_m_1014 + age_m_1517 + age_f_1014 + age_f_1517,
# Race subgroups
The code would be clearer if we stated more explicitly what the race subgroups encode, either in a comment or in text outside of the code block.
``` | ||
## 7. Validation
Can we see agency and place level distributions (visualization) of the crime data before any transformations are done?
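For the agency level, a minimal version of this could count raw arrest records per `ori` and plot the distribution before any filtering or aggregation. The sketch uses toy data and base R graphics; the same counts could be passed to whichever plotting package the project already uses.

```r
# Toy stand-in for the untransformed arrest records (one row per arrest)
arrests_toy <- data.frame(
  ori = c("A1", "A1", "A1", "B2", "B2", "C3"),
  year = 2022
)

# Number of raw arrest records per agency
arrests_per_agency <- as.data.frame(
  table(ori = arrests_toy$ori),
  responseName = "n_arrests"
)

# Numeric summary of the distribution
summary(arrests_per_agency$n_arrests)

# Histogram of arrests per agency, before any transformation
hist(
  arrests_per_agency$n_arrests,
  main = "Raw arrests per agency (toy data)",
  xlab = "Arrests per agency (ori)"
)
```

A place-level version would follow the same pattern once the agency-to-place crosswalk is applied, grouping on the place identifier instead of `ori`.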
``` | ||
## 8. Save and write out data
This file should include a final data evaluation function before the data is written out. Please refer to the project Wiki on GitHub for more details. Also, please make sure to include places that should be in the urban universe but for which we couldn't calculate statistics. These should just have N/A values for the metric.
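For the second point, one common pattern is to left-join the calculated metric onto the full list of places so that places without a calculated value remain in the output with `NA`. The sketch below uses toy data; the data frame names and the columns `state`, `place`, and `rate_juv_arrest` are hypothetical, and it does not stand in for the evaluation function described in the Wiki.

```r
# Hypothetical full universe of places that must appear in the final file
universe <- data.frame(
  state = c("01", "01", "02"),
  place = c("00100", "00200", "00300")
)

# Hypothetical metric results; one place in the universe has no calculated value
metric <- data.frame(
  state = c("01", "02"),
  place = c("00100", "00300"),
  rate_juv_arrest = c(0.012, 0.034)
)

# Keep every universe row; unmatched places get NA for the metric
final <- merge(universe, metric, by = c("state", "place"), all.x = TRUE)

# Guard against dropped or duplicated places before writing out
stopifnot(nrow(final) == nrow(universe))

final
```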
**Mobility metric pull request template
Please include the following points in your PR:**
A link to the issue that this PR relates to: #411
**2) A description of the content in this pull request.
What was changed?**
What should the reviewer be focusing on?
Is there a logical order to review the files in?
Rates for 2021 and 2022 may have changed because the underlying data have been updated