Updated place-level arrest file and prepared for first review #466

Open
wants to merge 1 commit into
base: version2025

vpancini
Collaborator

**Mobility metric pull request template**

**Please include the following points in your PR:**

1) A link to the issue that this PR relates to: #411

2) A description of the content in this pull request.

**What was changed?**

  • The 2022 metric update included two files: 01_agency_geo_place.qmd and juvenile-arrests-place-all.qmd. I merged them into a single file so the workflow is easier to follow. The new file is called juvenile-arrests-place-combined.qmd.

What should the reviewer be focusing on?

  • The overall logic of the code and the aggregation of counts from the agency level to the place level

Is there a logical order to review the files in?

  • There is only one file

3) Detail on any issues or flags that the metric reviewer/data-team should be aware of.

  • Rates for 2021 and 2022 may have changed because the underlying data have been updated.

ridhi96 left a comment

Hi @vpancini! I have added my review comments. Please let me know if you have any questions or concerns. Thank you!

ethnicity_of_arrestee
) |>
mutate(
age_of_arrestee = as.numeric(age_of_arrestee)

This introduces around 2769 new NA values. Do we know why? When I run `any(arrests_a$age_of_arrestee %in% c("NA", "na", "n/a", "N/A"))` I get FALSE, so I'm not sure where the NAs are coming from.
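
One way to trace the new NAs might be to list the raw strings that fail numeric conversion. The sketch below uses a made-up `age_raw` vector as a stand-in for `arrests_a$age_of_arrestee`; the example strings are assumptions for illustration, not values confirmed from the data. The same check would apply to `arrests_b`.

```r
# Hypothetical stand-in for the raw character column; the real values may differ.
age_raw <- c("15", "17", "over 98 years old", NA, "", " 12", "under 24 hours")

# Convert once, suppressing the coercion warning so the check runs quietly.
age_num <- suppressWarnings(as.numeric(age_raw))

# TRUE where a value was present in the raw data but failed to parse as a number.
newly_na <- !is.na(age_raw) & is.na(age_num)

# Tabulate the offending raw strings, most frequent first.
na_sources <- sort(table(age_raw[newly_na]), decreasing = TRUE)
print(na_sources)

# Number of NAs introduced by the conversion (excludes values already missing).
sum(newly_na)
```

If the table shows descriptive age labels or empty strings, those could be recoded deliberately before calling `as.numeric()` rather than silently dropped.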

ethnicity_of_arrestee
) |>
mutate(
age_of_arrestee = as.numeric(age_of_arrestee)

This introduces around 3832 new NA values. Do we know why? When I run `any(arrests_b$age_of_arrestee %in% c("NA", "na", "n/a", "N/A"))` I get FALSE, so I'm not sure where the NAs are coming from.


Note that this metric includes subgroups by race, sex, and age, so those subgroups are also aggregated in this section:

* Race: white, Black, Hispanic, Asian/other

Can we please add a check which shows the unique race-ethnicity and sex values that exist in the original data before grouping?
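
A check along these lines could be a minimal sketch in base R. The `arrests_demo` data frame and its values are invented for illustration, and `sex_of_arrestee` is an assumed column name (only `race_of_arrestee` and `ethnicity_of_arrestee` appear in this review).

```r
# Hypothetical toy data standing in for the arrest records before grouping.
arrests_demo <- data.frame(
  race_of_arrestee = c("white", "black", "unknown", "white", NA),
  ethnicity_of_arrestee = c("not hispanic or latino", "unknown",
                            "hispanic or latino", "unknown", NA),
  sex_of_arrestee = c("male", "female", "male", NA, "female"),  # assumed name
  stringsAsFactors = FALSE
)

# Distinct values and their counts for each column, keeping NA visible.
value_counts <- lapply(
  arrests_demo[c("race_of_arrestee", "ethnicity_of_arrestee", "sex_of_arrestee")],
  function(x) table(x, useNA = "ifany")
)
print(value_counts)

# Cross-tabulate race by ethnicity to expose combinations that do not line up.
race_eth_tab <- table(
  race = arrests_demo$race_of_arrestee,
  ethnicity = arrests_demo$ethnicity_of_arrestee,
  useNA = "ifany"
)
print(race_eth_tab)
```

The cross-tabulation is the more useful of the two outputs here, since it shows how often race and ethnicity disagree before any subgroup is coded.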

group_by(year, ori) |>
summarize(
arr_total_juv = n(),
# Race subgroups

I don't feel very confident about the race subgroups, because the race and ethnicity values in the data aren't always aligned. For example, race is "white" but ethnicity is "unknown" instead of "not hispanic or latino", or race is "unknown" but ethnicity is "hispanic or latino". It would make sense to explicitly define these categories, and how we want to code them for this metric, outside of the code block before coding the race subgroups. It may also make sense to use both the `race_of_arrestee` and `ethnicity_of_arrestee` variables to create the race-ethnicity subgroups.
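
To make the coding decisions explicit, the rules could be written down as a small helper that uses both variables. This is only a sketch: `code_race_eth()` is a hypothetical function, the "black" value label is assumed, and the rule for unknown ethnicity is a placeholder that the team would need to decide.

```r
# Hypothetical helper: derive subgroup flags from race and ethnicity together.
code_race_eth <- function(race, ethnicity) {
  # Hispanic is defined from ethnicity only, regardless of race.
  hisp <- !is.na(ethnicity) & ethnicity == "hispanic or latino"
  data.frame(
    is_hispanic = hisp,
    # ASSUMPTION: "white" with unknown ethnicity is counted as non-Hispanic white.
    is_white_nh = !is.na(race) & race == "white" & !hisp,
    # Not mutually exclusive with is_hispanic, matching the ACS denominators.
    is_black = !is.na(race) & race == "black",
    # Flag records whose race cannot be assigned to any subgroup.
    is_unknown_race = is.na(race) | race == "unknown"
  )
}

# The mismatched combinations mentioned above, plus a white Hispanic record.
flags <- code_race_eth(
  race = c("white", "unknown", "white"),
  ethnicity = c("unknown", "hispanic or latino", "hispanic or latino")
)
print(flags)
```

Writing the rules as one function keeps every subgroup definition in a single place, so a change to how unknown ethnicity is handled only needs to be made once.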



## 4.1 Instructions to manually download NIBRS Batch Header Segment
REVIEWER - the storage of these files has changed on Harvard Dataverse since I downloaded them and wrote this section, but there has been an error, and the Batch Header File section now lists the Group B Arrest Report Files. These instructions will need to be updated next year once Jacob Kaplan fixes the file storage.

Please update the instructions for this segment!


### 5.1a Check ACS variable names and identify those we need
Load ACS variables for our first and last years. Manually explore each file and spot check several observations. The naming conventions of ACS variables do not seem to change during our time period. Note that if they did we would need to split up the code that reads in the years below.
```{r check-acs-variables, eval = FALSE}

Please briefly explain the ACS variables used.

* Sex: male, female
* Age: ages 10-14, ages 15-17

Note that there is a Non-Hispanic white category, but not a Non-Hispanic category for other races (Black, Asian, etc.). This means that the categories will have a small but non-zero overlap. The race/ethnicity categories actually used in the `juvenile-arrests` metric are white, Black, Asian/other, and Hispanic. These are not mutually exclusive (e.g., someone could be counted as both Black and Hispanic), which matches the population denominators created here (this is because ACS doesn't have counts of non-Hispanic Black or other non-Hispanic races other than white).

Can we add some test for this?

mutate(
# Total age 10-17
pop_1017 = age_m_1014 + age_m_1517 + age_f_1014 + age_f_1517,
# Race subgroups

It would make the code clearer if we stated more explicitly what the race subgroups encode, either in a comment or in text outside of the code block.


```

## 7. Validation

Can we see agency and place level distributions (visualization) of the crime data before any transformations are done?
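
As a rough illustration of what such a check might look like at the agency level, the sketch below plots a histogram of raw counts with base R graphics. The `agency_counts` data are simulated, so the shape of the plot says nothing about the real data; `ori` and `arr_total_juv` are the names used in the reviewed code, and everything else is assumed.

```r
# Simulated agency-level counts; a stand-in for the real pre-transformation data.
set.seed(1)
agency_counts <- data.frame(
  ori = sprintf("XX%05d", 1:200),          # hypothetical agency identifiers
  arr_total_juv = rpois(200, lambda = 20)  # hypothetical juvenile arrest counts
)

# Numeric summary alongside the plot, useful for spotting extreme values.
print(summary(agency_counts$arr_total_juv))

# Histogram of untransformed counts; hist() also returns the bin counts.
hist_info <- hist(
  agency_counts$arr_total_juv,
  breaks = 30,
  main = "Juvenile arrests per agency (simulated)",
  xlab = "arr_total_juv"
)
```

The same call could be repeated on the place-level counts after aggregation, so the two distributions can be compared side by side.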

```


## 8. Save and write out data

This file should include a final data evaluation function before the data are written out. Please refer to the project Wiki on GitHub for more details. Also, please make sure to include places that should be in the urban universe but for which we couldn't calculate statistics; these should just have N/A values for the metric.
