Iss404 #441

Draft: wants to merge 7 commits into base: version2025


kmartinchek
Collaborator

Living Wage Ratio

Please include the following points in your PR:

  1. A link to the issue that this PR relates to. Iss064 #67

  2. A description of the content in this pull request.

  • What was changed? In this update, I created new Stata code to generate the living wage ratio from 2015 to 2022, both overall and by subgroup. This completes the backfill of this calculation for all prior years in two do files instead of one file per year.
  • What should the reviewer be focusing on? compute_metrics_wage_ratio_overall.do and compute_metrics_wage_ratio_subgroup.do. These add new years and subgroup data for 2015 to 2022. Checking that the data are as expected, that the loops work properly, etc. would be excellent. I make manual adjustments for geography incompatibilities in 2022. We deflate/inflate MIT data for 2015 to 2022 due to data availability constraints, and 2023 data will be added once they are available.
  • Is there a logical order to review the files in? The overall file, then the subgroup file.
  3. Detail on any issues or flags that the metric reviewer/data-team should be aware of. None that I am aware of. I may combine the overall and subgroup files into one do file for efficiency, but that is just a matter of putting the two scripts into one.

cdsolari and others added 6 commits December 3, 2024 11:30
cdsolari requested review from rpitingolo and tinatinc and removed request for rpitingolo December 31, 2024 15:00
Comment on lines +22 to +23
global raw "C:\Users\KMartinchek\Documents\upward-mobility-2025\mobility-from-poverty\09_employment"
global wages "C:\Users\KMartinchek\Documents\upward-mobility-2025\mobility-from-poverty\09_employment"
Contributor

For this whole section, I think you should add instructions for anyone who will run this on their own computer, e.g., replace these paths with your own local paths.

Collaborator Author

Line 21 states "update these directories." I'm wondering what additional detail would be helpful besides adding, at the end, "for those running on their own computer, replace these file paths with your own local paths."

Comment on lines +31 to +32
Before running, please download the files below from the following [Box folder] https://urbanorg.app.box.com/folder/298586735341 into the repository folder 

Contributor

Be specific - which folder do you want them importing it into? Do they have to create the folder? Is that folder included in the gitignore?

Comment on lines +88 to +107
foreach year in 2015 2016 2017 2018 2019 2020 2021 2022 {

clear
cd "$raw" // raw is defined with -global- above, not -local-
import delimited using "`year'_data.csv", numericcols(14 15 16 17 18) varnames(1) clear

/* keep only county totals */
keep if areatype == "County" & ownership == "Total Covered"

keep st cnty annualaverageweeklywage annualaverageestablishmentcount

rename st state

rename cnty county

destring state, replace // make numeric for merging with MIT data

save "temp-qecw-`year'.dta", replace

}
Contributor

I'm getting an error that 2020_data.csv is not found.

Collaborator Author

I have no idea why this error occurs. I looked into it all morning, and 2020_data.csv is in the Box folder linked at the top of the file for downloading the data: https://urbanorg.app.box.com/folder/298586735341. When I download those files into the 09_employment/ folder and run these lines, they work as expected. Did you download all of the data files in Box into the 09_employment/ folder? It might help to get on a call to troubleshoot, because otherwise I'm not sure what is happening.

use "temp-merged-`year'.dta", clear

/* convert living hourly wage to weekly */
gen weekly_living_wage = wage_adj * 40
Contributor

Be specific for a general audience: 40 because there are 40 working hours per week.
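The conversion in the snippet is a single multiplication. The factor of 40 assumes a standard 40-hour work week, per the note above. The hourly value in the example is hypothetical and only illustrates the arithmetic:

$$
\text{weekly\_living\_wage} = \text{wage\_adj}\ [\text{dollars/hour}] \times 40\ [\text{hours/week}]
$$

For a hypothetical wage_adj of 25.00 dollars per hour, this gives 25.00 × 40 = 1,000 dollars per week.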

// save out for merge
save "$wages/temp-qecw-industry-`year'.dta", replace

}
Contributor

Same issue here: 2020_data.csv not found.


*** assert number of counties
di `year'
distinct state county, joint // user-written command (ssc install distinct); confirms the dataset has the right number of counties
Contributor

Cool, I didn't know this command/package existed.

@tinatinc
Contributor

Hey Kassie, this looks very strong! However, I haven't been able to do a thorough review beyond these comments. Could you check on this missing data file and let me know if I'm missing something before I re-run?

cdsolari self-assigned this Jan 16, 2025
cdsolari self-requested a review January 16, 2025 20:19
…nd revert to include only 2014 2018 2021 2022 and 2023 files
@kmartinchek
Collaborator Author

Living Wage Ratio
UPDATED DETAILS FOR MOST RECENT UPDATE

Please include the following points in your PR:

A link to the issue that this PR relates to. #67

A description of the content in this pull request.

What was changed? In this update, I created new Stata code to generate the living wage ratio for 2014, 2018, 2021, 2022, and 2023, both overall and by subgroup, pursuant to our data use agreement. This completes the backfill of this calculation for all prior years in one do file instead of the several scripts per year in the original code.

What should the reviewer be focusing on? compute_metrics_wage_ratio.do. This file adds new years and subgroup data for 2014, 2018, 2021, 2022, and 2023. Checking that the data are as expected, that the loops work properly, etc. would be excellent. I make manual adjustments for geography incompatibilities in 2022 and 2023 to align with guidance on CT geographies. We deflate/inflate MIT data for 2014, 2018, and 2021 pursuant to our data use agreement. Question of whether we will make publicly available the MIT files for 2019, 2022, and 2023 we use to create the measure and the best place to store and read in these files to best align with the data use agreement. This is a question for Claudia specifically.

Is there a logical order to review the files in? This is the only file to review.

Detail on any issues or flags that the metric reviewer/data-team should be aware of. None that I am aware of.

Collaborator

cdsolari left a comment

I am going to pass this off to Rob Pitingolo to run the code. I couldn't easily do a review because I couldn't find the final files. It also looks like the 2023 data aren't in the output yet, although the code is there. I'm happy to chat through this, Rob. Thanks!

replace wage_adj = wage * 251.107/255.657 if year == 2018
replace wage_adj = wage * 236.736/255.657 if year == 2014

replace wage_adj = wage * 270.970/255.657 if year == 2021
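These three lines rescale the MIT hourly wage by a ratio of CPI-U annual averages. Below is a sketch of the arithmetic. It assumes, as the reviewer asks next, that the base wage is in 2019 dollars, because every denominator is the 2019 CPI-U of 255.657:

$$
\text{wage\_adj}_y = \text{wage} \times \frac{\text{CPI-U}_y}{\text{CPI-U}_{2019}}, \qquad \text{CPI-U}_{2019} = 255.657
$$

$$
\frac{236.736}{255.657} \approx 0.9260\ (y = 2014), \qquad \frac{251.107}{255.657} \approx 0.9822\ (y = 2018), \qquad \frac{270.970}{255.657} \approx 1.0599\ (y = 2021)
$$

In other words, the 2014 and 2018 values are deflated by roughly 7.4% and 1.8%, and the 2021 values are inflated by roughly 6.0%.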
Collaborator

Kassie, are all of these years of jobs paying a living wage being adjusted based on the 2019 living wage data? I see that the 2019 CPI-U is 255.657. I thought we had more years of living wage data to populate 2014, 2018, and 2021. Can you confirm we only have 2019 data for those three years? If so, can you add that to the comment?

And, I believe we should have 2022 living wage data for the 2022 metric. I'll look for that later in your code.

/* only keep 1 adult, 2 children row */
keep if adults == "1 Adult" & children == "2 Children"

/* inflate/deflate MIT data, depending on year */
Collaborator

We do not need to inflate/deflate for 2022. Can you clean up this comment for 2022?

keep if areatype == "County"

** recode industries into MM categories
generate industry_type = .
Collaborator

Looking at 2018_data.csv, there's a NAICS code of 102, which is "private, Service-providing" here, and I want to make sure we aren't missing any codes. That said, we can't really include all these years anymore anyway. Just make sure it works for 2014 and 2018.


/* delete unneeded files -- do this as a last step */

foreach year in 2014 2018 2021 2022 2023 {
Collaborator

I assume you haven't run this yet since it isn't finalized. We definitely need to clean up some of these extra files. If it helps, you can save some in the team Box folder, since some of the data online might go away. But please do remove them from GitHub at the end.

bysort year: count

// export final dataset
export delimited using "$wages/living_wage_county_industry_longitudinal.csv", replace
Collaborator

I can't tell where this file is. I think it is supposed to be the file for subgroups, and I don't see it in the folder. It would be helpful to save final files in a more obvious location, like here: https://github.com/UI-Research/mobility-from-poverty/tree/iss404/09_employment/data/final. I know this is just extra complicated because the backfilling we expected to do isn't happening. I also don't see the "wage_ratio_final_2023_subgroup.dta" file. Maybe this part hasn't been run yet? I do see it through 2022.

@cdsolari
Collaborator

cdsolari commented Mar 4, 2025

Oh, @rpitingolo and @kmartinchek, I remembered that I had not yet addressed this question: "Question of whether we will make publicly available the MIT files for 2019, 2022, and 2023 we use to create the measure and the best place to store and read in these files to best align with the data use agreement." We need to strip the MIT raw files for 2019, 2022, and 2023 from our GitHub repository. The repository is technically public, and that violates our agreement. We should store these files in the "Metrics_2025_round/living_wage" folder on Box. We also need to remove any past data from "scrape" efforts, but I can coordinate with @jwalsh28 to remove those. I know in R we use .gitignore, and I noticed you use "erase" in Stata. I already have the MIT zip file they delivered to us in that Box folder. Let me know if you have other questions!


4 participants