Iss404 #441
base: version2025
… between 2015 and 2022 -- still need to fix some geo mismatches
… needed, and subgroup analysis is still in progress
… status notes on top of do files capturing to-do steps that are in-progress
global raw "C:\Users\KMartinchek\Documents\upward-mobility-2025\mobility-from-poverty\09_employment"
global wages "C:\Users\KMartinchek\Documents\upward-mobility-2025\mobility-from-poverty\09_employment"
For this whole section, I think you should add some instructions for anyone who will try to run this on their own computer, e.g., replace these paths with your own local paths.
Line 21 states "update these directories". I'm wondering what additional detail would be helpful besides adding at the end, "for those running on their own computer, replace these file paths with your own local paths".
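One way the requested instructions could look, as a minimal sketch rather than the PR's actual header: a single root global that each user edits, with the other paths derived from it and an early check that the folder exists. The name `repo` and the placeholder path are hypothetical; `raw` and `wages` are the globals already used in the do file.

```stata
* Hypothetical setup block: edit ONLY the next line to point at your local clone
global repo "C:/path/to/mobility-from-poverty"

* Derived paths, matching the globals used in the rest of the do file
global raw   "$repo/09_employment"
global wages "$repo/09_employment"

* Stop early with a readable message if the folder does not exist
capture cd "$raw"
if _rc {
    display as error "Folder not found: $raw. Edit the global repo line above."
    exit 601
}
```

Forward slashes work in Stata paths on Windows as well, which avoids a backslash sitting directly in front of a macro reference.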
Before running, please download the files below from the following [Box folder] https://urbanorg.app.box.com/folder/298586735341 into the repository folder
Be specific - which folder do you want them importing it into? Do they have to create the folder? Is that folder included in the gitignore?
foreach year in 2015 2016 2017 2018 2019 2020 2021 2022 {

    clear
    cd `raw'
    import delimited using "`year'_data.csv", numericcols(14 15 16 17 18) varnames(1) clear

    /* keep only county totals */
    keep if areatype == "County" & ownership == "Total Covered"

    keep st cnty annualaverageweeklywage annualaverageestablishmentcount

    rename st state

    rename cnty county

    destring state, replace // make numeric for merging with MIT data

    save "temp-qecw-`year'.dta", replace

}
I'm getting an error that 2020_data.csv not found
I have no idea why this error exists. I looked into it all this morning, and the 2020_data.csv file is in the Box folder linked at the top of the file to download the files from: https://urbanorg.app.box.com/folder/298586735341. When I download those files into the 09_employment/ folder and run these lines they operate as expected. Did you download all of the data files in Box into the 09_employment/ folder? Maybe it would be helpful to get on a call to troubleshoot, because I'm not quite sure otherwise what is happening.
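One possible cause, offered as an assumption rather than a diagnosis: the excerpt defines the folder as a global (`global raw ...`), but the loop runs cd `raw', which refers to a local macro. If no local named `raw` is defined elsewhere in the do file, that reference expands to nothing, `cd` does not move to the intended folder, and whether the CSVs are found depends on each user's working directory. The sketch below prints both macros and checks each input file using the global path; the file names and year list are taken from the loop above.

```stata
* Assumption: $raw is the global defined at the top of the do file
display "raw (global): $raw"
display "raw (local):  `raw'"   // blank after the colon if no local named raw exists

foreach year in 2015 2016 2017 2018 2019 2020 2021 2022 {
    * confirm file returns a nonzero code when the file cannot be found
    capture confirm file "$raw/`year'_data.csv"
    if _rc {
        display as error "Missing: $raw/`year'_data.csv"
    }
}
```

If the local turns out to be empty, replacing cd `raw' with cd "$raw" would be the corresponding change to try.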
use "temp-merged-`year'.dta", clear

/* convert living hourly wage to weekly */
gen weekly_living_wage = wage_adj * 40
Be specific for a general audience: multiply by 40 because there are 40 working hours per week.
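A small sketch of how the conversion could document itself, assuming a 40-hour full-time week as the comment suggests. The local `hours_per_week` and the example hourly wage of 20 are hypothetical and used only for illustration.

```stata
clear
set obs 1
gen double wage_adj = 20                 // hypothetical hourly living wage

* Assumption: a full-time week is 40 working hours
local hours_per_week = 40

/* convert hourly living wage to weekly: hourly wage x hours worked per week */
gen double weekly_living_wage = wage_adj * `hours_per_week'
list
```

Naming the constant puts the reasoning next to the number, so a general reader does not have to guess what 40 stands for.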
// save out for merge
save "$wages/temp-qecw-industry-`year'.dta", replace

}
same issue here - 2020_data.csv not found
*** assert number of counties
di `year'
distinct state county, joint // to confirm there are the right number of counties in the dataset
Cool - I didn't know this command/package existed
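For context, `distinct` is a community-contributed command rather than part of official Stata, so anyone running the do file needs it installed. The check could also be turned into a hard assertion using only built-in commands, as in the sketch below. The toy data and the `expected_counties` value are hypothetical; the real expected count is not stated in this thread and may differ by year.

```stata
clear
input state county
1 1
1 3
2 13
end

* Hypothetical expected count; replace with the known number of counties for the year
local expected_counties = 3

* Flag one row per state-county pair, then count the flagged rows
egen byte first_row = tag(state county)
count if first_row
assert r(N) == `expected_counties'

* Also fail if any state-county pair appears more than once
isid state county
```

Unlike a displayed count, `assert` stops the do file when the number is wrong, so a geography mismatch cannot pass unnoticed.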
Hey Kassie - this looks very strong! But I have not been able to do a thorough review beyond these comments. Could you check on this missing data file and let me know if I'm missing something before I re-run?
…nd revert to include only 2014 2018 2021 2022 and 2023 files
Living Wage Ratio
Please include the following points in your PR:
A link to the issue that this PR relates to. #67
A description of the content in this pull request. What was changed?
In this update, I created new Stata code to generate the living wage ratio for 2014, 2018, 2021, 2022 and 2023 pursuant to our data use agreement, both overall and by subgroup. This will be able to complete the backfill for this calculation on all prior years in one do file, instead of several scripts per year in the original code.
What should the reviewer be focusing on?
compute_metrics_wage_ratio.do. These add new years and subgroup data for 2014, 2018, 2021, 2022, and 2023 -- checking data is as expected, loops work properly, etc. would be excellent. I make manual adjustments for geography incompatibilities in 2022 and 2023 to align with guidance on CT geographies. We deflate/inflate MIT data for 2014, 2018, and 2021 pursuant to our data use agreement. Question of whether we will make publicly available the MIT files for 2019, 2022, and 2023 we use to create the measure and the best place to store and read in these files to best align with the data use agreement -- this is a question for Claudia specifically.
Is there a logical order to review the files in?
Only file to review.
Detail on any issues or flags that the metric reviewer/data-team should be aware of.
None I am aware of.
I am going to pass this off to Rob Pitingolo to run the code. I couldn't as easily do a review because I couldn't find the final files. It also looks like the 2023 data aren't in the output yet, although the code is there. I'm happy to chat through this, Rob. Thanks!
replace wage_adj = wage * 251.107/255.657 if year == 2018
replace wage_adj = wage * 236.736/255.657 if year == 2014

replace wage_adj = wage * 270.970/255.657 if year == 2021
Kassie, are all of these years of jobs paying a living wage getting adjusted based on year 2019 living wage data? I see in the CPI-U, that 2019 is 255.657. I thought we had more years of living wage data that we were using to populate years 2014, 2018 and 2021. Can you confirm we only have 2019 for those 3 years? If so, can you add that to the comment?
And, I believe we should have 2022 living wage data for the 2022 metric. I'll look for that later in your code.
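To make the adjustment being asked about concrete, here is a sketch of the arithmetic on toy data. It assumes, as the question implies but the thread has not confirmed, that `wage` is expressed in 2019 dollars and is rescaled by the ratio of the target year's CPI-U to the 2019 CPI-U. The CPI values are the ones in the diff above; the wage of 20 and the local `cpi_2019` are hypothetical.

```stata
clear
input year wage
2014 20
2018 20
2019 20
2021 20
end

* CPI-U annual averages as they appear in the diff; 2019 is the base year
local cpi_2019 = 255.657

* Start from the unadjusted wage, then rescale each non-base year
gen double wage_adj = wage
replace wage_adj = wage * 236.736 / `cpi_2019' if year == 2014   // deflate to 2014 dollars
replace wage_adj = wage * 251.107 / `cpi_2019' if year == 2018   // deflate to 2018 dollars
replace wage_adj = wage * 270.970 / `cpi_2019' if year == 2021   // inflate to 2021 dollars
list
```

Years before 2019 come out below the 2019 wage and 2021 comes out above it, which is a quick sanity check on the direction of each ratio.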
/* only keep 1 adult, 2 children row */
keep if adults == "1 Adult" & children == "2 Children"

/* inflate/deflate MIT data, depending on year */
We do not need to inflate/deflate for 2022. Can you clean up this comment for 2022?
keep if areatype == "County"

** recode industries into MM categories
generate industry_type = .
In looking at the 2018_data.csv, it looks like there's a NAICS of 102, which is "private, Service-providing" here, and I want to make sure we aren't missing any codes. Although, we can't really have all these years anymore anyway. Just make sure it works for 2014 and 2018.
/* delete unneeded files -- do this as a last step */

foreach year in 2014 2018 2021 2022 2023 {
I assume you haven't run this yet since this isn't finalized. We definitely need to clean up some of these extra files. If it helps, you can save some in the team Box folder, given that some of the data online might go away. But, please do remove this from github at the end.
bysort year: count

// export final dataset
export delimited using "$wages/living_wage_county_industry_longitudinal.csv", replace
I can't tell where this file is. I think it is supposed to be the file for subgroups, and I don't see it in the folder. It would be helpful to save final files in a more obvious location, like here: https://github.com/UI-Research/mobility-from-poverty/tree/iss404/09_employment/data/final. I know this is just extra complicated because the backfilling we expected to do isn't happening. I also don't see the "wage_ratio_final_2023_subgroup.dta" file. Maybe this part hasn't been run yet? I do see it through 2022.
Oh, @rpitingolo and @kmartinchek I remembered that I had not yet addressed this question: "Question of whether we will make publicly available the MIT files for 2019, 2022, and 2023 we use to create the measure and the best place to store and read in these files to best align with the data use agreement." We need to strip the MIT raw files for 2019, 2022, and 2023 from our GitHub repository. The repository is technically public, so keeping the files there violates our agreement. We should store these in the "Metrics_2025_round/living_wage" folder on Box. We also need to remove any of the past data from "scrape" efforts. I can, however, coordinate with @jwalsh28 to remove those. I know in R we use .gitignore, and I noticed you use "erase" in Stata. I already have the MIT zip file they delivered to us in that Box folder. Let me know if you have other questions!
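A sketch of the cleanup step in Stata, assuming the intermediate file names shown in the excerpts above (`temp-qecw-` and `temp-qecw-industry-` prefixes) and the `$wages` global. Note that `erase` only deletes files from the local disk: a file that has already been committed stays in the repository and its history until it is removed through git, and an ignore rule only prevents files from being added in the future.

```stata
* Assumption: $wages points at the folder holding the intermediate files
foreach year in 2014 2018 2021 2022 2023 {
    * capture keeps the loop running when a file is already gone
    capture erase "$wages/temp-qecw-`year'.dta"
    capture erase "$wages/temp-qecw-industry-`year'.dta"
}
```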
Living Wage Ratio
Please include the following points in your PR:
A link to the issue that this PR relates to. Iss064 #67
A description of the content in this pull request.