States with processed data #13

Open
ayyubibrahimi opened this issue Aug 2, 2024 · 8 comments

@ayyubibrahimi
Owner

ayyubibrahimi commented Aug 2, 2024

States with processed/cleaned data:

All processed files should point to this processed directory

@stecklow

stecklow commented Aug 7, 2024

We have new data from Illinois (as of last month) that looks pretty clean to me, and crucially, includes the reason for separation. Can we utilize this new IL file instead of the 2023 one? https://www.dropbox.com/scl/fi/y4rcehqxii8mmfcnjr6k9/EmploymentHistory.csv?rlkey=ieuv3x89ag5fh2pkin16fo2g2&dl=0

@ayyubibrahimi
Owner Author

IL has been cleaned and added above. Can you add any new data to the unprocessed issue @stecklow? They'll likely always need to be processed, even if minimally.

@stecklow

stecklow commented Sep 7, 2024

I went through the "national-post-db" Dropbox and have some questions and notes about some of the states.

Alaska - hold for launch?

Data is probably too incomplete for launch. It's unclear how many officers actually have history, there are too many open questions about immediate usefulness, and the state is now being difficult with my update request, so I'm not sure we can promise anything right away.

Florida - question

“Separation reason” column is more of an “employment status/change” column - remove “actively employed,” etc., or change name of column?

Georgia - question

“Separation reason” column is more of an “employment status/change” column - remove “actively employed,” etc., or change name of column?

Idaho - hold for launch?

Seems like columns reflect employment changes, and certifications, within an agency - hold for launch?

Illinois - no notes, but seems like a good model for the "reason for separation" column

Indiana - question

“Separation reason” column is more of an “employment status/change” column - remove “active,” or change name of column?

Iowa - question

I didn’t see Iowa in the launch data folder - which is fine, but it’s otherwise on my list, so just not sure what to list it as

Maryland - hold for launch?

The rows appear to show employment status changes, like promotions, often within agencies, rather than between - hold for launch?

North Carolina - hold for launch?

I don’t see any employment history in the index file

New Mexico - hold for launch?

Rows in data appear to show employment status or certification changes within individual departments - honestly really not sure what the index file shows

Oregon - hold for launch?

The rows appear to show employment status changes, like promotions, often within agencies, rather than between - hold for launch?

South Carolina - question

Some of the rows appear to show more employment status changes within departments, though other repeated departments appear to be different stints, which would track with our rule about contiguous service being collapsed

Tennessee - question

No notes, except to ask if things can not be all-caps (this is obviously least urgent)

Utah - question

No notes, except to ask if things can not be all-caps (this is obviously least urgent)

Vermont - hold for launch?

The rows appear to show employment status changes, like promotions, often within agencies, rather than between - hold for launch?

Washington - question

“Separation reason” column is more of an “employment status/change” column - remove “certified,” or change name of column? Also, could things not be all-caps?

West Virginia - question

No notes, except to ask if things can not be all-caps (this is obviously least urgent)

Wyoming - question

Not sure “separation reason” is actually showing that - should we change the name of the column?
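
A rough sketch of one way to handle the "status vs. separation reason" question raised for several states above: move status-like values into their own column instead of dropping them or renaming the whole column. The column names (`separation_reason`, `employment_status`, `person_id`) and the set of status labels are assumptions for illustration, not the project's actual schema.

```python
# Hypothetical status labels, taken from the notes above; the real values
# in each state's file may differ and would need to be reviewed per state.
STATUS_VALUES = {"actively employed", "active", "certified"}


def split_status(row: dict) -> dict:
    """Return a copy of the row with status-like values moved out of
    'separation_reason' and into a separate 'employment_status' field."""
    out = dict(row)
    value = (row.get("separation_reason") or "").strip()
    if value.lower() in STATUS_VALUES:
        out["employment_status"] = value
        out["separation_reason"] = ""
    else:
        # Keep the reason as-is and make sure the status field exists.
        out.setdefault("employment_status", "")
    return out


rows = [
    {"person_id": "1", "separation_reason": "Actively Employed"},
    {"person_id": "2", "separation_reason": "Resigned"},
]
cleaned = [split_status(r) for r in rows]
```

With this approach nothing is discarded: "Resigned" stays in `separation_reason`, while "Actively Employed" ends up in `employment_status`.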

@ayyubibrahimi
Owner Author

ayyubibrahimi commented Sep 7, 2024

  • This all makes sense to me. I named them all "separation reason" because of the list in this issue: "Reason for Separation" Breakdown #14. Can you suggest a different name for each of the columns?

  • All of the states that we're going to launch with still need to be normalized for casing, column names, date formatting, etc. This list only includes states that I had to post-process from the BLN index files. Normalizing things according to our schema conventions is still on the list of things to do before launch.

  • For states like Oregon where the rows aren't collapsed, Tarak is still working on his open issue.

  • The Iowa data referenced in States with unprocessed data #12 is raw data from Ben that hasn't been cleaned, right?

  • AK and IN were not going to be on the launch list because they only have start_date values. I can remove the others you listed as well.
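
As a minimal sketch of the kind of normalization described above (casing, column names, date formatting): the header mapping, target column names, and accepted date formats below are all assumptions, since the schema conventions are not spelled out in this thread.

```python
from datetime import datetime

# Hypothetical mapping from raw headers to schema-style names.
COLUMN_MAP = {
    "AGENCY NAME": "agency_name",
    "START DATE": "start_date",
    "END DATE": "end_date",
}
# Assumed input formats; real files may need more entries.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def normalize_date(value: str) -> str:
    """Convert a date string to ISO 8601 (YYYY-MM-DD) when it can be parsed."""
    value = value.strip()
    if not value:
        return ""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return value  # leave unparseable values untouched for manual review


def normalize_row(raw: dict) -> dict:
    """Rename columns, reformat dates, and title-case all-caps text values."""
    row = {}
    for key, value in raw.items():
        cleaned_key = key.strip()
        name = COLUMN_MAP.get(
            cleaned_key.upper(), cleaned_key.lower().replace(" ", "_")
        )
        value = (value or "").strip()
        if name.endswith("_date"):
            row[name] = normalize_date(value)
        elif value.isupper():
            # str.title() is crude: it will mangle acronyms and names
            # like "McDONALD", so real data may need an exceptions list.
            row[name] = value.title()
        else:
            row[name] = value
    return row


example = normalize_row(
    {
        "AGENCY NAME": "SPRINGFIELD POLICE DEPARTMENT",
        "START DATE": "01/15/2010",
        "END DATE": "",
    }
)
```

Rows that only carry a `start_date` (as noted for AK and IN) pass through with an empty `end_date`, which makes them easy to filter out afterwards.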

@ayyubibrahimi
Owner Author

@stecklow, to keep this issue focused on what data has been processed so far, I created a new issue, #19, where I've restated your list above about what should be included in the launch.

@stecklow

stecklow commented Sep 7, 2024

Thanks Ayyub. I don't know that I see @tarakc02's open issue about row collapsing, so I'm actually not sure if the issues with Idaho, Maryland, New Mexico, and Vermont are in the same boat as Oregon, and maybe should still be here. I made a longer response in #19 with some broader thoughts.
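
For reference, the row collapsing being discussed could look roughly like the sketch below. This is not the code from #6; it only illustrates the general idea of merging contiguous rows for the same person at the same agency into one stint. The field names and the one-day `max_gap_days` threshold are assumptions.

```python
from datetime import date, timedelta


def collapse_stints(rows, max_gap_days=1):
    """Merge rows for the same person and agency whose date ranges touch
    or overlap. Dates are datetime.date objects; end_date may be None for
    an open-ended (current) stint."""
    ordered = sorted(
        rows, key=lambda r: (r["person_id"], r["agency"], r["start_date"])
    )
    stints = []
    for row in ordered:
        prev = stints[-1] if stints else None
        same_job = (
            prev is not None
            and prev["person_id"] == row["person_id"]
            and prev["agency"] == row["agency"]
        )
        contiguous = same_job and (
            prev["end_date"] is None
            or row["start_date"] - prev["end_date"]
            <= timedelta(days=max_gap_days)
        )
        if contiguous:
            # Extend the previous stint; an open end stays open.
            if prev["end_date"] is not None and (
                row["end_date"] is None or row["end_date"] > prev["end_date"]
            ):
                prev["end_date"] = row["end_date"]
        else:
            stints.append(dict(row))
    return stints


history = [
    {"person_id": "1", "agency": "A", "start_date": date(2010, 1, 1), "end_date": date(2012, 6, 30)},
    {"person_id": "1", "agency": "A", "start_date": date(2012, 7, 1), "end_date": date(2015, 12, 31)},
    {"person_id": "1", "agency": "A", "start_date": date(2018, 3, 1), "end_date": None},
]
stints = collapse_stints(history)
```

Here the first two rows (for example, a promotion within agency A) become a single stint from 2010 to 2015, while the 2018 row remains a separate stint because of the gap.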

@tarakc02
Collaborator

tarakc02 commented Sep 9, 2024

hi all, i just re-opened #6 based on this discussion, but that code is and has been ready to go!

@ayyubibrahimi
Owner Author

Just added notes to #6. Thank you!
