process full run database (to do branch) #173
Comments
The progress so far (17/10/2024)
|
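Here are my comments on the todo list.
(1) This could best be tackled at the stage where we insert data into the database, in insert_events.py. I think those events come from articles that are not climate articles, and they would mostly have NULL throughout. We need to find a precise way to drop them. This deserves its own branch. You can assign this to me.
(2) I can take this part since it's related to the data type. I need to trace where this problem starts, but I have a good idea about that. This deserves its own branch.
(3) This should also be in insert_events.py. This deserves its own branch, or it could be merged with item (1) on this list (both are basically validation).
(4) This already has its own branch and has already been assigned to @i-be-snek, though I want to reiterate that it will take me some time because I have never worked with inflation and, honestly, I'm a bit allergic to anything related to the economy at large. I do not have an estimate.
(5) Already in its own branch #174 and already assigned to @i-be-snek. @liniiiiii are you available to review these as they drop? |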
I will be available to check them!
On 2024-10-19 16:27:18, Shorouq ***@***.***> wrote:
Here are my comments on the todo list.
(1)
* [ ] some events have country1 or country2, they are not capturing real events
This could best be tackled at the stage where we insert data into the database, in insert_events.py. I think those events come from articles that are not climate articles, and they would mostly have NULL throughout. We need to find a precise way to drop them. This deserves its own branch. You can assign this to me.
(2)
* [ ] The GID column is sometimes None but it should be []
I can take this part since it's related to the data type. I need to trace where this problem starts but have some good idea about that. This deserves its own branch.
(3)
* [ ] Only allow specific values of event types, anything outside that list should be removed before being placed into the database
This should also be in insert_events.py. This deserves its own branch, or it could be merged with item (1) on this list (both are basically validation).
(4)
* [ ] currency conversion and inflation adjusted for L1-L3
This already has its own branch and has already been assigned to @i-be-snek, though I want to reiterate that it will take me some time because I have never worked with inflation and, honestly, I'm a bit allergic to anything related to the economy at large. I do not have an estimate.
(5)
* [ ] the data gap: make sure all fields satisfy L1>=L2>=L3
* Time: the *_Year values in L1 should cover those in L2 and L3. E.g., if the *_Year values in L1 are 2020 and 2021, L2 and L3 cannot have *_Year 2019 or 2023
* Location: the Admin_Areas in L1 should cover all Admin_Areas in L2, and the Admin_Areas in L2 should cover all Admin_Area in L3
* Impact values: the *_Min in L1 should be smaller than or equal to the sum of *_Min in L2, and the *_Max in L1 should be larger than or equal to the sum of *_Max in L2. The *_Min in L2 for one Admin_Area (ignoring records where several countries share one impact value) should be smaller than or equal to the sum of *_Min for the same Admin_Area in L3, and the *_Max in L2 for one Admin_Area (with the same exception) should be larger than or equal to the sum of *_Max for the same Admin_Area in L3.
Already in its own branch #174 and already assigned to @i-be-snek.
@liniiiiii are you available to review these as they drop?
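Items (2) and (3) above could plausibly be handled together at insertion time. The following is only a minimal sketch, assuming each event is a plain dict. `ALLOWED_MAIN_EVENTS` holds placeholder values, and `normalize_gid` and `validate_events` are hypothetical helpers, not functions taken from insert_events.py.

```python
from typing import Any

# Placeholder values (assumption): the real list of allowed event types
# is defined by the project, not here.
ALLOWED_MAIN_EVENTS = {"Flood", "Wildfire", "Drought"}


def normalize_gid(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record where a missing or None GID becomes []."""
    fixed = dict(record)
    if fixed.get("GID") is None:
        fixed["GID"] = []
    return fixed


def validate_events(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep records whose Main_Event is allowed, with GID normalized."""
    return [
        normalize_gid(r)
        for r in records
        if r.get("Main_Event") in ALLOWED_MAIN_EVENTS
    ]


rows = [
    {"Main_Event": "Flood", "GID": None},
    {"Main_Event": "Flood", "GID": ["USA.1_1"]},
    {"Main_Event": "Concert", "GID": None},  # not an allowed event type
    {"Main_Event": None, "GID": None},       # NULL throughout
]
cleaned = validate_events(rows)
```

Filtering before insertion keeps invalid categories out of the database entirely, and copying each record avoids mutating the raw parsed output.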
|
@i-be-snek, for the first task, is it possible to filter out the outlier shown in the screenshot? I suppose it comes from a "NULL" response from the model. I found a record where Administrative_Area_Norm is Bahamas and Location_Norm is t-c-null-dam. I manually checked this dam, and it is in the US: https://mapcarta.com/20531080. In the full-run database I saw 4175 t-c-null-dam records, which is not realistic. Thanks! |
I'll be working on this today. |
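One possible way to filter the records described above is sketched below. It assumes the raw model output is stored in a `Location` field (a hypothetical column name) and that a literal "NULL" string was geocoded to t-c-null-dam. The `NULL_LIKE` list is also an assumption, and this is not necessarily how the actual fix works.

```python
# Strings the model may emit instead of a real place name (assumed list).
NULL_LIKE = {"", "null", "none", "nan", "n/a"}
# Normalized names known to result from geocoding a null-like string.
BAD_NORMALIZED = {"t-c-null-dam"}


def is_null_location(record: dict) -> bool:
    """True if the record's location looks like a geocoded NULL response."""
    raw = record.get("Location")  # hypothetical raw field name
    if isinstance(raw, str) and raw.strip().lower() in NULL_LIKE:
        return True
    norm = record.get("Location_Norm")
    return isinstance(norm, str) and norm.strip().lower() in BAD_NORMALIZED


records = [
    {"Administrative_Area_Norm": "Bahamas", "Location": "NULL",
     "Location_Norm": "t-c-null-dam"},
    {"Administrative_Area_Norm": "Bahamas", "Location": "Nassau",
     "Location_Norm": "Nassau"},
    {"Administrative_Area_Norm": "Bahamas", "Location_Norm": "t-c-null-dam"},
]
kept = [r for r in records if not is_null_location(r)]
```

Checking the raw string before normalization would catch the problem at its source, while the check on `Location_Norm` cleans up records that were already geocoded.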
With the full-run experiment finished, we have the raw parsed output. To make the database consistent and provide a version that users can use directly, we will process the database as follows:
some events have country1 or country2; they are not capturing real events (resolved in 173 filter invalid area names #184)
The GID column is sometimes None but it should be [] (resolved in GID missing in L3 #179)
Only allow specific values of event types; anything outside that list should be removed before being placed into the database (resolved in 173 Validate categorical fields Main_Event and Hazards #181)
currency conversion and inflation adjustment for L1-L3 (resolved in Currency Conversion and Inflation Adjustment #180 and Currency Convesion and Inflation Adjustment (db release) #191)
captured "null" location (edit: added 20/oct/2024); may be a systematic error (resolved in 173 null dam error #183)
the data gap: make sure all fields satisfy L1>=L2>=L3 (resolved in Data Gap #174)
Time: the *_Year values in L1 should cover those in L2 and L3. E.g., if the *_Year values in L1 are 2020 and 2021, L2 and L3 cannot have *_Year 2019 or 2023
Location: the Admin_Areas in L1 should cover all Admin_Areas in L2, and the Admin_Areas in L2 should cover all Admin_Area in L3
Impact values: the *_Min in L1 should be smaller than or equal to the sum of *_Min in L2, and the *_Max in L1 should be larger than or equal to the sum of *_Max in L2. The *_Min in L2 for one Admin_Area (ignoring records where several countries share one impact value) should be smaller than or equal to the sum of *_Min for the same Admin_Area in L3, and the *_Max in L2 for one Admin_Area (with the same exception) should be larger than or equal to the sum of *_Max for the same Admin_Area in L3.
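The three data-gap rules above can be expressed as small checks. This is a rough sketch under the assumption that years and areas are available as plain lists and impact bounds as numbers. `covers` and `impact_bounds_ok` are hypothetical helper names, and missing (None) values are not handled.

```python
from typing import Hashable, Iterable


def covers(parent_values: Iterable[Hashable],
           child_values: Iterable[Hashable]) -> bool:
    """True if every child value also appears in the parent level.

    Usable for both the time rule (*_Year) and the location rule (Admin_Areas).
    """
    return set(child_values) <= set(parent_values)


def impact_bounds_ok(parent_min: float, parent_max: float,
                     child_mins: Iterable[float],
                     child_maxes: Iterable[float]) -> bool:
    """Check parent *_Min <= sum(child *_Min) and parent *_Max >= sum(child *_Max)."""
    return parent_min <= sum(child_mins) and parent_max >= sum(child_maxes)


# Time: L1 years 2020 and 2021 cover an L2 year of 2020, but not 2019 or 2023.
time_ok = covers([2020, 2021], [2020])
time_bad = covers([2020, 2021], [2019, 2023])

# Impact: an L1 range of 10 to 50 against two L2 records (4 to 20 and 8 to 25).
impact_ok = impact_bounds_ok(10, 50, [4, 8], [20, 25])   # 10 <= 12, 50 >= 45
impact_bad = impact_bounds_ok(10, 40, [4, 8], [20, 25])  # 40 < 45
```

The same two helpers apply to the L2 versus L3 comparison once the records are grouped by Admin_Area, with records where several countries share one impact value excluded first.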