Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

173 null dam error #183

Merged
merged 9 commits into from
Oct 31, 2024
Merged

173 null dam error #183

merged 9 commits into from
Oct 31, 2024

Conversation

i-be-snek
Copy link
Collaborator

@i-be-snek i-be-snek commented Oct 29, 2024

This PR:

  • fixes the bug where some "NULL"s are interpreted as areas and found on OSM as "T C Null Dam" (https://www.openstreetmap.org/way/1095041600). The reason for the bug was that "NULL" values inside lists were not properly replaced by None or removed altogether.
  • small fixes for some columns that should return [] in case it's empty instead of None
  • infer encoding when storing parquet, which allows l2/l3 locations and areas to be stored as an array instead of bytes, requiring no literal evaluation to work

To test this PR, you can try:

  • parsing events of an existing file that used to contain "T C Null Dam entries". Example:

    poetry run python3 Database/parse_events.py \
    --raw_dir Database/raw/ESSD_2024_V3/dev \
    --filename wiki_dev_whole_infobox_20240729_70single_events_all_categories_V3_gpt-4o-2024-05-13_rawoutput_data_time_nums.json \
    --output_dir Database/output/173_null/dev \
    --event_levels l1,l2,l3
  • evaluating a file parsed after the changes in this PR

  • database insertion after parsing with the changes in this PR

@i-be-snek i-be-snek linked an issue Oct 29, 2024 that may be closed by this pull request
6 tasks
@i-be-snek i-be-snek changed the title WIP: 173 null dam error 173 null dam error Oct 29, 2024
@i-be-snek i-be-snek requested a review from liniiiiii October 29, 2024 19:19
@liniiiiii
Copy link
Collaborator

@i-be-snek , big thanks for the update, I found an issue with this is that after the changes, the Adminstrative_Area_Norm becomes None in the records of events where I found the T C Null Dam. The event 8Lbci8N and iAAUPM5 contain T C Null Dam in the original Specific_Instance_Per_Administrative_Area_Affected/0000.parquet file, and then I ran the command you provided in this branch, then I open the same file, and the T C Null Dam records are gone, but for these Adminstrative_Area_Norm/Geojson/GID all became None, not only for these two events, but for all events in this file, similarly for other L3 files, could you have a look? For L2 and L1 it's fine, thanks!
image
image

@i-be-snek
Copy link
Collaborator Author

Thanks @liniiiiii. I'm still a bit sick today bit this seems easy to fix. I'll also be pushing another PR soon to filter out things like "country1"/"location2", etc where the model is hallucinating the instructions.

@liniiiiii
Copy link
Collaborator

Thanks @liniiiiii. I'm still a bit sick today bit this seems easy to fix. I'll also be pushing another PR soon to filter out things like "country1"/"location2", etc where the model is hallucinating the instructions.

@i-be-snek , get better soon!!! I will help with code review! Thanks!

@i-be-snek
Copy link
Collaborator Author

@liniiiiii I re-ran the parsing and insertion from scratch on the file wiki_dev_whole_infobox_20240729_70single_events_all_categories_V3_gpt-4o-2024-05-13_rawoutput_data_time_nums.json and don't find any dates missing in l3.

Could you share the file you were inspecting on your end, or re-run the pipeline? I did add one commit that improves cleaning out "None" values from lists of locations/areas.

@i-be-snek
Copy link
Collaborator Author

i-be-snek commented Oct 30, 2024

I was able to replicate this, but only for this field.

image

I'll work on resolving it tomorrow, so no need to review this now.

@i-be-snek
Copy link
Collaborator Author

@liniiiiii it should be okay now :) please try

Copy link
Collaborator

@liniiiiii liniiiiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved with both issues in the comments, TC NULL Dam and the None issue of Adminstritive area, approved to merge to main, thanks!
image

@liniiiiii liniiiiii merged commit ac1ba21 into main Oct 31, 2024
1 check passed
@i-be-snek i-be-snek deleted the 173-null-dam-error branch January 17, 2025 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

process full run database (to do branch)
2 participants