Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate currency in schema #186

Merged
merged 7 commits into from
Nov 6, 2024
Merged

Validate currency in schema #186

merged 7 commits into from
Nov 6, 2024

Conversation

i-be-snek
Copy link
Collaborator

@i-be-snek i-be-snek commented Nov 4, 2024

This PR:

  • Adds validation to the database schema to only allow current ISO-4217 (https://en.wikipedia.org/wiki/ISO_4217) currencies into the database. Rows with a bad currency will be dropped from the database
  • Fixes syntax error when creating the database and schema-
  • For L1, to avoid losing the entire row, values with invalid currencies will have all damage/insured damage impact values set to None

To test this PR, try:

  • creating a new database poetry run python3 Database/schema/create_db.py -db pr_test.db, ensuring there are no errors thrown
  • inserting parsed data into the database to ensure it filters out invalid currencies

@i-be-snek i-be-snek changed the title WIP: Validate currency in schema Validate currency in schema Nov 4, 2024
@i-be-snek i-be-snek requested a review from liniiiiii November 4, 2024 14:59
@liniiiiii
Copy link
Collaborator

@i-be-snek, it seems like I do not have the geojson file or path, so I can't proceed with L2 inserting, for L1, it's working, and I found a record
RL7O2ZF has a currency of Vatu, which is not correct, it was filtered 👍
image

For L2 I run this :

poetry run python3 Database/insert_events.py -m "replace" -lvl l2 -f Database/output/ESSD_2024_V3_2/test -nid Database/tmp/geojson_no_dupl -db pr_test.db -t Instance_Per_Administrative_Areas_Damage

and I got this error

INFO: Pandarallel will run on 5 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
Inserting l1 into pr_test.db: 100%|█████████████████████████████████████| 156/156 [00:08<00:00, 17.59it/s]
Inserting l2 into pr_test.db: 100%|███████████████████████████████████| 2031/2031 [01:21<00:00, 24.81it/s]
Inserting l3 into pr_test.db: 100%|███████████████████████████████████| 3388/3388 [02:41<00:00, 20.99it/s]
Files: 100%|███████████████████████████████████████████████████████████████| 3/3 [05:00<00:00, 100.15s/it]
Traceback (most recent call last):
  File "/home/nl/Wikimpacts/Database/insert_events.py", line 256, in <module>
    geojson_utils.store_non_english_nids()
    ^^^^^^^^^^^^^
NameError: name 'geojson_utils' is not defined. Did you mean: 'GeoJsonUtils'?

@i-be-snek
Copy link
Collaborator Author

@liniiiiii there was a problem with a variable, should be fixed now (the data gap PR has the best fix for this but we can do it like this now since it's not critical). Can you check now and merge it if it's good?

You can also try it with Database/insert_full_run.sh for a full run :D

Copy link
Collaborator

@liniiiiii liniiiiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Successfully insert the events from L1 to L3, and invalid currencies are excluded, approve to merge in main branch, thanks!

@liniiiiii liniiiiii merged commit 2e3bcb9 into main Nov 6, 2024
1 check passed
@i-be-snek i-be-snek deleted the validate-currency-in-schema branch January 17, 2025 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants