Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eur db release #198

Merged
merged 7 commits into from
Jan 23, 2025
Merged

Eur db release #198

merged 7 commits into from
Jan 23, 2025

Conversation

liniiiiii
Copy link
Collaborator

@i-be-snek , could you help me with this pr,
I run this and get the EUR output,

poetry run python3 Database/EUR_convert.py -i Database/output/full_run_25_deduplicated_data_gap_inflation_adjustment_drop_missing_event_ids -o Database/output/full_run_25_deduplicated_data_gap_inflation_adjustment_drop_missing_event_ids_EUR 

But when I run the inserting, I got error below, the L2 and L3 inserting process is failed. But I check the structure for this insert_full_run, the output is in the same structure, so I'm a bit confused about this error, thanks!
command:

source Database/insert_full_run.sh Database/output/full_run_25_deduplicated_data_gap_inflation_adjustment_drop_missing_event_ids_EUR impact.v1.2_dg_filled_EUR.db tmp/files 

error:

04 15:06:35 INFO     Processing level l3: Specific_Instance_Per_Administrative_Area_Homeless...

database-insertion: 2024-12-04 15:06:35 INFO     Inserting l3...

Traceback (most recent call last):
  File "/home/nl/Wikimpacts/Database/insert_events.py", line 220, in <module>
    assert len(check_table) == 1, f"Table name {args.target_table} incorrect! Found {check_table} instead."
           ^^^^^^^^^^^^^^^^^^^^^
AssertionError: Table name Specific_Instance_Per_Administrative_Area_Homeless incorrect! Found [] instead.

@liniiiiii liniiiiii changed the title Eur db release WIP Eur db release Dec 4, 2024
@i-be-snek
Copy link
Collaborator

@liniiiiii how did you create the empty database? the one mentioned in your command:

impact.v1.2_dg_filled_EUR.db

@liniiiiii liniiiiii changed the title WIP Eur db release Eur db release Dec 5, 2024
@liniiiiii liniiiiii requested a review from i-be-snek December 5, 2024 17:19
@liniiiiii
Copy link
Collaborator Author

liniiiiii commented Dec 5, 2024

Hi @i-be-snek , thanks for the key comment, now the readme and the db file are updated, pls have a look, thanks!

@i-be-snek
Copy link
Collaborator

I'm picking this up now :D big thanks.

Copy link
Collaborator

@i-be-snek i-be-snek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several things that need changing in this PR.

(1) the convert_to_EUR_yearly_avg function contains a larger number of surplus variables. The year and month and so on is used to determine the rate... but that function takes these variables in then never uses them except to log an error. That function needs to be re-written. I think the only thing you want to check, really, is that the year is 2024. If convert_to_EUR_yearly_avg really is only for the rate of 2024, then you need to make that explicit in its name... otherwise, it sounds like a function that gives you a choice or functionality of converting to EUR buy the yearly average rate but that is not true in this case.

(2) Since the aforementioned function does not assert that the year is 2024 (and not NaN, because many of those rows have no year since they were not convertable to USD), so it needs to be fixed and re-run.

(3) the naming of several functions is not great:

  • The function USD_to_EUR borrows heavily from normalize_row_USD yet its naming style diverges from the norm. This is bad for future developers because it's confusing naming.
  • The database name itself is confusing. It needs to follow our semantic versioning style (https://semver.org/). The database name hence needs to change from impactdb.v1.2.dg_filled_EUR.db to impactdb.v1.3.dg_filled.db. It's the same database with a "minor" change. In the README, both releases cannot be the LATEST. That's a tag reserved for the latest release only. You can say both releases are usable depending on which currency you prefer, but it cannot be called LATEST.

(4) There are no logs for the failed insertions. You need to find the insertion logs (usually they land in a directory called tmp, maybe you need to create it if it doesn't exist) and these need to end up in the correct directory in releases/

@liniiiiii
Copy link
Collaborator Author

@i-be-snek , I updated the pr following your suggestions, could you help to review it as soon as you are available, thanks!

Copy link
Collaborator

@i-be-snek i-be-snek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liniiiiii I have not checked everything but could readily see some issues. Could you please take a look at the comments on the code that I just posted and provide some clarifications/modifications?

Database/EUR_convert.py Outdated Show resolved Hide resolved
Database/scr/normalize_currency.py Outdated Show resolved Hide resolved
Database/scr/normalize_currency.py Outdated Show resolved Hide resolved
Database/scr/normalize_currency.py Outdated Show resolved Hide resolved
@@ -18,8 +18,8 @@

| [releases/impactdb.v1.0.dg_filled.db](releases/impactdb.v1.0.dg_filled.db) | A **post-processed** database after applying a final layer of post-processing, excluding the handling of currencies and inflation adjustment (has missing validation rules) |
| [releases/impactdb.v1.1.dg_filled.db](releases/impactdb.v1.1.dg_filled.db) | A **post-processed** database after applying a final layer of post-processing, including the handling of currencies and inflation adjustment. In this release, end years of events are nullable. |
| [releases/impactdb.v1.2.dg_filled.db](releases/impactdb.v1.1.dg_filled.db) **LATEST** | A **post-processed** database that improves on [releases/impactdb.v1.1.dg_filled.db](releases/impactdb.v1.1.dg_filled.db) by removing events that have no L1/L2/L3 impacts (ie. all impact data in L1 is NULL) |
| [releases/impactdb.v1.2.dg_filled_EUR.db](releases/impactdb.v1.1.dg_filled.db) **LATEST** | A **post-processed** database that convert 2024 USD to 2024 EUR on [releases/impactdb.v1.2.dg_filled.db](releases/impactdb.v1.1.dg_filled.db) by using a constant conversion rate in 2024 |
| [releases/impactdb.v1.2.dg_filled.db](releases/impactdb.v1.1.dg_filled.db) **Lastest 2024 USD version** | A **post-processed** database that improves on [releases/impactdb.v1.1.dg_filled.db](releases/impactdb.v1.1.dg_filled.db) by removing events that have no L1/L2/L3 impacts (ie. all impact data in L1 is NULL), most currencies are converted to USD in the database and inflated to 2024 value. |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean "LATEST 2024 USD version"? "lastest" is not a valid word.

Also, this still takes you to releases/impactdb.v1.1.dg_filled.db. This isn' the right release, is it?

releases/README.md Outdated Show resolved Hide resolved
@liniiiiii
Copy link
Collaborator Author

@i-be-snek , I update the changes, if it's ok, we can make a release regarding to our V1, thanks!

@i-be-snek
Copy link
Collaborator

@liniiiiii Sorry but there are some outstanding issues still in the PR. Hope you could bear with me a bit here ☹️

  1. in releases/README.md, there are still some database releases that link to the wrong place. Please double check those!
  2. the function called Convert_USD_to_EUR is the one that should be renamed to normalize_row_USD_to_EUR because it's the one that normalizes a single row in pandas.
  3. The function currently named normalize_row_EUR_inflation_year as (as of 54d9c63) IS NOT the function that performs normalization on the row and hence it should be renamed. It should start with convert_to followed by signifiers that explain what the function does. This should be done in a style consistent with the very similar function convert_to_USD_yearly_avg.
  4. This comment # only provide EUR in inflation year in the database found in Database/scr/normalize_currency.py should be removed because it causes confusion. These are stand-alone functions that can be used to convert any thing into another thing and therefore it's not advisable to have comments related to the pipeline flow near them.

I think this is much better and removing the word "latest" is much clearer because the files we have now are for 2024 only!

Since none of the suggested changes modify the code in any way, I don't think it's necessary to re-run the conversion.
But please have these fixed and I'll approve it straight away 😄

@liniiiiii
Copy link
Collaborator Author

@liniiiiii Sorry but there are some outstanding issues still in the PR. Hope you could bear with me a bit here ☹️

  1. in releases/README.md, there are still some database releases that link to the wrong place. Please double check those!
  2. the function called Convert_USD_to_EUR is the one that should be renamed to normalize_row_USD_to_EUR because it's the one that normalizes a single row in pandas.
  3. The function currently named normalize_row_EUR_inflation_year as (as of 54d9c63) IS NOT the function that performs normalization on the row and hence it should be renamed. It should start with convert_to followed by signifiers that explain what the function does. This should be done in a style consistent with the very similar function convert_to_USD_yearly_avg.
  4. This comment # only provide EUR in inflation year in the database found in Database/scr/normalize_currency.py should be removed because it causes confusion. These are stand-alone functions that can be used to convert any thing into another thing and therefore it's not advisable to have comments related to the pipeline flow near them.

I think this is much better and removing the word "latest" is much clearer because the files we have now are for 2024 only!

Since none of the suggested changes modify the code in any way, I don't think it's necessary to re-run the conversion. But please have these fixed and I'll approve it straight away 😄

Thanks for the suggestions, I think they are clear and useful for the users, I updated them!

@liniiiiii liniiiiii merged commit 71052ad into main Jan 23, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants