I think this should have been handled in one of the merges.
There is an issue with the JSON output from GPT-4o: storing the parsed events fails with an error about duplicate column names, including several columns named "0". Still working on it (2024-06-25).
PS C:\Users\nili\OneDrive - Vrije Universiteit Brussel\PHD_VUB_TUD_Ni\PhD_project\Wikimpacts_GitHub\Wikimpacts> poetry run python Database/parse_events.py -r 'Database/raw/ESSD_2024/output_dictionaries_GPT4o' -f 'dict_2.json' -o 'Database/output/ESSD_2024/dev_wiki_gpt4o_0624_output.parquet' -t all
parse_events: 2024-06-25 16:23:23 INFO Passed args: Namespace(spaCy_model_name='en_core_web_trf', filename='dict_2.json', raw_path='Database/raw/ESSD_2024/output_dictionaries_GPT4o', output_path='Database/output/ESSD_2024/dev_wiki_gpt4o_0624_output.parquet', locale_config='en_US.UTF-8', event_type='all', country_column='Country', location_column='Location')
normalize-utils: 2024-06-25 16:23:26 INFO SpaCy model 'en_core_web_trf' has been loaded
normalize_locations: 2024-06-25 16:23:29 INFO Installed GeoPy cache
parse_events: 2024-06-25 16:23:29 INFO JSON datafile loaded
parse_events: 2024-06-25 16:23:29 INFO Total summary columns: ['Total_Summary_Homelessness', 'Total_Homelessness_Per_Country', 'Total_Summary_Injury', 'Total_Injury_Per_Country', 'Total_Summary_Death', 'Total_Death_Per_Country', 'Total_Summary_Insured_Damage', 'Total_Insured_Damage_Per_Country', 'Total_Summary_Displacement', 'Total_Displacement_Per_Country', 'Total_Summary_Damage', 'Total_Economic_Damage_Per_Country', 'Total_Summary_Building_Damage', 'Total_Building_Damage_Per_Country', 'Total_Summary_Affected', 'Total_Affected_Per_Country']
parse_events: 2024-06-25 16:23:29 INFO Normalizing dates
parse_events: 2024-06-25 16:23:29 INFO Normalizing booleans
parse_events: 2024-06-25 16:23:29 INFO Normalizing nulls
parse_events: 2024-06-25 16:23:29 INFO Normalizing numbers to ranges and determining whether or
not they are an approximate (min, max, approx). Columns: ['Total_Affected', 'Total_Building_Damage', 'Total_Economic_Damage', 'Total_Displacement', 'Total_Insured_Damage', 'Total_Deaths', 'Total_Injury', 'Total_Homelessness']
parse_events: 2024-06-25 16:23:29 INFO Normalizing Perils to list
parse_events: 2024-06-25 16:23:29 INFO Normalizing nulls
parse_events: 2024-06-25 16:23:29 INFO Converting annotation columns to strings to store in sqlite3
parse_events: 2024-06-25 16:23:29 INFO Converting list columns to strings to store in sqlite3
parse_events: 2024-06-25 16:23:29 INFO Storing parsed results
Traceback (most recent call last):
File "C:\Users\nili\OneDrive - Vrije Universiteit Brussel\PHD_VUB_TUD_Ni\PhD_project\Wikimpacts_GitHub\Wikimpacts\Database\parse_events.py", line 308, in <module>
events.to_parquet(f"{events_filename}.parquet", engine="pyarrow")
File "C:\Users\nili\AppData\Local\pypoetry\Cache\virtualenvs\wikimpacts-lTUzaUwy-py3.11\Lib\site-packages\pandas\core\frame.py", line 2889, in to_parquet
return to_parquet(
^^^^^^^^^^^
File "C:\Users\nili\AppData\Local\pypoetry\Cache\virtualenvs\wikimpacts-lTUzaUwy-py3.11\Lib\site-packages\pandas\io\parquet.py", line 411, in to_parquet
impl.write(
File "C:\Users\nili\AppData\Local\pypoetry\Cache\virtualenvs\wikimpacts-lTUzaUwy-py3.11\Lib\site-packages\pandas\io\parquet.py", line 159, in write
table = self.api.Table.from_pandas(df, **from_pandas_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow\table.pxi", line 3874, in pyarrow.lib.Table.from_pandas
File "C:\Users\nili\AppData\Local\pypoetry\Cache\virtualenvs\wikimpacts-lTUzaUwy-py3.11\Lib\site-packages\pyarrow\pandas_compat.py", line 570, in dataframe_to_arrays
convert_fields) = _get_columns_to_convert(df, schema, preserve_index,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\nili\AppData\Local\pypoetry\Cache\virtualenvs\wikimpacts-lTUzaUwy-py3.11\Lib\site-packages\pyarrow\pandas_compat.py", line 352, in _get_columns_to_convert
raise ValueError(
ValueError: Duplicate column names found: [0, 'Total_Affected', 'Total_Affected_Annotation', 0, 'Total_Building_Damage', 'Total_Building_Damage_Annotation', 0, 'Total_Economic_Damage', 'Total_Economic_Damage_Units', 'Total_Economic_Damage_Inflation_Adjusted', 'Total_Economic_Damage_Inflation_Adjusted_Year', 'Total_Economic_Damage_Assessment_with_annotation', 0, 'Total_Displacement', 'Total_Displacement_Annotation', 0, 'Total_Insured_Damage', 'Total_Insured_Damage_Units', 'Total_Insured_Damage_Inflation_Adjusted', 'Total_Insured_Damage_Inflation_Adjusted_Year', 'Total_Insured_Damage_Assessment_with_annotation', 0, 'Total_Deaths', 'Total_Death_Annotation', 0, 'Total_Injury', 'Total_Injury_Annotation', 0, 'Total_Homelessness', 'Total_Homelessness_Annotation', 'Event_ID', 'Source', 'Event_Name', 'Specific_Instance_Per_Country_Homelessness', 'Specific_Instance_Per_Country_Injury', 'Main_Event', 'Main_Event_Assessment_With_Annotation', 'Specific_Instance_Per_Country_Death', 'Start_Date', 'End_Date', 'Time_with_Annotation', 'Specific_Instance_Per_Country_Insured_Damage', 'Specific_Instance_Per_Country_Displacement', 'Location', 'Location_with_Annotation', 'Specific_Instance_Per_Country_Economic_Damage', 'Perils', 'Perils_Assessment_With_Annotation', 'Specific_Instance_Per_Country_Building_Damage', 'Specific_Instance_Per_Country_Affected', 'Start_Date_Day', 'Start_Date_Month', 'Start_Date_Year', 'End_Date_Day', 'End_Date_Month', 'End_Date_Year', 'Total_Affected_Min', 'Total_Affected_Max', 'Total_Affected_Approx', 'Total_Building_Damage_Min', 'Total_Building_Damage_Max', 'Total_Building_Damage_Approx', 'Total_Economic_Damage_Min', 'Total_Economic_Damage_Max', 'Total_Economic_Damage_Approx', 'Total_Displacement_Min', 'Total_Displacement_Max', 'Total_Displacement_Approx', 'Total_Insured_Damage_Min', 'Total_Insured_Damage_Max', 'Total_Insured_Damage_Approx', 'Total_Deaths_Min', 'Total_Deaths_Max', 'Total_Deaths_Approx', 'Total_Injury_Min', 'Total_Injury_Max', 
'Total_Injury_Approx', 'Total_Homelessness_Min', 'Total_Homelessness_Max', 'Total_Homelessness_Approx']
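To narrow this down, it may help to list which labels are repeated and which are not strings before the Parquet write. The following is only a minimal diagnostic sketch, not part of `parse_events.py`: the helper name `find_problem_columns` is hypothetical, and the sample list is a shortened copy of the labels in the `ValueError` above.

```python
from collections import Counter


def find_problem_columns(labels):
    """Return (duplicates, non_string) for an iterable of column labels.

    duplicates: dict mapping each repeated label to how often it occurs.
    non_string: sorted list of distinct labels that are not str (e.g. the int 0).
    """
    labels = list(labels)
    counts = Counter(labels)
    duplicates = {label: n for label, n in counts.items() if n > 1}
    non_string = sorted(
        {label for label in labels if not isinstance(label, str)}, key=repr
    )
    return duplicates, non_string


# Shortened copy of the label list reported in the ValueError (assumption:
# the remaining labels are unique strings, as they appear to be in the log).
columns = [
    0, "Total_Affected", "Total_Affected_Annotation",
    0, "Total_Building_Damage", "Total_Building_Damage_Annotation",
    "Event_ID", "Source",
]

duplicates, non_string = find_problem_columns(columns)
print("duplicated labels:", duplicates)
print("non-string labels:", non_string)
```

In the script itself, the same check could presumably be run as `find_problem_columns(events.columns)` right before the `events.to_parquet(...)` call shown in the traceback. In the log above the only repeated label is the integer `0`, which appears once in front of each `Total_*` group.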