Handle ranges per guidelines #19

Merged on Aug 7, 2024 (69 commits)
0b56222
✅ Add tests for some normalize_number internal functions
i-be-snek Jun 8, 2024
ccc410b
♻️ Improve error handling and parsing for approx
i-be-snek Jun 8, 2024
912f792
♻️ Refactor currency cleaning
i-be-snek Jun 9, 2024
34412ca
♻️ Refactor rules for zero, unknown, and approx
i-be-snek Jun 9, 2024
33bedd0
🎨 Fix formatting
i-be-snek Jun 9, 2024
7b36ca4
♻️ Refactor structure
i-be-snek Jun 9, 2024
8dd70be
✅ Add test
i-be-snek Jun 9, 2024
d1ff22c
➕ Add iso4217 dependency
i-be-snek Jun 9, 2024
8f9dbb3
🙈 Ignore pytest files
i-be-snek Jun 9, 2024
e7e1e23
👷 Add tests to workflow
i-be-snek Jun 9, 2024
e460c99
💚 Fix trigger only when pushing to main
i-be-snek Jun 9, 2024
15e6017
➕ Add en_core_web spaCy model as dep
i-be-snek Jun 9, 2024
5bd873a
💡 Remove print statement
i-be-snek Jun 9, 2024
1c31d77
👷 Remove pytest options
i-be-snek Jun 9, 2024
eff875b
♻️ Fix deprecated regex patterns
i-be-snek Jun 9, 2024
92cbc31
💚 Fix CI test job name
i-be-snek Jun 9, 2024
2528a0e
🚨 Fix lint error
i-be-snek Jun 9, 2024
e14bc78
✅ Add corner cases
i-be-snek Jun 9, 2024
0d10bfa
✅ Test func to extract nums from spaCy entities (deprecated)
i-be-snek Jun 9, 2024
1e10245
🚨 Fix more lint
i-be-snek Jun 9, 2024
8f765fe
🐛 Fix not accepting int/float as input
i-be-snek Jun 9, 2024
47b1bd6
🐛 Fix missing space when normalizing 4B -> 4 billion
i-be-snek Jun 9, 2024
ba3d023
🚨 Fix more lint
i-be-snek Jun 9, 2024
5e2de15
🐛 Fix loose whitespace
i-be-snek Jun 9, 2024
861e583
♻️ Fix return types
i-be-snek Jun 9, 2024
d34db3f
✅ Update range extraction func
i-be-snek Jun 9, 2024
4c162e9
✅ Add test for approx quantifier extraction
i-be-snek Jun 9, 2024
9583a27
💡 Add note
i-be-snek Jun 9, 2024
698277e
✅ Update test for extract range func
i-be-snek Jun 9, 2024
2d70dbf
♻️ Detect text2num locale from locale config
i-be-snek Jun 9, 2024
ec29749
✨ Normalize special cases by guidelines v2
i-be-snek Jun 9, 2024
5290ecf
✅ Add test case
i-be-snek Jun 9, 2024
c8e143c
✅ Add more test cases
i-be-snek Jun 9, 2024
4834d87
🔧 Move tests path to dir in root dir
i-be-snek Jul 31, 2024
1ea450e
💬 Update list of unwanted location types for location matching
i-be-snek Jul 31, 2024
328af01
♻️ Refactor synonym lists
i-be-snek Jul 31, 2024
bb5fdf2
🐛 Remove duplicate line
i-be-snek Jul 31, 2024
d0657a6
💬 Expand synonym lists
i-be-snek Jul 31, 2024
f6795fa
📌 Fix lock file
i-be-snek Jul 31, 2024
08985af
💚 Fix CI build (duplicate test workflow)
i-be-snek Jul 31, 2024
ae43313
✨ Handle special case 'many injuries/deaths'
i-be-snek Aug 1, 2024
9c69b6d
🚚 Rename method
i-be-snek Aug 2, 2024
da875e4
💬 Improve phrases to normalize text -> num
i-be-snek Aug 2, 2024
10e3137
✅ Add tests to handle 'family' modifiers
i-be-snek Aug 2, 2024
8c898ca
💬 Expand synonym lists
i-be-snek Aug 2, 2024
ccd0e35
🚨 Fix lint warnings - formatting
i-be-snek Aug 3, 2024
3ed354d
✨ Normalize complex ranges
i-be-snek Aug 3, 2024
5337b4e
✅ Add tests for converting phrases with single numbers into complex r…
i-be-snek Aug 3, 2024
3436988
✨ Add complex range normalization to number extraction method
i-be-snek Aug 3, 2024
95b95eb
✅ Update tests on number extraction (guidelines v2)
i-be-snek Aug 3, 2024
d97c8ea
🔥 Remove comments
i-be-snek Aug 4, 2024
f2b543e
♻️ Handle additional cases
i-be-snek Aug 4, 2024
b7f70ce
✅ Update test cases to handle signs and ranges with small numbers
i-be-snek Aug 4, 2024
f373402
🔥 Remove comments from source code
i-be-snek Aug 6, 2024
b279cbb
⚰️ Remove more comments from source code
i-be-snek Aug 6, 2024
8146cbe
💬 Fix literal
i-be-snek Aug 6, 2024
47b1814
🔊 Add logs
i-be-snek Aug 6, 2024
7776f5b
♻️ Handle a wider range of cases
i-be-snek Aug 6, 2024
1d36c6b
✅ Add tests to cover a wider variety of cases based on the dev set
i-be-snek Aug 6, 2024
2ba1409
♻️ Handle more simple ranges
i-be-snek Aug 6, 2024
416734b
✅ Add tests for more simple ranges
i-be-snek Aug 6, 2024
d30388c
🚚 Rename variable for consistency
i-be-snek Aug 6, 2024
86ff96e
🐛 Handle single numbers with locale sep (like 30,002,233)
i-be-snek Aug 6, 2024
2ecea82
⏪️ Revert author change
i-be-snek Aug 6, 2024
8d3dca1
📦️ Parse events with guidelines v2
i-be-snek Aug 6, 2024
be40f75
🐛 Fix author names
i-be-snek Aug 6, 2024
70d6397
✏️ Fix toml syntax error
i-be-snek Aug 6, 2024
e4a8b1f
✏️ Fix name for toml encoding
i-be-snek Aug 6, 2024
9ae3113
🚨 Fix toml syntax (again)
i-be-snek Aug 6, 2024
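
Two of the behaviours named in the commit messages above ("4B -> 4 billion" with the space preserved, and single numbers with a locale separator such as 30,002,233) can be illustrated with a minimal sketch. The function names, the suffix table, and the regular expressions below are assumptions made for illustration only; the PR's actual implementation is not shown on this page and may differ.

```python
import re

# Hypothetical suffix table; the PR's real mapping is not shown on this page.
SCALE_WORDS = {"K": "thousand", "M": "million", "B": "billion"}


def expand_scale_suffix(text: str) -> str:
    """Rewrite a trailing scale letter as a word, e.g. '4B' -> '4 billion'."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*([KMB])\s*", text, flags=re.IGNORECASE
    )
    if match is None:
        # Not a "<number><suffix>" string: return it unchanged (trimmed).
        return text.strip()
    number, suffix = match.groups()
    # The explicit space mirrors the "4B -> 4 billion" fix in the commit list.
    return f"{number} {SCALE_WORDS[suffix.upper()]}"


def parse_grouped_number(text: str, group_sep: str = ",") -> int:
    """Parse an integer written with thousands separators, e.g. '30,002,233'."""
    # Either well-formed groups of three digits, or a plain run of digits.
    pattern = rf"\d{{1,3}}(?:{re.escape(group_sep)}\d{{3}})+|\d+"
    cleaned = text.strip()
    if re.fullmatch(pattern, cleaned) is None:
        raise ValueError(f"not a grouped integer: {text!r}")
    return int(cleaned.replace(group_sep, ""))
```

For example, `expand_scale_suffix("4B")` yields `"4 billion"` and `parse_grouped_number("30,002,233")` yields `30002233`. Passing the separator as a parameter is one way to support locales that group digits with `.` instead of `,`.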
@@ -1,4 +1,4 @@
-name: Run Unit Tests for Evaluation scripts via Pytest
+name: Run Unit Tests via Pytest

on:
  push:
3 changes: 3 additions & 0 deletions .gitignore
@@ -20,3 +20,6 @@ results

# ignore geopy cache (used for normalizing locations faster)
geopy_cache.sqlite

# pytest
.pytest_*
4 changes: 1 addition & 3 deletions Database/fix_nested_json.py
@@ -1,10 +1,8 @@
import argparse
import pathlib

#from Database.scr.normalize_utils import Logging, NormalizeJsonOutput
# this one works tested by Ni 20240719
from scr.normalize_utils import Logging, NormalizeJsonOutput
#from .scr.normalize_utils import Logging, NormalizeJsonOutput

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    logger = Logging.get_logger("fix nested json sys output")
7 changes: 7 additions & 0 deletions Database/output/nlp4climate_guidelines_v2/README.md
@@ -0,0 +1,7 @@
### Annotation Guidelines v2 (changes in the normalization of numbers)

The files in this directory contain the same GPT output as the [nlp4climate](Database/output/nlp4climate) experiments, normalized with the rules from the revised annotation guidelines (v2).

The [Jupyter notebook HTML export](guidelines_v1_versus_v2.html) compares all items in the dev and test sets that changed between v1 and v2 for two categories: `Damage` and `Deaths`.

The annotation guidelines [can be found here](https://onedrive.live.com/personal/78d0e12ab2e8ce00/_layouts/15/doc2.aspx?resid=78D0E12AB2E8CE00!sb951b21f6a3b4408808a037df599c45d&cid=78d0e12ab2e8ce00&migratedtospo=true&app=Word) (you may need to request permission to access the file).
Git LFS file not shown