Handle ranges per guidelines #19

Merged: 69 commits, Aug 7, 2024

@i-be-snek i-be-snek commented Jun 9, 2024

Introduces:

  • Ranges as discussed in the guidelines word document (copied directly from the document):

    • Greater than/more than/exceed 700: 701-799 (7 * scale + 1, (7+1) * scale – 1)
    • Greater/more than/over 640: 641-699 (6.4 * scale+1, (6+1) * scale – 1)
    • Less/lower/fewer than 700: 601-699 (6 * scale +1, (6+1) * scale – 1)
    • At least/a minimum of 630: 630-699 (6.3 * scale, (6+1) * scale – 1)
    • Up to 270: 200-270 (2 * scale – 2.7 * scale)
    • Approximately/around/nearly/about/almost/roughly/~ 700: ±5%: 665-735 for min and max (700 * 0.95, 700 * 1.05)
    • Dozens of, tens of, hundreds of, thousands of, etc: 2 * scale, 9 * scale; so, “thousands of injuries” becomes 2000-9000 injuries.
    • A number of/a group of/a few/several: 2-6
    • A few dozen: 24-72 (12 * 2, 12 * 6)
    • A dozen hundreds (if it ever appears!): 2400-7200 (12 * 2 * scale, 12 * 6*scale)
    • A few/several hundred/thousand/million, etc: 2 * scale-6 * scale; so “several millions” becomes 2000000-6000000
    • Many: 20-60
    • SPECIAL CASE: A couple/a couple hundred/thousand, etc...: 2 * scale – 3 * scale; so “a couple of deaths” becomes 2-3 deaths
    • SPECIAL CASE: If the number of human victims is reported as “family/families”, multiply by 5 to determine the number of human victims. For “families”, use 2 * 5-9 * 5 if there is no number before families. For example: “5 families are displaced” = 5 * 3-5 * 5 = 15-25 people displaced.
    • SPECIAL CASE: no casualties, no fatalities, no injuries, none, none reported, and similar phrases must be annotated as 0. If the information is missing, the annotator must enter NULL instead.
  • Other guidelines:

    • m/mil -> million and similar
    • expand "approximation" synonym list
  • Tests for normalizing numbers
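
A few of the range rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code merged in this PR: the function names (`greater_than`, `at_least`, `up_to`, `approximately`, `vague`), the `_scale` helper, the `VAGUE_MULTIPLIERS` table, and the integer rounding in the ±5% rule are all hypothetical choices made for the example.

```python
from typing import Tuple

Range = Tuple[int, int]


def _scale(n: int) -> int:
    # Power of ten matching the leading digit: 700 -> 100, 640 -> 100, 27 -> 10.
    return 10 ** (len(str(n)) - 1)


def greater_than(n: int) -> Range:
    # "more than 700" -> (701, 799); "over 640" -> (641, 699)
    scale = _scale(n)
    return n + 1, (n // scale + 1) * scale - 1


def at_least(n: int) -> Range:
    # "at least 630" -> (630, 699)
    scale = _scale(n)
    return n, (n // scale + 1) * scale - 1


def up_to(n: int) -> Range:
    # "up to 270" -> (200, 270)
    scale = _scale(n)
    return (n // scale) * scale, n


def approximately(n: int) -> Range:
    # "about 700" -> (665, 735); rounding to whole numbers is an assumption.
    return round(n * 0.95), round(n * 1.05)


# (min, max) multipliers applied to a scale such as 1, 12, 100 or 1000.
# "plural" stands for phrases like "hundreds of" or "thousands of".
VAGUE_MULTIPLIERS = {
    "plural": (2, 9),
    "few": (2, 6),
    "several": (2, 6),
    "couple": (2, 3),
}


def vague(quantifier: str, scale: int = 1) -> Range:
    # "thousands of" -> vague("plural", 1000); "a few dozen" -> vague("few", 12)
    low, high = VAGUE_MULTIPLIERS[quantifier]
    return low * scale, high * scale
```

For instance, `vague("plural", 1000)` yields the 2000-9000 range given for "thousands of injuries", and `vague("couple")` yields 2-3. The "less than", "many", and "families" rules are left out of this sketch.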

i-be-snek commented Jul 16, 2024

Note to self: add these as undesired location type matches:

[
"cemetery",
"church",
"orchard",
]
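
One way such a list might be applied, as a minimal sketch: it assumes location candidates arrive as dictionaries with a `"type"` key, which this thread does not confirm. The names `UNDESIRED_LOCATION_TYPES` and `drop_undesired` are hypothetical.

```python
from typing import Dict, List

# Location types from the note above that should not count as a match.
UNDESIRED_LOCATION_TYPES = {"cemetery", "church", "orchard"}


def drop_undesired(candidates: List[Dict]) -> List[Dict]:
    # Keep candidates whose "type" is absent or not in the undesired set.
    # The dictionary shape is an assumption made for this example.
    return [c for c in candidates if c.get("type") not in UNDESIRED_LOCATION_TYPES]
```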

@i-be-snek i-be-snek changed the title WIP: Handle ranges per guidelines DRAFT: Handle ranges per guidelines Jul 18, 2024
@i-be-snek i-be-snek changed the title DRAFT: Handle ranges per guidelines Handle ranges per guidelines Aug 3, 2024
@i-be-snek i-be-snek changed the title Handle ranges per guidelines DRAFT: Handle ranges per guidelines Aug 6, 2024
@i-be-snek i-be-snek changed the title DRAFT: Handle ranges per guidelines Handle ranges per guidelines Aug 6, 2024
@i-be-snek i-be-snek requested a review from liniiiiii August 7, 2024 10:15
@liniiiiii
Collaborator

Well checked for these normalization rules

@liniiiiii liniiiiii merged commit aec20ae into main Aug 7, 2024
1 check passed
@i-be-snek i-be-snek deleted the handle-ranges-per-guidelines branch January 17, 2025 13:36