Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update normalization of ranges to match new guidelines #87

Merged
merged 7 commits into from
Sep 5, 2024

Conversation

i-be-snek
Copy link
Collaborator

See #83

@i-be-snek i-be-snek linked an issue Aug 28, 2024 that may be closed by this pull request
3 tasks
@i-be-snek i-be-snek changed the title 🔨 Move some phrases from zero-type to unknown-type WIP: Update normalization of ranges to match new guidelines Aug 28, 2024
@i-be-snek
Copy link
Collaborator Author

Update:

The failed tests here are the range formulas we have not agreed on 100%. Once they are ready, this should be ready for review.

@i-be-snek
Copy link
Collaborator Author

i-be-snek commented Aug 29, 2024

@koffiworou @liniiiiii

I think this PR should be almost ready. There is two smalls thing I would like to bring to your attention:

  • we need to upload the final version in the annotation guidelines for range normalization, but I can see you are still working on other (unrelated) sections on it. Should we upload a version now with these information and then later upload the final version when it's complete? It would be in PDF

  • we have two interesting test cases that fail (in essence they both mean the same thing). Do you think these make sense? They could be an edge case that we could handle with an exception, or we can leave it as is if you don't think it's a problem.

    • less than 1 is normalized as (1, 1)... should it not be (0, 0) or at least (0, 1)?
    • no more than 1... same meaning

    image

    That's because you are using max() to return a minimum of 0 should the number be any smaller than 1.

                        if "under" in k:
                            inc = 0 if "inclusive" in k else 1
                            return (
                                max(1, n - scale - inc) * lower_mod,
                                max(1, n - inc) * upper_mod,
                            )

    but if you instead had max(0, n - scale - inc), then we get results that may be easier to interpret (at least I feel like they sound more natural)

    # if we use max(1, ...)
    ###############
    For N = 1, scale = 1
     over, inclusive (at least) 1 --> [1, 2]
     over, exclusive (more than) 1 --> [2, 3]
     under, inclusive (at most) 1 --> [1, 1]
     under, exclusive (less than) 1 --> [1, 1]
     about 1 --> [1, 2]
    
    
    # if we use max(0, ...)
    ###############
    For N = 1, scale = 1
     over, inclusive (at least) 1 --> [1, 2]
     over, exclusive (more than) 1 --> [2, 3]
     under, inclusive (at most) 1 --> [0, 1]
     under, exclusive (less than) 1 --> [0, 0]
     about 1 --> [0, 2]
    

Let me know if these issues are resolved with a comment + check in the checkbox :)

@liniiiiii
Copy link
Collaborator

@koffiworou @liniiiiii

I think this PR should be almost ready. There is two smalls thing I would like to bring to your attention:

  • we need to upload the final version in the annotation guidelines for range normalization, but I can see you are still working on other (unrelated) sections on it. Should we upload a version now with these information and then later upload the final version when it's complete? It would be in PDF

  • we have two interesting test cases that fail (in essence they both mean the same thing). Do you think these make sense? They could be an edge case that we could handle with an exception, or we can leave it as is if you don't think it's a problem.

    • less than 1 is normalized as (1, 1)... should it not be (0, 0) or at least (0, 1)?
    • no more than 1... same meaning

    image
    That's because you are using max() to return a minimum of 0 should the number be any smaller than 1.

                        if "under" in k:
                            inc = 0 if "inclusive" in k else 1
                            return (
                                max(1, n - scale - inc) * lower_mod,
                                max(1, n - inc) * upper_mod,
                            )

    but if you instead had max(0, n - scale - inc), then we get results that may be easier to interpret (at least I feel like they sound more natural)

    # if we use max(1, ...)
    ###############
    For N = 1, scale = 1
     over, inclusive (at least) 1 --> [1, 2]
     over, exclusive (more than) 1 --> [2, 3]
     under, inclusive (at most) 1 --> [1, 1]
     under, exclusive (less than) 1 --> [1, 1]
     about 1 --> [1, 2]
    
    
    # if we use max(0, ...)
    ###############
    For N = 1, scale = 1
     over, inclusive (at least) 1 --> [1, 2]
     over, exclusive (more than) 1 --> [2, 3]
     under, inclusive (at most) 1 --> [0, 1]
     under, exclusive (less than) 1 --> [0, 0]
     about 1 --> [0, 2]
    

Let me know if these issues are resolved with a comment + check in the checkbox :)

Hi @i-be-snek , for the 1st point, I think we need to upload the final version, and I think it's mostly done and I will double check the rule for NLP paper today and save it in PDF and upload, for others I think we can freeze. and for the second point, I don't think we will have this extreme case in our database, because the numerical data is about human or money, if no death, it will be 0 or none, instead of saying less than 1 person dead, and for monetary damage, it also would not say less than 1 euro damage @koffiworou , how do you think about this case?

@i-be-snek
Copy link
Collaborator Author

Hi @i-be-snek , for the 1st point, I think we need to upload the final version, and I think it's mostly done and I will double check the rule for NLP paper today and save it in PDF and upload, for others I think we can freeze. and for the second point, I don't think we will have this extreme case in our database, because the numerical data is about human or money, if no death, it will be 0 or none, instead of saying less than 1 person dead, and for monetary damage, it also would not say less than 1 euro damage @koffiworou , how do you think about this case?

Thanks. That's a good case you are making but what about extreme climate events where there is one potential death or injury but it's not confirmed? In that case, I see how the range would be (0, 1).

Of course I understand this is not a standard case, but I also feel like it makes more sense, logically, to say that "at most 1" means (0, 1) and not (1, 1).

@i-be-snek i-be-snek self-assigned this Aug 30, 2024
@i-be-snek
Copy link
Collaborator Author

@koffiworou any comments on the exceptional cases I listed here? Otherwise we are ready to squash+merge.

@koffiworou
Copy link
Collaborator

@i-be-snek , Sorry, I was not following on a real time the different comments. It is true that your example on " one potential death" suggests a narrative from someone who is not sure if there is one death, or zero, or more. If he was sure that there was at least one death, then he would have used the term "at least". So it makes sense to convert " one potential death" to the interval [0,1]. Again, this is one point of view :)

@i-be-snek i-be-snek changed the title WIP: Update normalization of ranges to match new guidelines Update normalization of ranges to match new guidelines Sep 5, 2024
@i-be-snek i-be-snek merged commit 880f27f into main Sep 5, 2024
1 check passed
@i-be-snek i-be-snek deleted the 83-guideline-update-to-number-ranges branch October 28, 2024 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Small changes to the annotation rules for numbers (affecting the code)
3 participants