Full stops after numbers unnoticed, extra ones predicted #9

Open
alexdiment opened this issue Dec 15, 2022 · 1 comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@alexdiment

Hi and thanks a lot for the great tool!

It seems that the original punctuation-removal step intentionally keeps punctuation inside numbers, perhaps because of decimal points or the way ordinal numbers are written in some languages.

This, however, results in extra punctuation being predicted when a number is at the end of a sentence: 'The Answer to the Ultimate Question of Life, the Universe, and Everything is 42.' becomes 'The Answer to the Ultimate Question of Life, the Universe, and Everything is 42..'

I'm not sure what an elegant solution would be. The punctuation-stripping regex can't tell ordinal marks apart from sentence-final full stops. It would be nice to trust the LM to predict all the punctuation, i.e., to remove all of it in the pre-processing step.
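
A minimal sketch of how the doubled full stop can arise, assuming a stripping pattern that leaves punctuation alone whenever a digit sits next to it. The regex and the `strip_punctuation` name are illustrative assumptions, not copied from the project:

```python
import re
from typing import List

# Hypothetical pattern: remove a punctuation mark only when neither the
# character before it nor the character after it is a digit.
STRIP_PATTERN = re.compile(r"(?<!\d)[.,;:!?](?!\d)")


def strip_punctuation(text: str) -> List[str]:
    """Drop punctuation that is not adjacent to a digit, then split into words."""
    return STRIP_PATTERN.sub("", text).split()


words = strip_punctuation("The answer is 42. Pi is 3.14 today.")
print(words)
# ['The', 'answer', 'is', '42.', 'Pi', 'is', '3.14', 'today']
```

The full stop after `today` is removed, but the one after `42` survives because a digit precedes it. If the model then predicts a full stop for that word as well, the output ends in `42..`.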

@oliverguhr oliverguhr added enhancement New feature or request help wanted Extra attention is needed labels Mar 10, 2023
@oliverguhr
Owner

oliverguhr commented Mar 10, 2023

Good catch @alexdiment.

The issue is that the model cannot tell whether 123 should be 1.23, 12.3 or 123. I wanted to avoid the case where the model messes with decimal points.
I would suggest a post-processing step that ignores punctuation markers from the model if they are already present in the text.
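
One way such a post-processing step might look, as a rough sketch only. It assumes the model output is available as `(word, predicted_mark)` pairs with `"0"` meaning "no punctuation"; that format and the `merge_predictions` name are assumptions made for illustration:

```python
from typing import List, Tuple

PUNCTUATION = ".,;:!?"


def merge_predictions(predictions: List[Tuple[str, str]]) -> str:
    """Join (word, predicted_mark) pairs, skipping a mark if the word already ends with one."""
    pieces = []
    for word, mark in predictions:
        # "0" is assumed to mean that no punctuation was predicted.
        already_punctuated = word.endswith(tuple(PUNCTUATION))
        if mark != "0" and not already_punctuated:
            word += mark
        pieces.append(word)
    return " ".join(pieces)


predicted = [("The", "0"), ("answer", "0"), ("is", "0"), ("42.", ".")]
print(merge_predictions(predicted))  # The answer is 42.
```

In this sketch the punctuation already in the text always wins, so a kept `42.` is never extended to `42..`, while a bare `42` still receives the predicted mark.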

It's a rather small improvement, but I have no time to implement it any time soon. So if someone could help out, it would be greatly appreciated.

@oliverguhr oliverguhr added the good first issue Good for newcomers label Mar 10, 2023