adding unicode friendly clue parsing #92
base: master
Changed the clue parsing algorithm to handle Unicode characters. The regular expression it now mimics is "^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s+)inf)$", where "\a" represents all legal clue letters. To do this, I needed to avoid any string library functions that match using %d-style syntax when parsing a clue.
This allows all letters allowed by %a in Lua scripting, and also any Unicode character above 0x0370 except for some whitespace characters and dashes. It is very versatile and still allows all the same clue formats as before.
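The letter rule described above can be sketched in a few lines. This is a Python illustration only, not the PR's Lua code. It assumes %a means ASCII letters (the C locale), and it stands in `str.isspace()` and the Unicode dash category `Pd` for the "some whitespace characters and dashes" that the description does not enumerate. The name `is_legal_clue_letter` is hypothetical.

```python
import unicodedata


def is_legal_clue_letter(ch: str) -> bool:
    """Return True if ch would count as a clue letter under the described rule."""
    if ch.isascii():
        # Assumption: Lua's %a is treated as ASCII letters only.
        return ch.isalpha()
    if ord(ch) <= 0x0370:
        # Only code points above 0x0370 are admitted by the description.
        return False
    if ch.isspace():
        # Assumption: stand-in for the PR's list of whitespace characters.
        return False
    if unicodedata.category(ch) == "Pd":
        # Assumption: stand-in for the PR's list of dash characters.
        return False
    return True
```

Under these assumptions a Cyrillic or CJK letter is accepted, while an em space (U+2003) or an em dash (U+2014) is rejected.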
If you're busy, I could provide a fairly exhaustive list of test cases. Is there anything I can do to help you add this to the project?
Fixed range of illegal characters
Make important character modifiers legal
Added compatibility with typing in foreign digit systems. It converts them to normal numbers before further processing. Fixed a bug where you could put a hyphen at the beginning of a clue if there was whitespace before it. Added more whitespace characters.
Did some rigorous testing on it and found one flaw. Generated 2000 clues that should work, and they did. Generated 5000 clues that shouldn't work, and they didn't. This is ready.
Lowercase works well with unicode characters
a455e14 to 1acaed9
This would be good to add pretty soon since you have so many foreign decks. Right now the set of characters it allows in clues is fairly arbitrary: if the character's code mod 256 is in the range A-Z, a-z, or À-ÿ, it accepts the character; otherwise it rejects it. I've played a few games with it now and I think it's done.
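To see why that behaviour is arbitrary, here is a small Python sketch of the rule exactly as stated in the comment above (it is not the project's Lua code, and `old_rule_accepts` is a hypothetical name). It treats À-ÿ as the contiguous range 0xC0 to 0xFF.

```python
def old_rule_accepts(ch: str) -> bool:
    """Mimic the described pre-PR rule: only the code point mod 256 is checked."""
    low = ord(ch) % 256
    return (
        0x41 <= low <= 0x5A      # A-Z
        or 0x61 <= low <= 0x7A   # a-z
        or 0xC0 <= low <= 0xFF   # À-ÿ
    )
```

For example, Cyrillic "я" (U+044F) is accepted because 0x4F is "O", while Cyrillic "а" (U+0430) is rejected because 0x30 is the digit "0", even though both are ordinary letters of the same alphabet.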
Here are 2 |
Need to review getClueDetails, but these are what I found for now.
@@ -232,6 +233,84 @@ analytics =
sessions = {}
}

----------[ Character sets ]----------

digits_table = {
Why are these split up into 10 tables?
The digits_table[0] table lists all the characters that are the number 0 in other languages.
The digits_table[1] table lists all the characters that are the number 1 in other languages.
etc.
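As a rough illustration of that layout, the following Python sketch (not the PR's Lua table) builds a miniature version with only three scripts and inverts it into a lookup used for conversion. The names `DIGITS_TABLE`, `DIGIT_LOOKUP`, and `to_ascii_digits` are hypothetical, and the choice of Arabic-Indic, Devanagari, and fullwidth digits is an assumption made for the example.

```python
# Code points of the digit zero in three scripts (assumed sample, not the PR's list):
# Arabic-Indic, Devanagari, fullwidth. Each script's digits 0-9 are contiguous.
SCRIPT_ZEROS = [0x0660, 0x0966, 0xFF10]

# DIGITS_TABLE[d] lists every character that means the digit d.
DIGITS_TABLE = [[chr(zero + d) for zero in SCRIPT_ZEROS] for d in range(10)]

# Invert the ten lists into one character -> ASCII digit mapping.
DIGIT_LOOKUP = {
    ch: str(d) for d, variants in enumerate(DIGITS_TABLE) for ch in variants
}


def to_ascii_digits(text: str) -> str:
    """Replace any known foreign digit with its ASCII equivalent."""
    return "".join(DIGIT_LOOKUP.get(ch, ch) for ch in text)
```

Grouping the characters by value, one list per digit, means the list index itself is the digit each character converts to.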
What do you mean by that?
This reverts commit a514dce.
Fixed range of illegal characters. Made a small fix to prevent regex backtracking overflow on chat messages with large gaps of whitespace in them, such as "!A_______________B" where _ is a space.
Condensed the code for adding ranges of illegal characters. This was checked and produces the exact same table as before.
Here are the submitted changes to the code.
A correct greedy way to remove leading and trailing whitespace
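The commit itself is not shown here, so the following is only one possible way to trim whitespace without any pattern backtracking: scan inward from both ends and leave interior gaps untouched. It is a Python sketch, not the PR's Lua code; `trim` is a hypothetical name, and `str.isspace()` stands in for the PR's own whitespace table.

```python
def trim(text: str) -> str:
    """Strip leading and trailing whitespace in a single linear pass."""
    start = 0
    end = len(text)
    # Advance past leading whitespace.
    while start < end and text[start].isspace():
        start += 1
    # Retreat past trailing whitespace.
    while end > start and text[end - 1].isspace():
        end -= 1
    return text[start:end]
```

Each character is inspected at most once, so a message such as "!A" followed by a long run of spaces and "B" costs no more than its length.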
Changed the clue parsing algorithm to handle Unicode characters. The regular expression it now mimics is
^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s)\s*inf)\s*$
where "\a" represents all legal clue letters. I needed to avoid using any string library functions that allow matching using %d-style syntax when parsing a clue.
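For readers who want to try the grammar, the expression above can be written almost verbatim with Python's `re` module. This is a sketch for experimentation, not the PR's hand-written Lua parser. It assumes `[^\W\d_]` (any Unicode letter) as a stand-in for "\a", which is broader than the PR's actual letter set, and `CLUE_RE` and `is_valid_clue` are hypothetical names.

```python
import re

# Assumption: any Unicode letter stands in for the PR's "\a" class.
LETTER = r"[^\W\d_]"

CLUE_RE = re.compile(
    rf"^\s*{LETTER}+(-{LETTER}+)?"      # word, optionally hyphenated
    r"(\s*-?\s*\d+|\s*(-|\s)\s*inf)"    # a count, or "inf" after a dash or space
    r"\s*$"
)


def is_valid_clue(text: str) -> bool:
    """Return True if text has the shape described by the expression."""
    return CLUE_RE.match(text) is not None
```

With this stand-in, "apple 3", "well-done 2", "apple-inf", and "apple inf" are accepted, while "apple" (no count) and " -apple 3" (hyphen before the word) are rejected.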