[Thai/Lao] Mark reordering and tone markers #125
Hmm, interesting. The MS docs do have a small classification list of their own for determining valid mark sequences, which is not quite the same as HarfBuzz's list. You're definitely right about the "tone marker" != "above-base Mn" thing: there are dependent-vowel marks, killers, and some other marks I'm less sure of. I'll see what clarification I can find.
So, the classes in the MS docs allegedly define a four-level ordering (upward from the base), with only one mark permitted from each level. Unicode explicitly disagrees. From section 16.1 (TUS 13, p. 641):
On that specific example, the MS docs place U+0E4D in level 2 and U+0E48 in level 3, which would mean mai ek is always above nikhahit. For full dramatic irony, however, Unicode notes that "mai ek, nikhahit" is likely to be a typo in real-world text. As per harfbuzz/harfbuzz#1008 and a couple of other issues, HarfBuzz takes the position that the docs do not seem to specify a full mark-reordering algorithm, so tracking compatibility with Uniscribe is the best available option. For comparison, here is what HarfBuzz reorders vs. our character tables vs. the four MS levels:
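As a rough illustration of the MS four-level model (not of what Unicode or HarfBuzz does), the sketch below assigns each mark a level, sorts marks upward from the base, and rejects clusters that use a level twice. Only the two level assignments quoted above are filled in (U+0E4D at level 2, U+0E48 at level 3); the `MS_LEVEL` table and the function names are hypothetical, and a real table would need an entry for every Thai and Lao mark.

```python
from typing import Dict, List

# HYPOTHETICAL level table for the MS four-level model. Only the two
# assignments quoted in this thread are filled in; a real table would
# list every Thai (and Lao) mark.
MS_LEVEL: Dict[int, int] = {
    0x0E4D: 2,  # THAI CHARACTER NIKHAHIT
    0x0E48: 3,  # THAI CHARACTER MAI EK
}


def order_marks(marks: List[int]) -> List[int]:
    """Sort marks upward from the base by level, allowing one mark per level.

    Raises KeyError for a mark missing from MS_LEVEL and ValueError when
    two marks share a level.
    """
    levels = [MS_LEVEL[m] for m in marks]
    if len(set(levels)) != len(levels):
        raise ValueError("more than one mark from the same level")
    return sorted(marks, key=lambda m: MS_LEVEL[m])


def reorder_cluster(cluster: List[int]) -> List[int]:
    """Keep the base (first codepoint) in place and reorder the marks after it."""
    return cluster[:1] + order_marks(cluster[1:])


# <DO DEK, MAI EK, NIKHAHIT> is reordered to <DO DEK, NIKHAHIT, MAI EK>
print([hex(cp) for cp in reorder_cluster([0x0E14, 0x0E48, 0x0E4D])])
```

Note that under this model the "mai ek, nikhahit" sequence, which Unicode describes as a likely typo, is silently reordered rather than left alone, which is one reason to be cautious about adopting it.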
HarfBuzz also treats the corresponding Lao codepoints the same way. However, I did notice that applying the Thai-to-Lao offset to those codepoints leaves one out (U+0EBB, 'Sign Mai Kon') and pulls in one undefined codepoint (U+0EC7), although that last bit hardly matters. It'd be easy to make a case for following HarfBuzz; it might also be easy to make a case for mentioning the MS four-level model in the same spot, but if that model is actually valid for the written language, I'd want to open an issue to discuss it within HarfBuzz. Perhaps @mhosken could weigh in on whether adding the extra reordering MS alludes to is worth it? From what I can tell, because Thai is encoded in visual order, it matters less whether the shaper does reordering.
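The offset problem can be checked mechanically. The sketch below adds 0x80 (Lao starts at U+0E80, Thai at U+0E00) to a set of Thai above-base marks and compares the result with a set of Lao above-base marks. Both sets are assumptions taken from the Unicode character names, not copied from HarfBuzz's source, and `compare_offset_mapping` is a hypothetical helper name.

```python
from typing import Set, Tuple

# ASSUMPTION: Thai above-base marks, from the Unicode character names.
# Not copied from HarfBuzz; check the shaper source for its exact ranges.
THAI_ABOVE_BASE: Set[int] = (
    {0x0E31}                         # MAI HAN-AKAT
    | set(range(0x0E34, 0x0E38))     # SARA I .. SARA UEE
    | set(range(0x0E47, 0x0E4F))     # MAITAIKHU .. YAMAKKAN
)

# ASSUMPTION: assigned Lao above-base marks in current Unicode
# (U+0ECE LAO YAMAKKAN was added after Unicode 13).
LAO_ABOVE_BASE: Set[int] = (
    {0x0EB1, 0x0EBB}                 # VOWEL SIGN MAI KAN, VOWEL SIGN MAI KON
    | set(range(0x0EB4, 0x0EB8))     # VOWEL SIGN I .. VOWEL SIGN YY
    | set(range(0x0EC8, 0x0ECF))     # tone marks .. YAMAKKAN
)

THAI_TO_LAO_OFFSET = 0x80  # Lao block starts at U+0E80, Thai at U+0E00


def compare_offset_mapping(thai: Set[int], lao: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (missed, spurious): Lao marks the offset never reaches, and
    offset results that are not Lao above-base marks."""
    mapped = {cp + THAI_TO_LAO_OFFSET for cp in thai}
    return lao - mapped, mapped - lao


missed, spurious = compare_offset_mapping(THAI_ABOVE_BASE, LAO_ABOVE_BASE)
print("missed:", sorted(hex(cp) for cp in missed))
print("spurious:", sorted(hex(cp) for cp in spurious))
```

With these assumed sets, `missed` is `{0x0EBB}` (Thai has nothing at U+0E3B) and `spurious` is `{0x0EC7}` (Lao has nothing matching MAITAIKHU), which matches the two codepoints noted above.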
Our spec states the following (emphasis mine):
According to our character tables, Thai has four `TONE_MARKER` characters (U+0E48..U+0E4B), and Lao also has four tone marker characters (U+0EC8..U+0ECB). Some testing with Uniscribe, and some reading of HarfBuzz code, has shown that this reordering is not limited to tone markers but applies to all above-base marks.
(Note: Being unfamiliar with Thai/Lao, I am assuming that tone markers != above-base marks.)
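For concreteness, here is a minimal sketch of the SARA AM case as comments in HarfBuzz's Thai shaper describe it: SARA AM is decomposed into NIKHAHIT + SARA AA, and the NIKHAHIT is moved backward over every preceding above-base mark, tone marker or not. It assumes this is the reordering the spec passage refers to; the above-base set and the function names are illustrative assumptions, not HarfBuzz's actual code.

```python
from typing import List

# ASSUMPTION: Thai above-base marks. Lao equivalents are matched by clearing
# bit 0x80, which maps U+0E80..U+0EFF onto U+0E00..U+0E7F.
ABOVE_BASE = {0x0E31, *range(0x0E34, 0x0E38), *range(0x0E47, 0x0E4F)}


def is_sara_am(cp: int) -> bool:
    return (cp & ~0x80) == 0x0E33  # THAI SARA AM or LAO VOWEL SIGN AM


def is_above_base(cp: int) -> bool:
    return (cp & ~0x80) in ABOVE_BASE


def decompose_sara_am(text: List[int]) -> List[int]:
    """Split SARA AM into NIKHAHIT + SARA AA and move the NIKHAHIT back
    over any above-base marks that precede it (tone marks or not)."""
    out: List[int] = []
    for cp in text:
        if not is_sara_am(cp):
            out.append(cp)
            continue
        nikhahit = cp - 0x0E33 + 0x0E4D  # U+0E4D (Thai) or U+0ECD (Lao)
        sara_aa = cp - 1                 # U+0E32 (Thai) or U+0EB2 (Lao)
        # Walk back over the above-base marks already emitted.
        i = len(out)
        while i > 0 and is_above_base(out[i - 1]):
            i -= 1
        out.insert(i, nikhahit)
        out.append(sara_aa)
    return out
```

For example, `decompose_sara_am([0x0E14, 0x0E4B, 0x0E33])` gives `[0x0E14, 0x0E4D, 0x0E4B, 0x0E32]`, and the same reordering happens with U+0E47 (MAITAIKHU, not one of the four tone markers) in place of U+0E4B. The `& ~0x80` shortcut is also why the Lao mapping misses U+0EBB and accepts the unassigned U+0EC7.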