Unicode 14 update #139

Merged
merged 4 commits into from
Feb 7, 2022

n8willis
Owner

This updates the character tables for the Arabic, Kannada, Mongolian, and Telugu docs to reflect additions in Unicode 14, including new codepoints and the corresponding Indic Positional / Indic Syllabic / Arabic Shaping / general-UCD info.

I believe these are the only scripts affected by the updated release. Please speak up if I have overlooked something.

Note that for Arabic there is an entirely new block (Extended-B) and some additional Joining Groups.

I don't believe that there were major changes to the info on existing codepoints (the delta charts seem to reflect mostly representative glyph updates ...) but that is worth a separate pass anyway; new codepoints are (at least) self-contained and not likely to break existing implementations.

Note also that this update should be considered "raw" info. Several minor changes may have behavioral effects that will be discovered and sorted out by implementers. Will watch for such information from HarfBuzz and Allsorts, among others!

Of particular note in this respect is the fact that Kannada and Telugu have now acquired codepoints for a CONSONANT_DEAD letter, Nakaara Pollu. There is an existing issue on that letter, #116, which has so far received no comments. If it affects syllable-id or shaping, that will probably mean revision to the actual shaping docs for those scripts.
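
For implementers who want to check whether their runtime already knows the new letters, here is a minimal sketch in Python. It assumes the two Nakaara Pollu codepoints are U+0C5D (Telugu) and U+0CDD (Kannada); the `describe` helper is hypothetical, and the result depends on which Unicode version the interpreter bundles.

```python
import unicodedata

# Assumed codepoints for the Nakaara Pollu letters added in Unicode 14.
NAKAARA_POLLU = {"Telugu": 0x0C5D, "Kannada": 0x0CDD}


def describe(cp):
    """Return (name, general category) for a codepoint.

    The name is None and the category is 'Cn' when the codepoint is
    unassigned in the Unicode database bundled with this interpreter.
    """
    ch = chr(cp)
    return unicodedata.name(ch, None), unicodedata.category(ch)


if __name__ == "__main__":
    print("UCD version:", unicodedata.unidata_version)
    for script, cp in NAKAARA_POLLU.items():
        name, cat = describe(cp)
        print(f"{script}: U+{cp:04X} {name or '<unassigned here>'} ({cat})")
```

On an interpreter whose bundled data predates Unicode 14, both codepoints should report as unassigned, which is a cheap signal that the tables need refreshing.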

wezm commented Feb 3, 2022

I've done a Unicode 14 update to Allsorts. This mostly involved updating the various data used from the UCD as well as the following:

  • Update the list of Arabic chars that are modifier combining marks.
  • Update the shaping class according to your updates here.

Aside from this I've not made any behavioural changes to the shaping engine.

Owner Author

n8willis commented Feb 4, 2022

> * Update the list of Arabic chars that are modifier combining marks.
> * Update the shaping class according to your updates here.

Great! Were there any surprises to be found in the Arabic MCM list? (I don't know; mostly I'm just curious if you called it out for some specific reason)

@n8willis n8willis merged commit 0359447 into master Feb 7, 2022
@n8willis n8willis deleted the unicode14 branch February 7, 2022 16:39
Owner Author

n8willis commented Feb 7, 2022

This merge brings the data up to Unicode 14. The greatest number of changes is in Arabic, so users of that script-specific shaping info would be wise to look it over with extra scrutiny and to open an issue if something looks off.

wezm commented Feb 7, 2022

> Were there any surprises to be found in the Arabic MCM list? (I don't know; mostly I'm just curious if you called it out for some specific reason)

I don't think so. I mentioned it because I noticed that there were new code points listed in https://www.unicode.org/reports/tr53/tr53-6.html#MCM which seemed to roughly coincide with the Unicode 14 release, so I bundled the change into my Unicode 14 updates.
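
For readers unfamiliar with the MCM list, a minimal sketch of what "updating the list" amounts to in a shaper: a membership set that is consulted when reordering Arabic marks. This is an illustration in Python, not Allsorts or HarfBuzz code. The codepoints below are the set as I understand it after the update and should be treated as an assumption to verify against the UTR #53 text linked above; `is_mcm` and `find_mcm` are hypothetical helper names.

```python
# Modifier combining marks (MCM). ASSUMPTION: this set reflects the UTR #53
# list after the additions discussed here; verify against the report itself.
MCM = frozenset({
    0x0654, 0x0655, 0x0658, 0x06DC, 0x06E3, 0x06E7, 0x06E8,
    0x08CA, 0x08CB, 0x08CD, 0x08CE, 0x08CF,  # believed to be the newer entries
    0x08D3, 0x08F3,
})


def is_mcm(ch):
    """Return True if the single character ch is in the assumed MCM set."""
    return ord(ch) in MCM


def find_mcm(text):
    """Return (index, codepoint) pairs for every MCM found in text."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if is_mcm(ch)]


if __name__ == "__main__":
    # BEH + HAMZA ABOVE + FATHA: only the hamza above is an MCM.
    sample = "\u0628\u0654\u064E"
    for index, cp in find_mcm(sample):
        print(f"index {index}: U+{cp:04X}")
```

Keeping the list as plain data like this means a Unicode update only touches the table, which matches the "no behavioural changes" description above.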

Owner Author

n8willis commented Feb 8, 2022

That makes sense. AFAICT, HarfBuzz hasn't yet added those MCM additions, but that might be related to the (known) mismatch between UTR#53 and HarfBuzz normalization. I should reread that.

@khaledhosny

> HarfBuzz hasn't yet added those MCM additions,

I think it is just an oversight; I made a PR to add them: harfbuzz/harfbuzz#3422

Owner Author

n8willis commented Feb 8, 2022

Okay; great. I had also noticed that the UTR#53 update was published a while after the Unicode 14 PR, so that makes sense.

Owner Author

n8willis commented Feb 8, 2022

Small change, so I pushed that directly in beffad8.
