Update to icu4j 78.2 and Unicode 17 #256

Open
mpilquist wants to merge 18 commits into main from topic/update-icu4-78.2

mpilquist commented Jan 11, 2026

Identified a few issues in the existing tests that this PR fixes:

  • CodePointMapper can return strings that are not canonicalized, whereas icu4j, as part of further IDNA processing, returns canonicalized strings. For example, "궈ㄻ" is returned as code points [44424, 4529], whereas icu4j returns the single composed code point 44434. This was fixed by adding an NFC normalization step in the test suite.
  • java.text.Normalizer and icu4j normalize undefined Unicode characters differently. For example, given the string "\u0360\u1ac6", Normalizer does not treat the undefined character \u1ac6 as a diacritic despite it being in a block reserved for diacritics, whereas icu4j's Normalizer2 does. This was fixed by adding a check for undefined characters when asserting equivalence.
  • Modifiers without base characters are handled differently when normalizing input. For example, given the string "\u0345\u20e5", NFC normalization reorders the code points to "\u20e5\u0345", which results in a different input to the mapping step than the one icu4j is given. This case was added as an "inconsistency" check. It represents a whole class of such errors that can occur randomly with ScalaCheck-generated input, but they occur infrequently enough not to be a major nuisance.
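
The first and third cases above can be reproduced with the JDK alone. The following is a minimal sketch, not code from this PR: `NfcDemo` and `nfcCodePoints` are hypothetical names, and it uses `java.text.Normalizer` rather than icu4j. It shows NFC composing the two Hangul code points into one, and NFC reordering two combining marks that have no base character.

```java
import java.text.Normalizer;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of this PR.
final class NfcDemo {
    /** Returns the code points of the NFC-normalized form of the input. */
    public static int[] nfcCodePoints(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFC).codePoints().toArray();
    }

    public static void main(String[] args) {
        // First case: syllable U+AD88 (44424) followed by trailing jamo U+11B1 (4529).
        String hangul = new String(new int[] {44424, 4529}, 0, 2);
        System.out.println(Arrays.toString(nfcCodePoints(hangul))); // [44434]

        // Third case: two combining marks with no base character.
        // Canonical ordering puts U+20E5 (combining class 1) before U+0345 (class 240).
        String marks = "\u0345\u20e5";
        System.out.println(Arrays.toString(nfcCodePoints(marks))); // [8421, 837]
    }
}
```

The second case is not shown because its outcome depends on which Unicode version the running JDK supports.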

This PR also addresses a few issues with the build structure:

  • Incremental compilation is faster because generated sources are now cached. Changing the Unicode version or running a clean build triggers regeneration. This significantly improves compilation time and also allows incremental compilation without a network connection (because each codegen run fetches mapping files from unicode.org).
  • Fixes unused import warnings when launching sbt.
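
The caching idea can be sketched independently of the build tool. This is an illustration only and does not reflect the actual sbt settings in this PR: `GeneratedSourceCache`, the directory layout, and the file name are all assumptions. The cache is keyed by Unicode version, so a new version misses the cache and triggers generation, while a repeated build reuses the stored file without calling the (network-bound) generator.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical sketch of version-keyed caching; the real build logic lives in sbt.
final class GeneratedSourceCache {
    /** Assumed layout: one cached file per Unicode version. */
    public static Path cachedFile(Path cacheDir, String unicodeVersion) {
        return cacheDir.resolve("unicode-" + unicodeVersion).resolve("generated-source.txt");
    }

    /** Returns the cached source if present; otherwise runs the generator and stores its result. */
    public static String loadOrGenerate(Path cacheDir, String unicodeVersion, Supplier<String> generate) {
        Path file = cachedFile(cacheDir, unicodeVersion);
        try {
            if (Files.exists(file)) {
                return Files.readString(file); // cache hit: the generator is not invoked
            }
            String source = generate.get(); // cache miss: this is where mapping files would be fetched
            Files.createDirectories(file.getParent());
            Files.writeString(file, source);
            return source;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under this layout, a clean build corresponds to deleting the cache directory, after which the next call regenerates the source.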
