Update to icu4j 78.2 and Unicode 17 by mpilquist · Pull Request #256 · typelevel/idna4s

mpilquist · 2026-01-11T14:18:43Z

Identified a few issues in the existing tests that this PR fixes:

CodePointMapper can return strings that are not canonicalized whereas icu4j, as part of further IDNA processing, returns canonicalized strings -- e.g., "궈ㄻ" is returned as code points [44424, 4529] whereas icu4j returns the single composed code point [44434]. This was fixed by adding an additional NFC normalization in the test suite.
Differing normalization behaviors between java.text.Normalizer and icu4j's Normalizer2 when dealing with undefined Unicode characters -- e.g., given the string "\u0360\u1ac6", java.text.Normalizer does not treat the undefined character \u1ac6 as a diacritic despite it being in a block reserved for diacritics, whereas Normalizer2 does. This was fixed by adding a check for undefined characters when asserting equivalence.
Differences in handling modifiers without base characters when normalizing input -- e.g., given the string "\u0345\u20e5", NFC normalization reorders the code points to "\u20e5\u0345", which then results in a different input to the mapping step than what icu4j is given. This was added as an "inconsistency" check. This represents a whole class of such errors that can occur randomly with the ScalaCheck input, but they occur infrequently enough to not be a major nuisance.
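
The normalization effects described above can be reproduced with the JDK alone. The following is a minimal sketch, not the code from this PR: the class and method names (`NormalizationChecks`, `nfc`, `containsUnassigned`) are hypothetical, and it uses `java.text.Normalizer` rather than icu4j. Which code points count as unassigned depends on the Unicode version shipped with the running JDK.

```java
import java.text.Normalizer;

// Hypothetical helper class for illustration; not part of idna4s.
final class NormalizationChecks {

    // NFC-normalize a string with the JDK's built-in normalizer.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // True if any code point in the string is unassigned according to
    // the Unicode data bundled with the running JDK.
    static boolean containsUnassigned(String s) {
        return s.codePoints()
                .anyMatch(cp -> Character.getType(cp) == Character.UNASSIGNED);
    }
}
```

With these helpers, `nfc(new String(new int[] {44424, 4529}, 0, 2))` composes the two code points into the single code point 44434, and `nfc("\u0345\u20e5")` yields `"\u20e5\u0345"` because canonical ordering sorts the two combining marks by combining class. A test could call `containsUnassigned` on its input and skip the equivalence assertion when it returns true.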

This PR also addresses a few issues with the build structure:

Incremental compilation speed is improved by caching generated sources. Changing the Unicode version or running a clean build will result in re-generation. This significantly improves compilation time and further allows incremental compilation without a network connection (because every codegen run fetches mapping files from unicode.org).
Fixes unused import warnings when launching SBT.
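
The caching idea can be sketched as a version marker stored next to the generated sources. This is an illustrative sketch only, not the sbt code from this PR: the class name `VersionedCodegenCache`, the marker file name `.unicode-version`, and the `generator` callback are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

// Hypothetical sketch of version-keyed caching for generated sources.
final class VersionedCodegenCache {

    // Runs the generator only when the marker file is missing or records a
    // different Unicode version. Returns true if generation actually ran.
    static boolean generateIfStale(Path outputDir, String unicodeVersion,
                                   Consumer<Path> generator) {
        Path marker = outputDir.resolve(".unicode-version");
        try {
            if (Files.isRegularFile(marker)
                    && Files.readString(marker).trim().equals(unicodeVersion)) {
                return false; // cached output is current; no network fetch needed
            }
            Files.createDirectories(outputDir);
            generator.accept(outputDir);               // the expensive step
            Files.writeString(marker, unicodeVersion); // record what was generated
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the marker lives inside the output directory, deleting that directory (as a clean build would) forces regeneration, which matches the behavior described above.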

mpilquist added 12 commits January 11, 2026 08:57

Downgrade to icu4j 75.1

8053e8b

Scalafmt

a07ef98

Update to icu4j 78.2 and Unicode 17

2b2946e

Scalafmt

1491caf

Scalafix

7dc31cb

Downgrade to icu4j 73.2

613546a

Merge branch 'topic/downgrade-icu4j' into topic/update-icu4-78.2

6ca6138

Update icu4j consistency checks to account for output normalization

e60e7a4

Comments

84e8ea2

Scalafix

d803bff

Add more output

3306ef5

Add inconsistency check

0516d03

mpilquist requested review from armanbilge and isomarcte January 12, 2026 23:09

Add test case from #90

ad27bc4

mpilquist mentioned this pull request Jan 13, 2026

uts-46 mapping step, should agree with icu4j's uts46-mapping step failure #90

Open

mpilquist added 5 commits January 13, 2026 08:24

Modify code generation to only regenerate source if the unicode version has changed

bdf2a8b

Scalafmt

d8696e6

Fix build warnings

85eaf3e

Configure scalafix for meta build to exclude unused warnings

77798d5

Remove unnecessary syntax changes

f7c1a9b

mpilquist wants to merge 18 commits into main from topic/update-icu4-78.2

1 participant
