
Commit 44a8cf0

UCA 16.0 delta 12
From Ken: I've done the code refactoring that enables processing more than one secondary decomposition in the input file, so that we could get the full set of contractions working for the Kirat Rai vowel au. This involved a very substantial refactoring of the code related to processContractions(). Instead of the prior one-off branch that checked for a secondary decomposition and handled it specially, the code now treats a decomposition in the input file as a comma-separated list which, when present, usually contains just one entry. However, rather than refactoring the code to handle contraction decomposition lists of arbitrary length, I just have it work with a static array of up to four decompositions. For over a decade, we've been able to get by with two; Kirat Rai forces us up to three. I expect it will take a while to find a situation that requires four, but if we eventually need to go there, the code will keep working with no further updates. Doing it this way avoided even *more* extensive changes to build up and tear down dynamic lists for this extremely edgy edge case.
1 parent a68eedc commit 44a8cf0
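The message above describes the decomposition field as a comma-separated list read into a static array of up to four entries. A minimal C sketch of that parsing step, assuming space-separated hex code points within an entry and commas between entries (as in the new 16D6A line of unidata.txt). The names `parse_decomps`, `Decomp`, `MAX_DECOMPS`, and `MAX_CPS` are hypothetical and not taken from the sifter source; the per-entry limit `MAX_CPS` is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical limits: MAX_DECOMPS mirrors the "up to four" in the commit
   message; MAX_CPS is an assumed cap on code points per decomposition. */
#define MAX_DECOMPS 4
#define MAX_CPS 8

typedef struct {
    int count;                 /* number of code points in this entry */
    unsigned int cps[MAX_CPS]; /* the code points themselves */
} Decomp;

/* Parse a field such as "16D69 16D67, 16D63 16D68, 16D63 16D67 16D67".
   Returns the number of entries parsed (0 for an empty field), or -1 if the
   field is malformed or has more entries than the static array can hold. */
static int parse_decomps(const char *field, Decomp out[MAX_DECOMPS]) {
    assert(field != NULL && out != NULL);
    int n = 0;
    const char *p = field;
    for (;;) {
        while (*p == ' ') p++;
        if (*p == '\0') break;           /* end of field */
        if (n == MAX_DECOMPS) return -1; /* a fifth entry: reject, don't truncate */
        Decomp *d = &out[n];
        d->count = 0;
        /* Read hex code points up to the next comma or the end of the field. */
        for (;;) {
            while (*p == ' ') p++;
            if (*p == ',' || *p == '\0') break;
            char *end;
            unsigned long cp = strtoul(p, &end, 16);
            if (end == p || d->count == MAX_CPS) return -1;
            d->cps[d->count++] = (unsigned int)cp;
            p = end;
        }
        if (d->count == 0) return -1;    /* empty entry such as ", ," */
        n++;
        if (*p == ',') p++;              /* step past the separator */
    }
    return n;
}
```

Called on the new 16D6A field, `parse_decomps` yields three entries; a single decomposition yields one, and a field with five entries returns -1. Rejecting overflow rather than silently dropping entries is one way to keep a fixed-size array safe if a future script ever needed more than four.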


4 files changed: +221, -69 lines


c/uca/sifter/unidata.txt

Lines changed: 6 additions & 6 deletions
@@ -1,5 +1,5 @@
 # unidata-16.0.0.txt
-# Date: 2023-10-09, 00:00:00 GMT [KW]
+# Date: 2023-10-23, 00:00:00 GMT [KW]
 # © 2023 Unicode®, Inc.
 # For terms of use, see https://www.unicode.org/terms_of_use.html
 #
@@ -9,8 +9,8 @@
 # Default Unicode Collation Element Table (DUCET) for
 # the Unicode Collation Algorithm.
 #
-# Version 16.0.0 draft 11 (Unicode Version: 16.0.0)
-# based on Unicode data file UnicodeData-16.0.0d8.txt
+# Version 16.0.0 draft 12 (Unicode Version: 16.0.0)
+# based on Unicode data file UnicodeData-16.0.0d10.txt
 # Ordering for Unicode 16.0
 #
 # Fields:
@@ -29280,7 +29280,7 @@ A6EF;BAMUM LETTER KOGHOM;Nl;;0;;;;
 10D4B;GARAY VOWEL SIGN I;Lo;;;;;;
 10D4C;GARAY VOWEL SIGN O;Lo;;;;;;
 10D4D;GARAY VOWEL SIGN EE;Lo;;;;;;
-10D4E;GARAY VOWEL LENGTH MARK;Lo;;;;;;
+10D4E;GARAY VOWEL LENGTH MARK;Lm;;;;;;
 10D4F;GARAY SUKUN;Lo;;;;;;
 10D69;GARAY VOWEL SIGN E;Mn;;;;;;
 
@@ -34962,9 +34962,9 @@ CONTRACTION
 16D68;KIRAT RAI VOWEL SIGN AI;Lo;16D67 16D67;;;;;
 16D69;KIRAT RAI VOWEL SIGN O;Lo;16D63 16D67;;;;;
 # The vowel sign au has a complex decomposition that recurses.
-# Add a secondary decomposition to 16D6A for canonical closure.
+# Add two secondary decompositions to 16D6A for canonical closure.
 # 16D6A;KIRAT RAI VOWEL SIGN AU;Lo;16D69 16D67;;;;;
-16D6A;KIRAT RAI VOWEL SIGN AU;Lo;16D69 16D67, 16D63 16D67 16D67;;;;;
+16D6A;KIRAT RAI VOWEL SIGN AU;Lo;16D69 16D67, 16D63 16D68, 16D63 16D67 16D67;;;;;
 
 DEFAULT
 
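Why 16D6A now carries three entries can be checked mechanically. A small C sketch, assuming the primary decompositions shown in this hunk for AI, O, and AU are the only ones involved, recursively expands each comma-separated entry for 16D6A. All three reduce to 16D63 16D67 16D67, so a collation table closed under canonical equivalence needs a contraction for each spelling. `kTable`, `expand`, `expand_seq`, and `same_full_decomp` are hypothetical names rather than the sifter's code, and the sketch skips canonical reordering by combining class, which a full normalizer would also apply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table holding only the primary decompositions from this
   hunk of unidata.txt; a real tool would load them from the whole file. */
typedef struct {
    unsigned int cp;       /* decomposable code point */
    unsigned int parts[2]; /* its primary (two-part) decomposition */
} Entry;

static const Entry kTable[] = {
    {0x16D68, {0x16D67, 0x16D67}}, /* KIRAT RAI VOWEL SIGN AI */
    {0x16D69, {0x16D63, 0x16D67}}, /* KIRAT RAI VOWEL SIGN O  */
    {0x16D6A, {0x16D69, 0x16D67}}, /* KIRAT RAI VOWEL SIGN AU */
};

static const Entry *lookup(unsigned int cp) {
    for (size_t i = 0; i < sizeof kTable / sizeof kTable[0]; i++)
        if (kTable[i].cp == cp) return &kTable[i];
    return NULL;
}

/* Append the full, recursive decomposition of cp to out starting at index
   len; returns the new length. */
static int expand(unsigned int cp, unsigned int *out, int len, int cap) {
    const Entry *e = lookup(cp);
    if (e == NULL) {
        assert(len < cap); /* caller must supply enough room */
        out[len++] = cp;
        return len;
    }
    len = expand(e->parts[0], out, len, cap);
    return expand(e->parts[1], out, len, cap);
}

/* Fully expand every code point of a sequence into out. */
static int expand_seq(const unsigned int *seq, int n, unsigned int *out, int cap) {
    int len = 0;
    for (int i = 0; i < n; i++)
        len = expand(seq[i], out, len, cap);
    return len;
}

/* Nonzero if two sequences reduce to the same fully decomposed form. */
static int same_full_decomp(const unsigned int *x, int nx,
                            const unsigned int *y, int ny) {
    unsigned int ex[16], ey[16];
    int lx = expand_seq(x, nx, ex, 16);
    int ly = expand_seq(y, ny, ey, 16);
    if (lx != ly) return 0;
    for (int i = 0; i < lx; i++)
        if (ex[i] != ey[i]) return 0;
    return 1;
}
```

Comparing 16D6A against each of 16D69 16D67, 16D63 16D68, and 16D63 16D67 16D67 with `same_full_decomp` returns nonzero in every case, while 16D69 alone (which expands to only 16D63 16D67) does not match.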