From 7cde1f8924e41596fcc128b0df8795c497c185ba Mon Sep 17 00:00:00 2001
From: Nathan Willis
Date: Fri, 17 Sep 2021 17:32:30 +0100
Subject: [PATCH 1/4] Kannada: update to Unicode 14.
---
 character-tables/character-tables-kannada.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/character-tables/character-tables-kannada.md b/character-tables/character-tables-kannada.md
index 157e8c3..c283df0 100644
--- a/character-tables/character-tables-kannada.md
+++ b/character-tables/character-tables-kannada.md
@@ -137,7 +137,7 @@ specific, script-aware behavior.
 |`U+0CDA` | _unassigned_ | | | |
 |`U+0CDB` | _unassigned_ | | | |
 |`U+0CDC` | _unassigned_ | | | |
-|`U+0CDD` | _unassigned_ | | | |
+|`U+0CDD` | Letter | CONSONANT_DEAD | _null_ | ೝ Nakaara Pollu |
 |`U+0CDE` | Letter | CONSONANT | _null_ | ೞ Fa |
 |`U+0CDF` | _unassigned_ | | | |
 | | | | |

From d0cc1daec4c00807c2d1704a8e4aa2064b44e80c Mon Sep 17 00:00:00 2001
From: Nathan Willis
Date: Fri, 17 Sep 2021 17:33:04 +0100
Subject: [PATCH 2/4] Telugu: update to Unicode 14.
---
 character-tables/character-tables-telugu.md | 4 ++--
 opentype-shaping-telugu.md | 9 +++++----
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/character-tables/character-tables-telugu.md b/character-tables/character-tables-telugu.md
index 8454263..9f45d0e 100644
--- a/character-tables/character-tables-telugu.md
+++ b/character-tables/character-tables-telugu.md
@@ -102,7 +102,7 @@ specific, script-aware behavior.
 |`U+0C39` | Letter | CONSONANT | _null_ | హ Ha |
 |`U+0C3A` | _unassigned_ | | | |
 |`U+0C3B` | _unassigned_ | | | |
-|`U+0C3C` | _unassigned_ | | | |
+|`U+0C3C` | Mark [Mn] | NUKTA | BOTTOM_POSITION | ఼ Nukta |
 |`U+0C3D` | Letter | AVAGRAHA | _null_ | ఽ Avagraha |
 |`U+0C3E` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ా Sign Aa |
 |`U+0C3F` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ి Sign I |
@@ -137,7 +137,7 @@ specific, script-aware behavior.
 |`U+0C5A` | Letter | CONSONANT | _null_ | ౚ Rrra |
 |`U+0C5B` | _unassigned_ | | | |
 |`U+0C5C` | _unassigned_ | | | |
-|`U+0C5D` | _unassigned_ | | | |
+|`U+0C5D` | Letter | CONSONANT_DEAD | _null_ | ౝ Nakaara Pollu |
 |`U+0C5E` | _unassigned_ | | | |
 |`U+0C5F` | _unassigned_ | | | |
 | | | | |
diff --git a/opentype-shaping-telugu.md b/opentype-shaping-telugu.md
index 1f56ff7..284b702 100644
--- a/opentype-shaping-telugu.md
+++ b/opentype-shaping-telugu.md
@@ -976,10 +976,11 @@ This order is canonical in Unicode and is required so that
 "_consonant_,Nukta" substitution rules from GSUB will be correctly
 matched later in the shaping process.
 
-> Note: The Telugu block does not include a "Nukta" mark. However,
-> there are reports of users using the "Nukta" from other Indic
-> blocks, so shaping engines may encounter a "Nukta" in text runs, and
-> should handle the situation gracefully.
+> Note: Prior to Unicode version 14, the Telugu block did not include
+> a "Nukta" mark. However, there are reports of users using the
+> "Nukta" from other Indic blocks, so shaping engines may encounter a
+> "Nukta" from other scripts in text runs, and should handle the
+> situation gracefully.
 
 
 #### 2.5: Pre-base consonants ####

From 6e41191298e58900c5ad2c066bcf5184bea27292 Mon Sep 17 00:00:00 2001
From: Nathan Willis
Date: Fri, 17 Sep 2021 17:33:30 +0100
Subject: [PATCH 3/4] Mongolian: update to Unicode 14.
---
 character-tables/character-tables-mongolian.md | 2 +-
 opentype-shaping-mongolian.md | 7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/character-tables/character-tables-mongolian.md b/character-tables/character-tables-mongolian.md
index efa9c5d..f72bc16 100644
--- a/character-tables/character-tables-mongolian.md
+++ b/character-tables/character-tables-mongolian.md
@@ -61,7 +61,7 @@ treated differently during the mark-reordering stage.
 |`U+180C` | Mark [Mn] | TRANSPARENT | _null_ | _0_ | ᠌ Free Variation Selector Two |
 |`U+180D` | Mark [Mn] | TRANSPARENT | _null_ | _0_ | ᠍ Free Variation Selector Three |
 |`U+180E` | Formatting | NON_JOINING | _null_ | _0_ | ᠎ Mongolian Vowel Separator |
-|`U+180F` | _unassigned_ | | | | |
+|`U+180F` | Mark [Mn] | TRANSPARENT | _null_ | _0_ | ᠏ Free Variation Selector Four |
 | | | | | |
 |`U+1810` | Number | NON_JOINING | _null_ | _0_ | ᠐ Digit Zero |
 |`U+1811` | Number | NON_JOINING | _null_ | _0_ | ᠑ Digit One |
diff --git a/opentype-shaping-mongolian.md b/opentype-shaping-mongolian.md
index 4512885..bed4f91 100644
--- a/opentype-shaping-mongolian.md
+++ b/opentype-shaping-mongolian.md
@@ -86,9 +86,10 @@ context alone.
 
 To indicate the correct form, the text run can include a
 **free variation selector** immediately after the letter in
-question. There are three free variation selectors in the Mongolian
-block (FVS1, FVS2, and FVS3), although some letters have alternate
-forms defined only for FVS1 or only for FVS1 and FVS2.
+question. There are four free variation selectors in the Mongolian
+block (FVS1, FVS2, FVS3, and FVS4), although some letters have
+alternate forms defined only for a subset of the free variation
+selectors.
 
 In addition, letters vary as to whether alternate forms exist for the
 isolated, initial, medial, or final position, or for several

From 1c8761be93c53fe7650f8f2485e521b62a746948 Mon Sep 17 00:00:00 2001
From: Nathan Willis
Date: Fri, 17 Sep 2021 17:33:58 +0100
Subject: [PATCH 4/4] Arabic: update to Unicode 14.
---
 character-tables/character-tables-arabic.md | 91 +++++++++++++++++----
 opentype-shaping-arabic.md | 7 +-
 2 files changed, 79 insertions(+), 19 deletions(-)

diff --git a/character-tables/character-tables-arabic.md b/character-tables/character-tables-arabic.md
index a3e0372..4c54089 100644
--- a/character-tables/character-tables-arabic.md
+++ b/character-tables/character-tables-arabic.md
@@ -8,6 +8,7 @@ This document lists the per-character shaping information needed to
  - [Arabic character table](#arabic-character-table)
  - [Arabic Supplement character table](#arabic-supplement-character-table)
  - [Arabic Extended-A character table](#arabic-extended-a-character-table)
+ - [Arabic Extended-B character table](#arabic-extended-b-character-table)
  - [Rumi Numeral Symbols character table](#rumi-numeral-symbols-character-table)
  - [Miscellaneous character table](#miscellaneous-character-table)
 
@@ -78,7 +79,7 @@ treated differently during the mark-reordering stage.
 |`U+061A` | Mark [Mn] | TRANSPARENT | _null_ | 32 | ؚ Small Kasra |
 |`U+061B` | Punctuation | NON_JOINING | _null_ | _0_ | ؛ Semicolon |
 |`U+061C` | Other | TRANSPARENT | _null_ | _0_ | ؜ Arabic Letter Mark |
-|`U+061D` | _unassigned_ | | | | |
+|`U+061D` | Punctuation | NON_JOINING | _null_ | _0_ | ؝ End Of Text Mark |
 |`U+061E` | Punctuation | NON_JOINING | _null_ | _0_ | ؞ Triple Dot Punctuation Mark |
 |`U+061F` | Punctuation | NON_JOINING | _null_ | _0_ | ؟ Question Mark |
 | | | | | |
@@ -406,7 +407,7 @@ treated differently during the mark-reordering stage.
 |`U+08B2` | Letter | RIGHT | REH | _0_ | ࢲ Reh With Dot And Inverted V Above |
 |`U+08B3` | Letter | DUAL | AIN | _0_ | ࢳ Ain With 3 Dots Below |
 |`U+08B4` | Letter | DUAL | KAF | _0_ | ࢴ Kaf With Dot Below |
-|`U+08B5` | _unassigned_ | | | | |
+|`U+08B5` | Letter | DUAL | QAF | _0_ | ࢵ Qaf With Dot Below |
 |`U+08B6` | Letter | DUAL | BEH | _0_ | ࢶ Beh With Meem Above |
 |`U+08B7` | Letter | DUAL | BEH | _0_ | ࢷ Dotless Beh With 3 Dots Below And Meem Above |
 |`U+08B8` | Letter | DUAL | BEH | _0_ | ࢸ Dotless Beh With Teh Above |
@@ -426,18 +427,18 @@ treated differently during the mark-reordering stage.
 |`U+08C5` | Letter | DUAL | HAH | _0_ | ࣅ Jeem With 3 Dots Above |
 |`U+08C6` | Letter | DUAL | HAH | _0_ | ࣆ Jeem With 3 Dots Below |
 |`U+08C7` | Letter | DUAL | LAM | _0_ | ࣇ Lam With Small Arabic Tah Above |
-|`U+08C8` | _unassigned_ | | | | |
-|`U+08C9` | _unassigned_ | | | | |
-|`U+08CA` | _unassigned_ | | | | |
-|`U+08CB` | _unassigned_ | | | | |
-|`U+08CC` | _unassigned_ | | | | |
-|`U+08CD` | _unassigned_ | | | | |
-|`U+08CE` | _unassigned_ | | | | |
-|`U+08CF` | _unassigned_ | | | | |
-| | | | | |
-|`U+08D0` | _unassigned_ | | | | |
-|`U+08D1` | _unassigned_ | | | | |
-|`U+08D2` | _unassigned_ | | | | |
+|`U+08C8` | Letter | DUAL | GAF | _0_ | ࣈ Graf |
+|`U+08C9` | Letter modifier | TRANSPARENT | _null_ | _0_ | ࣉ Small Farsi Yeh |
+|`U+08CA` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣊ Small High Farsi Yeh |
+|`U+08CB` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣋ Small High Yeh Barree With Two Dots Below |
+|`U+08CC` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣌ Small High Word Sah |
+|`U+08CD` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣍ Small High Zah |
+|`U+08CE` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣎ Large Round Dot Above |
+|`U+08CF` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣏ Large Round Dot Below |
+| | | | | |
+|`U+08D0` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣐ Sukun Below |
+|`U+08D1` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣑ Large Circle Below |
+|`U+08D2` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣒ Large Round Dot Inside Circle Below |
 |`U+08D3` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࣓ Small Low Waw |
 |`U+08D4` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣔ Small High Word Ar-Rub |
 |`U+08D5` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣕ Small High Sad |
@@ -451,7 +452,7 @@ treated differently during the mark-reordering stage.
 |`U+08DD` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣝ Small High Word Sakta |
 |`U+08DE` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣞ Small High Word Qif |
 |`U+08DF` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣟ Small High Word Waqfa |
-| | | | | |
+| | | | | |
 |`U+08E0` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣠ Small High Footnote Marker |
 |`U+08E1` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣡ Small High Sign Safha |
 |`U+08E2` | Other | NON_JOINING | _null_ | _0_ | ࣢ Disputed End Of Ayah |
@@ -487,6 +488,64 @@ treated differently during the mark-reordering stage.
 |`U+08FF` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࣿ Mark Sideways Noon Ghunna |
 
 
+## Arabic Extended-B character table ##
+
+
+| Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
+|:----------|:-----------------|:-------------|:---------------------|:-----------|-------------------------------------------------------|
+|`U+0870` | Letter | RIGHT | ALEF | _0_ | ࡰ Alef With Attached Fatha |
+|`U+0871` | Letter | RIGHT | ALEF | _0_ | ࡱ Alef With Attached Top Right Fatha |
+|`U+0872` | Letter | RIGHT | ALEF | _0_ | ࡲ Alef With Right Middle Stroke |
+|`U+0873` | Letter | RIGHT | ALEF | _0_ | ࡳ Alef With Left Middle Stroke |
+|`U+0874` | Letter | RIGHT | ALEF | _0_ | ࡴ Alef With Attached Kasra |
+|`U+0875` | Letter | RIGHT | ALEF | _0_ | ࡵ Alef With Attached Bottom Right Kasra |
+|`U+0876` | Letter | RIGHT | ALEF | _0_ | ࡶ Alef With Attached Round Dot Above |
+|`U+0877` | Letter | RIGHT | ALEF | _0_ | ࡷ Alef With Attached Right Round Dot |
+|`U+0878` | Letter | RIGHT | ALEF | _0_ | ࡸ Alef With Attached Left Round Dot |
+|`U+0879` | Letter | RIGHT | ALEF | _0_ | ࡹ Alef With Attached Round Dot Below |
+|`U+087A` | Letter | RIGHT | ALEF | _0_ | ࡺ Alef With Dot Above |
+|`U+087B` | Letter | RIGHT | ALEF | _0_ | ࡻ Alef With Attached Top Right Fatha And Dot Above|
+|`U+087C` | Letter | RIGHT | ALEF | _0_ | ࡼ Alef With Right Middle Stroke And Dot Above |
+|`U+087D` | Letter | RIGHT | ALEF | _0_ | ࡽ Alef With Attached Bottom Right Kasra And Dot Above|
+|`U+087E` | Letter | RIGHT | ALEF | _0_ | ࡾ Alef With Attached Top Right Fatha And Left Ring|
+|`U+087F` | Letter | RIGHT | ALEF | _0_ | ࡿ Alef With Right Middle Stroke And Left Ring |
+| | | | | |
+|`U+0880` | Letter | RIGHT | ALEF | _0_ | ࢀ Alef With Attached Bottom Right Kasra And Left Ring|
+|`U+0881` | Letter | RIGHT | ALEF | _0_ | ࢁ Alef With Attached Right Hamza |
+|`U+0882` | Letter | RIGHT | ALEF | _0_ | ࢂ Alef With Attached Left Hamza |
+|`U+0883` | Letter modifier | JOIN_CAUSING | _null_ | _0_ | ࢃ Tatweel With Overstruck Hamza |
+|`U+0884` | Letter modifier | JOIN_CAUSING | _null_ | _0_ | ࢄ Tatweel With Overstruck Waw |
+|`U+0885` | Letter modifier | JOIN_CAUSING | _null_ | _0_ | ࢅ Tatweel With Two Dots Below |
+|`U+0886` | Letter | DUAL | THIN_YEH | _0_ | ࢆ Thin Yeh |
+|`U+0887` | Letter | NON_JOINING | _null_ | _0_ | ࢇ Baseline Round Dot |
+|`U+0888` | Symbol | NON_JOINING | _null_ | _0_ | ࢈ Raised Round Dot |
+|`U+0889` | Letter | DUAL | NOON | _0_ | ࢉ Noon With Inverted Small V |
+|`U+088A` | Letter | DUAL | HAH | _0_ | ࢊ Hah With Inverted Small V Below |
+|`U+088B` | Letter | DUAL | TAH | _0_ | ࢋ Tah With Dot Below |
+|`U+088C` | Letter | DUAL | TAH | _0_ | ࢌ Tah With Three Dots Below |
+|`U+088D` | Letter | DUAL | GAF | _0_ | ࢍ Keheh With Two Dots Vertically Below |
+|`U+088E` | Letter | RIGHT | VERTICAL_TAIL | _0_ | ࢎ Vertical Tail |
+|`U+088F` | _unassigned_ | | | | |
+| | | | | |
+|`U+0890` | Symbol | NON_JOINING | _null_ | _0_ | ࢐ Pound Mark Above |
+|`U+0891` | Symbol | NON_JOINING | _null_ | _0_ | ࢑ Piastre Mark Above |
+|`U+0892` | _unassigned_ | | | | |
+|`U+0893` | _unassigned_ | | | | |
+|`U+0894` | _unassigned_ | | | | |
+|`U+0895` | _unassigned_ | | | | |
+|`U+0896` | _unassigned_ | | | | |
+|`U+0897` | _unassigned_ | | | | |
+|`U+0898` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࢘ Small High Word Al-Juz |
+|`U+0899` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࢙ Small Low Word Ishmaam |
+|`U+089A` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࢚ Small Low Word Imaala |
+|`U+089B` | Mark [Mn] | TRANSPARENT | _null_ | 220 | ࢛ Small Low Word Tasheel |
+|`U+089C` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࢜ Madda Waajib |
+|`U+089D` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࢝ Superscript Alef Mokhassas |
+|`U+089E` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࢞ Doubled Madda |
+|`U+089F` | Mark [Mn] | TRANSPARENT | _null_ | 230 | ࢟ Half Madda Over Madda |
+| | | | | |
+
+
 ## Rumi Numeral Symbols character table ##
 
 | Codepoint | Unicode category | Joining type | Joining group | Mark class | Glyph |
@@ -523,7 +582,7 @@ treated differently during the mark-reordering stage.
 |`U+10E7C` | Number | NON_JOINING | _null_ | _0_ | 𐹼 Fraction One Quarter |
 |`U+10E7D` | Number | NON_JOINING | _null_ | _0_ | 𐹽 Fraction One Third |
 |`U+10E7E` | Number | NON_JOINING | _null_ | _0_ | 𐹾 Fraction Two Thirds |
-|`U+10E7F` | _unasigned_ | | | | |
+|`U+10E7F` | _unassigned_ | | | | |