Tibetan character tables

This document lists the per-character shaping information needed to shape Tibetan text.

Tibetan character table

Tibetan codepoints should be classified as shown in the following table. Codepoints in the Tibetan block with no assigned meaning are designated as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this includes some valid codepoints, such as punctuation and other marks that need no script-specific handling.

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.

Some codepoints in the following table use a Shaping class that differs from the codepoint's Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
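The relationship between the two columns can be sketched in Python. This is a minimal illustration, not part of any shaping engine: the `SHAPING_CLASS` dictionary and the `describe` and `is_spacing_mark` helpers are hypothetical names, and the dictionary holds only a handful of rows copied from the table. The Unicode General Category comes from Python's standard `unicodedata` module.

```python
import unicodedata

# Hypothetical excerpt: shaping classes copied from a few rows of the table.
# None models a "null" in the Shaping class column.
SHAPING_CLASS = {
    0x0F0D: None,              # Shad
    0x0F7F: "VISARGA",         # Sign Rnam Bcad
    0x0F85: "AVAGRAHA",        # Paluta
    0x0F88: "CONSONANT_HEAD",  # Sign Lce Tsa Can
}


def describe(cp):
    """Return (Unicode General Category, shaping class) for one codepoint."""
    general = unicodedata.category(chr(cp))
    return general, SHAPING_CLASS.get(cp)


def is_spacing_mark(cp):
    """True for spacing-combining marks ([Mc]), False for everything else."""
    return unicodedata.category(chr(cp)) == "Mc"
```

For example, `describe(0x0F88)` pairs a letter General Category with the CONSONANT_HEAD shaping class; a shaper following this document would act on the second value.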

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+0F00 Letter null null ༀ Syllable Om
U+0F01 Symbol SYMBOL null ༁ Gter Yig Mgo Truncated A
U+0F02 Symbol SYMBOL null ༂ Gter Yig Mgo -Um Rnam Bcad Ma
U+0F03 Symbol SYMBOL null ༃ Gter Yig Mgo -Um Gter Tsheg Ma
U+0F04 Punctuation null null ༄ Initial Yig Mgo Mdun Ma
U+0F05 Punctuation null null ༅ Closing Yig Mgo Sgab Ma
U+0F06 Punctuation null null ༆ Caret Yig Mgo Phur Shad Ma
U+0F07 Punctuation null null ༇ Yig Mgo Tsheg Shad Ma
U+0F08 Punctuation null null ༈ Sbrul Shad
U+0F09 Punctuation null null ༉ Bskur Yig Mgo
U+0F0A Punctuation null null ༊ Bka- Shog Yig Mgo
U+0F0B Punctuation null null ་ Intersyllabic Tsheg
U+0F0C Punctuation null null ༌ Delimiter Tsheg Bstar
U+0F0D Punctuation null null ། Shad
U+0F0E Punctuation null null ༎ Nyis Shad
U+0F0F Punctuation null null ༏ Tsheg Shad
U+0F10 Punctuation null null ༐ Nyis Tsheg Shad
U+0F11 Punctuation null null ༑ Rin Chen Spungs Shad
U+0F12 Punctuation null null ༒ Rgya Gram Shad
U+0F13 Symbol SYMBOL null ༓ Caret -Dzud Rtags Me Long Can
U+0F14 Punctuation null null ༔ Gter Tsheg
U+0F15 Symbol SYMBOL null ༕ Logotype Sign Chad Rtags
U+0F16 Symbol SYMBOL null ༖ Logotype Sign Lhag Rtags
U+0F17 Symbol SYMBOL null ༗ Astrological Sign Sgra Gcan -Char Rtags
U+0F18 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ༘ Astrological Sign -Khyud Pa
U+0F19 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ༙ Astrological Sign Sdong Tshugs
U+0F1A Symbol SYMBOL null ༚ Sign Rdel Dkar Gcig
U+0F1B Symbol SYMBOL null ༛ Sign Rdel Dkar Gnyis
U+0F1C Symbol SYMBOL null ༜ Sign Rdel Dkar Gsum
U+0F1D Symbol SYMBOL null ༝ Sign Rdel Nag Gcig
U+0F1E Symbol SYMBOL null ༞ Sign Rdel Nag Gnyis
U+0F1F Symbol SYMBOL null ༟ Sign Rdel Dkar Rdel Nag
U+0F20 Number NUMBER null ༠ Digit Zero
U+0F21 Number NUMBER null ༡ Digit One
U+0F22 Number NUMBER null ༢ Digit Two
U+0F23 Number NUMBER null ༣ Digit Three
U+0F24 Number NUMBER null ༤ Digit Four
U+0F25 Number NUMBER null ༥ Digit Five
U+0F26 Number NUMBER null ༦ Digit Six
U+0F27 Number NUMBER null ༧ Digit Seven
U+0F28 Number NUMBER null ༨ Digit Eight
U+0F29 Number NUMBER null ༩ Digit Nine
U+0F2A Number NUMBER null ༪ Digit Half One
U+0F2B Number NUMBER null ༫ Digit Half Two
U+0F2C Number NUMBER null ༬ Digit Half Three
U+0F2D Number NUMBER null ༭ Digit Half Four
U+0F2E Number NUMBER null ༮ Digit Half Five
U+0F2F Number NUMBER null ༯ Digit Half Six
U+0F30 Number NUMBER null ༰ Digit Half Seven
U+0F31 Number NUMBER null ༱ Digit Half Eight
U+0F32 Number NUMBER null ༲ Digit Half Nine
U+0F33 Number NUMBER null ༳ Digit Half Zero
U+0F34 Symbol SYMBOL null ༴ Bsdus Rtags
U+0F35 Mark [Mn] SYLLABLE_MODIFIER BOTTOM_POSITION ༵ Ngas Bzung Nyi Zla
U+0F36 Symbol SYMBOL null ༶ Caret -Dzud Rtags Bzhi Mig Can
U+0F37 Mark [Mn] SYLLABLE_MODIFIER BOTTOM_POSITION ༷ Ngas Bzung Sgor Rtags
U+0F38 Symbol SYMBOL null ༸ Che Mgo
U+0F39 Mark [Mn] NUKTA TOP_POSITION ༹ Tsa -Phru
U+0F3A Punctuation [Ps] null null ༺ Gug Rtags Gyon
U+0F3B Punctuation [Pe] null null ༻ Gug Rtags Gyas
U+0F3C Punctuation [Ps] null null ༼ Ang Khang Gyon
U+0F3D Punctuation [Pe] null null ༽ Ang Khang Gyas
U+0F3E Mark [Mc] VOWEL_DEPENDENT RIGHT_POSITION ༾ Sign Yar Tshes
U+0F3F Mark [Mc] VOWEL_DEPENDENT LEFT_POSITION ༿ Sign Mar Tshes
U+0F40 Letter CONSONANT null ཀ Ka
U+0F41 Letter CONSONANT null ཁ Kha
U+0F42 Letter CONSONANT null ག Ga
U+0F43 Letter CONSONANT null གྷ Gha
U+0F44 Letter CONSONANT null ང Nga
U+0F45 Letter CONSONANT null ཅ Ca
U+0F46 Letter CONSONANT null ཆ Cha
U+0F47 Letter CONSONANT null ཇ Ja
U+0F48 unassigned
U+0F49 Letter CONSONANT null ཉ Nya
U+0F4A Letter CONSONANT null ཊ Tta
U+0F4B Letter CONSONANT null ཋ Ttha
U+0F4C Letter CONSONANT null ཌ Dda
U+0F4D Letter CONSONANT null ཌྷ Ddha
U+0F4E Letter CONSONANT null ཎ Nna
U+0F4F Letter CONSONANT null ཏ Ta
U+0F50 Letter CONSONANT null ཐ Tha
U+0F51 Letter CONSONANT null ད Da
U+0F52 Letter CONSONANT null དྷ Dha
U+0F53 Letter CONSONANT null ན Na
U+0F54 Letter CONSONANT null པ Pa
U+0F55 Letter CONSONANT null ཕ Pha
U+0F56 Letter CONSONANT null བ Ba
U+0F57 Letter CONSONANT null བྷ Bha
U+0F58 Letter CONSONANT null མ Ma
U+0F59 Letter CONSONANT null ཙ Tsa
U+0F5A Letter CONSONANT null ཚ Tsha
U+0F5B Letter CONSONANT null ཛ Dza
U+0F5C Letter CONSONANT null ཛྷ Dzha
U+0F5D Letter CONSONANT null ཝ Wa
U+0F5E Letter CONSONANT null ཞ Zha
U+0F5F Letter CONSONANT null ཟ Za
U+0F60 Letter CONSONANT null འ -A
U+0F61 Letter CONSONANT null ཡ Ya
U+0F62 Letter CONSONANT null ར Ra
U+0F63 Letter CONSONANT null ལ La
U+0F64 Letter CONSONANT null ཤ Sha
U+0F65 Letter CONSONANT null ཥ Ssa
U+0F66 Letter CONSONANT null ས Sa
U+0F67 Letter CONSONANT null ཧ Ha
U+0F68 Letter CONSONANT null ཨ A
U+0F69 Letter CONSONANT null ཀྵ Kssa
U+0F6A Letter CONSONANT null ཪ Fixed-Form Ra
U+0F6B Letter CONSONANT null ཫ Kka
U+0F6C Letter CONSONANT null ཬ Rra
U+0F6D unassigned
U+0F6E unassigned
U+0F6F unassigned
U+0F70 unassigned
U+0F71 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ཱ Sign Aa
U+0F72 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ི Sign I
U+0F73 Mark [Mn] VOWEL_DEPENDENT TOP_AND_BOTTOM_POSITION ཱི Sign Ii
U+0F74 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ུ Sign U
U+0F75 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION ཱུ Sign Uu
U+0F76 Mark [Mn] VOWEL_DEPENDENT TOP_AND_BOTTOM_POSITION ྲྀ Sign Vocalic R
U+0F77 Mark [Mn] VOWEL_DEPENDENT TOP_AND_BOTTOM_POSITION ཷ Sign Vocalic Rr
U+0F78 Mark [Mn] VOWEL_DEPENDENT TOP_AND_BOTTOM_POSITION ླྀ Sign Vocalic L
U+0F79 Mark [Mn] VOWEL_DEPENDENT TOP_AND_BOTTOM_POSITION ཹ Sign Vocalic Ll
U+0F7A Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ེ Sign E
U+0F7B Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ཻ Sign Ee
U+0F7C Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ོ Sign O
U+0F7D Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ཽ Sign Oo
U+0F7E Mark [Mn] BINDU TOP_POSITION ཾ Sign Rjes Su Nga Ro
U+0F7F Mark [Mc] VISARGA RIGHT_POSITION ཿ Sign Rnam Bcad
U+0F80 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION ྀ Sign Reversed I
U+0F81 Mark [Mn] VOWEL_DEPENDENT TOP_AND_BOTTOM_POSITION ཱྀ Sign Reversed Ii
U+0F82 Mark [Mn] BINDU TOP_POSITION ྂ Sign Nyi Zla Naa Da
U+0F83 Mark [Mn] BINDU TOP_POSITION ྃ Sign Sna Ldan
U+0F84 Mark [Mn] VIRAMA BOTTOM_POSITION ྄ Halanta
U+0F85 Punctuation AVAGRAHA null ྅ Paluta
U+0F86 Mark [Mn] TONE_MARKER TOP_POSITION ྆ Sign Lci Rtags
U+0F87 Mark [Mn] TONE_MARKER TOP_POSITION ྇ Sign Yang Rtags
U+0F88 Letter CONSONANT_HEAD null ྈ Sign Lce Tsa Can
U+0F89 Letter CONSONANT_HEAD null ྉ Sign Mchu Can
U+0F8A Letter CONSONANT_HEAD null ྊ Sign Gru Can Rgyings
U+0F8B Letter CONSONANT_HEAD null ྋ Sign Gru Med Rgyings
U+0F8C Letter CONSONANT_HEAD null ྌ Sign Inverted Mchu Can
U+0F8D Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྍ Subjoined Sign Lce Tsa Can
U+0F8E Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྎ Subjoined Sign Mchu Can
U+0F8F Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྏ Subjoined Sign Inverted Mchu Can
U+0F90 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྐ Subjoined Ka
U+0F91 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྑ Subjoined Kha
U+0F92 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྒ Subjoined Ga
U+0F93 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྒྷ Subjoined Gha
U+0F94 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྔ Subjoined Nga
U+0F95 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྕ Subjoined Ca
U+0F96 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྖ Subjoined Cha
U+0F97 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྗ Subjoined Ja
U+0F98 unassigned
U+0F99 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྙ Subjoined Nya
U+0F9A Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྚ Subjoined Tta
U+0F9B Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྛ Subjoined Ttha
U+0F9C Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྜ Subjoined Dda
U+0F9D Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྜྷ Subjoined Ddha
U+0F9E Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྞ Subjoined Nna
U+0F9F Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྟ Subjoined Ta
U+0FA0 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྠ Subjoined Tha
U+0FA1 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྡ Subjoined Da
U+0FA2 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྡྷ Subjoined Dha
U+0FA3 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྣ Subjoined Na
U+0FA4 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྤ Subjoined Pa
U+0FA5 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྥ Subjoined Pha
U+0FA6 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྦ Subjoined Ba
U+0FA7 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྦྷ Subjoined Bha
U+0FA8 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྨ Subjoined Ma
U+0FA9 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྩ Subjoined Tsa
U+0FAA Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྪ Subjoined Tsha
U+0FAB Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྫ Subjoined Dza
U+0FAC Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྫྷ Subjoined Dzha
U+0FAD Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྭ Subjoined Wa
U+0FAE Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྮ Subjoined Zha
U+0FAF Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྯ Subjoined Za
U+0FB0 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྰ Subjoined -A
U+0FB1 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྱ Subjoined Ya
U+0FB2 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྲ Subjoined Ra
U+0FB3 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ླ Subjoined La
U+0FB4 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྴ Subjoined Sha
U+0FB5 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྵ Subjoined Ssa
U+0FB6 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྶ Subjoined Sa
U+0FB7 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྷ Subjoined Ha
U+0FB8 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྸ Subjoined A
U+0FB9 Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྐྵ Subjoined Kssa
U+0FBA Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྺ Subjoined Fixed-Form Wa
U+0FBB Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྻ Subjoined Fixed-Form Ya
U+0FBC Mark [Mn] CONSONANT_SUBJOINED BOTTOM_POSITION ྼ Subjoined Fixed-Form Ra
U+0FBD unassigned
U+0FBE Symbol SYMBOL null ྾ Ku Ru Kha
U+0FBF Symbol SYMBOL null ྿ Ku Ru Kha Bzhi Mig Can
U+0FC0 Symbol SYMBOL null ࿀ Cantillation Sign Heavy Beat
U+0FC1 Symbol SYMBOL null ࿁ Cantillation Sign Light Beat
U+0FC2 Symbol SYMBOL null ࿂ Cantillation Sign Cang Te-U
U+0FC3 Symbol SYMBOL null ࿃ Cantillation Sign Sbub -Chal
U+0FC4 Symbol SYMBOL null ࿄ Symbol Dril Bu
U+0FC5 Symbol SYMBOL null ࿅ Symbol Rdo Rje
U+0FC6 Mark [Mn] SYLLABLE_MODIFIER BOTTOM_POSITION ࿆ Symbol Padma Gdan
U+0FC7 Symbol SYMBOL null ࿇ Symbol Rdo Rje Rgya Gram
U+0FC8 Symbol SYMBOL null ࿈ Symbol Phur Pa
U+0FC9 Symbol SYMBOL null ࿉ Symbol Nor Bu
U+0FCA Symbol SYMBOL null ࿊ Symbol Nor Bu Nyis -Khyil
U+0FCB Symbol SYMBOL null ࿋ Symbol Nor Bu Gsum -Khyil
U+0FCC Symbol SYMBOL null ࿌ Symbol Nor Bu Bzhi -Khyil
U+0FCD unassigned
U+0FCE Symbol SYMBOL null ࿎ Sign Rdel Nag Rdel Dkar
U+0FCF Symbol SYMBOL null ࿏ Sign Rdel Nag Gsum
U+0FD0 Punctuation null null ࿐ Bska- Shog Gi Mgo Rgyan
U+0FD1 Punctuation null null ࿑ Mnyam Yig Gi Mgo Rgyan
U+0FD2 Punctuation null null ࿒ Nyis Tsheg
U+0FD3 Punctuation null null ࿓ Initial Brda Rnying Yig Mgo Mdun Ma
U+0FD4 Punctuation null null ࿔ Closing Brda Rnying Yig Mgo Sgab Ma
U+0FD5 Symbol SYMBOL null ࿕ Right-Facing Svasti Sign
U+0FD6 Symbol SYMBOL null ࿖ Left-Facing Svasti Sign
U+0FD7 Symbol SYMBOL null ࿗ Right-Facing Svasti Sign With Dots
U+0FD8 Symbol SYMBOL null ࿘ Left-Facing Svasti Sign With Dots
U+0FD9 Punctuation null null ࿙ Leading Mchan Rtags
U+0FDA Punctuation null null ࿚ Trailing Mchan Rtags
U+0FDB unassigned
U+0FDC unassigned
U+0FDD unassigned
U+0FDE unassigned
U+0FDF unassigned
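
One way an implementation might store the table above is as sorted codepoint ranges searched with binary search. The sketch below is an assumption about representation, not a description of any existing shaper. `RANGES` and `lookup` are hypothetical names, and the ranges are a partial excerpt of the table: punctuation, symbols, digits, and several marks are omitted for brevity, so they return the same `(None, None)` as unassigned codepoints.

```python
from bisect import bisect_right

# (first, last, shaping class, mark-placement subclass); None models "null".
# Partial excerpt transcribed from the table above, sorted by first codepoint.
RANGES = [
    (0x0F18, 0x0F19, "VOWEL_DEPENDENT", "BOTTOM_POSITION"),
    (0x0F39, 0x0F39, "NUKTA", "TOP_POSITION"),
    (0x0F40, 0x0F47, "CONSONANT", None),
    (0x0F49, 0x0F6C, "CONSONANT", None),
    (0x0F71, 0x0F71, "VOWEL_DEPENDENT", "BOTTOM_POSITION"),
    (0x0F72, 0x0F72, "VOWEL_DEPENDENT", "TOP_POSITION"),
    (0x0F73, 0x0F73, "VOWEL_DEPENDENT", "TOP_AND_BOTTOM_POSITION"),
    (0x0F74, 0x0F75, "VOWEL_DEPENDENT", "BOTTOM_POSITION"),
    (0x0F76, 0x0F79, "VOWEL_DEPENDENT", "TOP_AND_BOTTOM_POSITION"),
    (0x0F7A, 0x0F7D, "VOWEL_DEPENDENT", "TOP_POSITION"),
    (0x0F7E, 0x0F7E, "BINDU", "TOP_POSITION"),
    (0x0F7F, 0x0F7F, "VISARGA", "RIGHT_POSITION"),
    (0x0F84, 0x0F84, "VIRAMA", "BOTTOM_POSITION"),
    (0x0F88, 0x0F8C, "CONSONANT_HEAD", None),
    (0x0F8D, 0x0F97, "CONSONANT_SUBJOINED", "BOTTOM_POSITION"),
    (0x0F99, 0x0FBC, "CONSONANT_SUBJOINED", "BOTTOM_POSITION"),
]
STARTS = [first for first, _, _, _ in RANGES]


def lookup(cp):
    """Return (shaping class, mark-placement subclass) for a codepoint."""
    i = bisect_right(STARTS, cp) - 1  # last range starting at or before cp
    if i >= 0:
        first, last, shaping, placement = RANGES[i]
        if cp <= last:
            return shaping, placement
    return None, None  # unassigned, or not covered by this excerpt


# Usage: classify each codepoint of one syllable.
syllable = "\u0F56\u0F66\u0F92\u0FB2\u0F74\u0F56\u0F66"
classes = [lookup(ord(ch))[0] for ch in syllable]
```

Ranges keep the data compact because long runs of the table, such as the subjoined consonants, share the same two values.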

Miscellaneous character table

Other important characters that may be encountered when shaping runs of Tibetan text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Real-world text may also use other characters, such as hyphens or dashes, as placeholders in the same way; shaping engines should cope with this situation gracefully.

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+00A0 Separator PLACEHOLDER null   No-break space
U+200C Other NON_JOINER null ‌ Zero-width non-joiner
U+200D Other JOINER null ‍ Zero-width joiner
U+25CC Symbol DOTTED_CIRCLE null ◌ Dotted circle
U+2638 Symbol SYMBOL null ☸ Wheel of Dharma
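
The table above, together with the note about hyphens and dashes, could be modeled as follows. This is only a sketch: `MISC_SHAPING_CLASS` and `can_carry_isolated_mark` are hypothetical names, and identifying hyphens and dashes by the Unicode General Category `Pd` is an assumption made here, not a rule stated in this document.

```python
import unicodedata

# Shaping classes from the miscellaneous character table.
MISC_SHAPING_CLASS = {
    0x00A0: "PLACEHOLDER",    # No-break space
    0x200C: "NON_JOINER",     # Zero-width non-joiner
    0x200D: "JOINER",         # Zero-width joiner
    0x25CC: "DOTTED_CIRCLE",  # Dotted circle
    0x2638: "SYMBOL",         # Wheel of Dharma
}


def can_carry_isolated_mark(cp):
    """True if cp may act as a placeholder base for a mark shown in isolation."""
    if MISC_SHAPING_CLASS.get(cp) in ("PLACEHOLDER", "DOTTED_CIRCLE"):
        return True
    # Assumption: hyphens and dashes are recognized by General Category Pd.
    return unicodedata.category(chr(cp)) == "Pd"
```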

The zero-width joiner (ZWJ) is primarily used to prevent the formation of a conjunct from a "consonant,Halant,consonant" sequence. The sequence "consonant,Halant,ZWJ,consonant" blocks the formation of a conjunct between the two consonants.

Note, however, that the "consonant,Halant" subsequence in the above example may still trigger a half-forms feature. To prevent the application of the half-forms feature in addition to preventing the conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence "consonant,Halant,ZWNJ,consonant" should produce the first consonant in its standard form, followed by an explicit "Halant".
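
The difference between the two joiners after a "consonant,Halant" pair can be summarized in a small decision function. The sketch below only restates the rules from the preceding paragraphs; `after_halant` is a hypothetical name, and whether a given font actually provides conjuncts or half-forms is not modeled.

```python
ZWNJ = 0x200C  # zero-width non-joiner
ZWJ = 0x200D   # zero-width joiner


def after_halant(next_cp):
    """Report what is permitted, given the codepoint after "consonant,Halant"."""
    if next_cp == ZWJ:
        # Conjunct is blocked, but a half-forms feature may still apply.
        return {"conjunct": False, "half_form": True}
    if next_cp == ZWNJ:
        # Both are blocked: standard consonant form plus an explicit Halant.
        return {"conjunct": False, "half_form": False}
    # No joiner present: the font's features decide.
    return {"conjunct": True, "half_form": True}
```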

A secondary usage of the zero-width joiner is to prevent the formation of "Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph", where an initial "Ra,Halant" sequence without the zero-width joiner otherwise would.

The no-break space (NBSP) is primarily used to display those codepoints that are defined as non-spacing (marks, dependent vowels (matras), below-base consonant forms, and post-base consonant forms) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. These sequences will match "NBSP,ZWJ,Halant,consonant", "NBSP,mark", or "NBSP,matra".
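
A matcher for the three NBSP sequences might look like the sketch below. The helper names are hypothetical, and two simplifications are assumed: marks and matras are both recognized by a Unicode General Category beginning with `M`, and "consonant" means the base-consonant ranges U+0F40 to U+0F47 and U+0F49 to U+0F6C from the Tibetan character table, with U+0F84 as the Halant.

```python
import unicodedata

NBSP = 0x00A0
ZWJ = 0x200D
HALANT = 0x0F84


def is_consonant(cp):
    """Base consonants from the Tibetan table (U+0F48 is unassigned)."""
    return 0x0F40 <= cp <= 0x0F47 or 0x0F49 <= cp <= 0x0F6C


def is_mark(cp):
    """Marks and matras: General Category Mn, Mc, or Me."""
    return unicodedata.category(chr(cp)).startswith("M")


def is_isolated_form(cps):
    """True if cps matches "NBSP,mark", "NBSP,matra", or "NBSP,ZWJ,Halant,consonant"."""
    if not cps or cps[0] != NBSP:
        return False
    rest = cps[1:]
    if len(rest) == 1:
        return is_mark(rest[0])
    if len(rest) == 3:
        return rest[0] == ZWJ and rest[1] == HALANT and is_consonant(rest[2])
    return False
```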