This document lists the per-character shaping information needed to shape Tibetan text.
Tibetan glyphs should be classified as in the following table. Codepoints in the Tibetan block with no assigned meaning are designated as unassigned in the Unicode category column.
Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this includes some valid codepoints, such as currency marks, punctuation, and other symbols.
Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.
The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.
Some codepoints in the following table use a Shaping class that differs from the codepoint's Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+0F00 | Letter | null | null | ༀ Syllable Om |
U+0F01 | Symbol | SYMBOL | null | ༁ Gter Yig Mgo Truncated A |
U+0F02 | Symbol | SYMBOL | null | ༂ Gter Yig Mgo -Um Rnam Bcad Ma |
U+0F03 | Symbol | SYMBOL | null | ༃ Gter Yig Mgo -Um Gter Tsheg Ma |
U+0F04 | Punctuation | null | null | ༄ Initial Yig Mgo Mdun Ma |
U+0F05 | Punctuation | null | null | ༅ Closing Yig Mgo Sgab Ma |
U+0F06 | Punctuation | null | null | ༆ Caret Yig Mgo Phur Shad Ma |
U+0F07 | Punctuation | null | null | ༇ Yig Mgo Tsheg Shad Ma |
U+0F08 | Punctuation | null | null | ༈ Sbrul Shad |
U+0F09 | Punctuation | null | null | ༉ Bskur Yig Mgo |
U+0F0A | Punctuation | null | null | ༊ Bka- Shog Yig Mgo |
U+0F0B | Punctuation | null | null | ་ Intersyllabic Tsheg |
U+0F0C | Punctuation | null | null | ༌ Delimiter Tsheg Bstar |
U+0F0D | Punctuation | null | null | ། Shad |
U+0F0E | Punctuation | null | null | ༎ Nyis Shad |
U+0F0F | Punctuation | null | null | ༏ Tsheg Shad |
U+0F10 | Punctuation | null | null | ༐ Nyis Tsheg Shad |
U+0F11 | Punctuation | null | null | ༑ Rin Chen Spungs Shad |
U+0F12 | Punctuation | null | null | ༒ Rgya Gram Shad |
U+0F13 | Symbol | SYMBOL | null | ༓ Caret -Dzud Rtags Me Long Can |
U+0F14 | Punctuation | null | null | ༔ Gter Tsheg |
U+0F15 | Symbol | SYMBOL | null | ༕ Logotype Sign Chad Rtags |
U+0F16 | Symbol | SYMBOL | null | ༖ Logotype Sign Lhag Rtags |
U+0F17 | Symbol | SYMBOL | null | ༗ Astrological Sign Sgra Gcan -Char Rtags |
U+0F18 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ༘ Astrological Sign -Khyud Pa |
U+0F19 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ༙ Astrological Sign Sdong Tshugs |
U+0F1A | Symbol | SYMBOL | null | ༚ Sign Rdel Dkar Gcig |
U+0F1B | Symbol | SYMBOL | null | ༛ Sign Rdel Dkar Gnyis |
U+0F1C | Symbol | SYMBOL | null | ༜ Sign Rdel Dkar Gsum |
U+0F1D | Symbol | SYMBOL | null | ༝ Sign Rdel Nag Gcig |
U+0F1E | Symbol | SYMBOL | null | ༞ Sign Rdel Nag Gnyis |
U+0F1F | Symbol | SYMBOL | null | ༟ Sign Rdel Dkar Rdel Nag |
U+0F20 | Number | NUMBER | null | ༠ Digit Zero |
U+0F21 | Number | NUMBER | null | ༡ Digit One |
U+0F22 | Number | NUMBER | null | ༢ Digit Two |
U+0F23 | Number | NUMBER | null | ༣ Digit Three |
U+0F24 | Number | NUMBER | null | ༤ Digit Four |
U+0F25 | Number | NUMBER | null | ༥ Digit Five |
U+0F26 | Number | NUMBER | null | ༦ Digit Six |
U+0F27 | Number | NUMBER | null | ༧ Digit Seven |
U+0F28 | Number | NUMBER | null | ༨ Digit Eight |
U+0F29 | Number | NUMBER | null | ༩ Digit Nine |
U+0F2A | Number | NUMBER | null | ༪ Digit Half One |
U+0F2B | Number | NUMBER | null | ༫ Digit Half Two |
U+0F2C | Number | NUMBER | null | ༬ Digit Half Three |
U+0F2D | Number | NUMBER | null | ༭ Digit Half Four |
U+0F2E | Number | NUMBER | null | ༮ Digit Half Five |
U+0F2F | Number | NUMBER | null | ༯ Digit Half Six |
U+0F30 | Number | NUMBER | null | ༰ Digit Half Seven |
U+0F31 | Number | NUMBER | null | ༱ Digit Half Eight |
U+0F32 | Number | NUMBER | null | ༲ Digit Half Nine |
U+0F33 | Number | NUMBER | null | ༳ Digit Half Zero |
U+0F34 | Symbol | SYMBOL | null | ༴ Bsdus Rtags |
U+0F35 | Mark [Mn] | SYLLABLE_MODIFIER | BOTTOM_POSITION | ༵ Ngas Bzung Nyi Zla |
U+0F36 | Symbol | SYMBOL | null | ༶ Caret -Dzud Rtags Bzhi Mig Can |
U+0F37 | Mark [Mn] | SYLLABLE_MODIFIER | BOTTOM_POSITION | ༷ Ngas Bzung Sgor Rtags |
U+0F38 | Symbol | SYMBOL | null | ༸ Che Mgo |
U+0F39 | Mark [Mn] | NUKTA | TOP_POSITION | ༹ Tsa -Phru |
U+0F3A | Punctuation [Ps] | null | null | ༺ Gug Rtags Gyon |
U+0F3B | Punctuation [Pe] | null | null | ༻ Gug Rtags Gyas |
U+0F3C | Punctuation [Ps] | null | null | ༼ Ang Khang Gyon |
U+0F3D | Punctuation [Pe] | null | null | ༽ Ang Khang Gyas |
U+0F3E | Mark [Mc] | VOWEL_DEPENDENT | RIGHT_POSITION | ༾ Sign Yar Tshes |
U+0F3F | Mark [Mc] | VOWEL_DEPENDENT | LEFT_POSITION | ༿ Sign Mar Tshes |
U+0F40 | Letter | CONSONANT | null | ཀ Ka |
U+0F41 | Letter | CONSONANT | null | ཁ Kha |
U+0F42 | Letter | CONSONANT | null | ག Ga |
U+0F43 | Letter | CONSONANT | null | གྷ Gha |
U+0F44 | Letter | CONSONANT | null | ང Nga |
U+0F45 | Letter | CONSONANT | null | ཅ Ca |
U+0F46 | Letter | CONSONANT | null | ཆ Cha |
U+0F47 | Letter | CONSONANT | null | ཇ Ja |
U+0F48 | unassigned | | | |
U+0F49 | Letter | CONSONANT | null | ཉ Nya |
U+0F4A | Letter | CONSONANT | null | ཊ Tta |
U+0F4B | Letter | CONSONANT | null | ཋ Ttha |
U+0F4C | Letter | CONSONANT | null | ཌ Dda |
U+0F4D | Letter | CONSONANT | null | ཌྷ Ddha |
U+0F4E | Letter | CONSONANT | null | ཎ Nna |
U+0F4F | Letter | CONSONANT | null | ཏ Ta |
U+0F50 | Letter | CONSONANT | null | ཐ Tha |
U+0F51 | Letter | CONSONANT | null | ད Da |
U+0F52 | Letter | CONSONANT | null | དྷ Dha |
U+0F53 | Letter | CONSONANT | null | ན Na |
U+0F54 | Letter | CONSONANT | null | པ Pa |
U+0F55 | Letter | CONSONANT | null | ཕ Pha |
U+0F56 | Letter | CONSONANT | null | བ Ba |
U+0F57 | Letter | CONSONANT | null | བྷ Bha |
U+0F58 | Letter | CONSONANT | null | མ Ma |
U+0F59 | Letter | CONSONANT | null | ཙ Tsa |
U+0F5A | Letter | CONSONANT | null | ཚ Tsha |
U+0F5B | Letter | CONSONANT | null | ཛ Dza |
U+0F5C | Letter | CONSONANT | null | ཛྷ Dzha |
U+0F5D | Letter | CONSONANT | null | ཝ Wa |
U+0F5E | Letter | CONSONANT | null | ཞ Zha |
U+0F5F | Letter | CONSONANT | null | ཟ Za |
U+0F60 | Letter | CONSONANT | null | འ -A |
U+0F61 | Letter | CONSONANT | null | ཡ Ya |
U+0F62 | Letter | CONSONANT | null | ར Ra |
U+0F63 | Letter | CONSONANT | null | ལ La |
U+0F64 | Letter | CONSONANT | null | ཤ Sha |
U+0F65 | Letter | CONSONANT | null | ཥ Ssa |
U+0F66 | Letter | CONSONANT | null | ས Sa |
U+0F67 | Letter | CONSONANT | null | ཧ Ha |
U+0F68 | Letter | CONSONANT | null | ཨ A |
U+0F69 | Letter | CONSONANT | null | ཀྵ Kssa |
U+0F6A | Letter | CONSONANT | null | ཪ Fixed-Form Ra |
U+0F6B | Letter | CONSONANT | null | ཫ Kka |
U+0F6C | Letter | CONSONANT | null | ཬ Rra |
U+0F6D | unassigned | | | |
U+0F6E | unassigned | | | |
U+0F6F | unassigned | | | |
U+0F70 | unassigned | | | |
U+0F71 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ཱ Sign Aa |
U+0F72 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ི Sign I |
U+0F73 | Mark [Mn] | VOWEL_DEPENDENT | TOP_AND_BOTTOM_POSITION | ཱི Sign Ii |
U+0F74 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ུ Sign U |
U+0F75 | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | ཱུ Sign Uu |
U+0F76 | Mark [Mn] | VOWEL_DEPENDENT | TOP_AND_BOTTOM_POSITION | ྲྀ Sign Vocalic R |
U+0F77 | Mark [Mn] | VOWEL_DEPENDENT | TOP_AND_BOTTOM_POSITION | ཷ Sign Vocalic Rr |
U+0F78 | Mark [Mn] | VOWEL_DEPENDENT | TOP_AND_BOTTOM_POSITION | ླྀ Sign Vocalic L |
U+0F79 | Mark [Mn] | VOWEL_DEPENDENT | TOP_AND_BOTTOM_POSITION | ཹ Sign Vocalic Ll |
U+0F7A | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ེ Sign E |
U+0F7B | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ཻ Sign Ee |
U+0F7C | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ོ Sign O |
U+0F7D | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ཽ Sign Oo |
U+0F7E | Mark [Mn] | BINDU | TOP_POSITION | ཾ Sign Rjes Su Nga Ro |
U+0F7F | Mark [Mc] | VISARGA | RIGHT_POSITION | ཿ Sign Rnam Bcad |
U+0F80 | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | ྀ Sign Reversed I |
U+0F81 | Mark [Mn] | VOWEL_DEPENDENT | TOP_AND_BOTTOM_POSITION | ཱྀ Sign Reversed Ii |
U+0F82 | Mark [Mn] | BINDU | TOP_POSITION | ྂ Sign Nyi Zla Naa Da |
U+0F83 | Mark [Mn] | BINDU | TOP_POSITION | ྃ Sign Sna Ldan |
U+0F84 | Mark [Mn] | VIRAMA | BOTTOM_POSITION | ྄ Halanta |
U+0F85 | Punctuation | AVAGRAHA | null | ྅ Paluta |
U+0F86 | Mark [Mn] | TONE_MARKER | TOP_POSITION | ྆ Sign Lci Rtags |
U+0F87 | Mark [Mn] | TONE_MARKER | TOP_POSITION | ྇ Sign Yang Rtags |
U+0F88 | Letter | CONSONANT_HEAD | null | ྈ Sign Lce Tsa Can |
U+0F89 | Letter | CONSONANT_HEAD | null | ྉ Sign Mchu Can |
U+0F8A | Letter | CONSONANT_HEAD | null | ྊ Sign Gru Can Rgyings |
U+0F8B | Letter | CONSONANT_HEAD | null | ྋ Sign Gru Med Rgyings |
U+0F8C | Letter | CONSONANT_HEAD | null | ྌ Sign Inverted Mchu Can |
U+0F8D | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྍ Subjoined Sign Lce Tsa Can |
U+0F8E | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྎ Subjoined Sign Mchu Can |
U+0F8F | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྏ Subjoined Sign Inverted Mchu Can |
U+0F90 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྐ Subjoined Ka |
U+0F91 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྑ Subjoined Kha |
U+0F92 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྒ Subjoined Ga |
U+0F93 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྒྷ Subjoined Gha |
U+0F94 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྔ Subjoined Nga |
U+0F95 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྕ Subjoined Ca |
U+0F96 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྖ Subjoined Cha |
U+0F97 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྗ Subjoined Ja |
U+0F98 | unassigned | | | |
U+0F99 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྙ Subjoined Nya |
U+0F9A | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྚ Subjoined Tta |
U+0F9B | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྛ Subjoined Ttha |
U+0F9C | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྜ Subjoined Dda |
U+0F9D | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྜྷ Subjoined Ddha |
U+0F9E | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྞ Subjoined Nna |
U+0F9F | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྟ Subjoined Ta |
U+0FA0 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྠ Subjoined Tha |
U+0FA1 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྡ Subjoined Da |
U+0FA2 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྡྷ Subjoined Dha |
U+0FA3 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྣ Subjoined Na |
U+0FA4 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྤ Subjoined Pa |
U+0FA5 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྥ Subjoined Pha |
U+0FA6 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྦ Subjoined Ba |
U+0FA7 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྦྷ Subjoined Bha |
U+0FA8 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྨ Subjoined Ma |
U+0FA9 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྩ Subjoined Tsa |
U+0FAA | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྪ Subjoined Tsha |
U+0FAB | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྫ Subjoined Dza |
U+0FAC | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྫྷ Subjoined Dzha |
U+0FAD | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྭ Subjoined Wa |
U+0FAE | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྮ Subjoined Zha |
U+0FAF | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྯ Subjoined Za |
U+0FB0 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྰ Subjoined -A |
U+0FB1 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྱ Subjoined Ya |
U+0FB2 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྲ Subjoined Ra |
U+0FB3 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ླ Subjoined La |
U+0FB4 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྴ Subjoined Sha |
U+0FB5 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྵ Subjoined Ssa |
U+0FB6 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྶ Subjoined Sa |
U+0FB7 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྷ Subjoined Ha |
U+0FB8 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྸ Subjoined A |
U+0FB9 | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྐྵ Subjoined Kssa |
U+0FBA | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྺ Subjoined Fixed-Form Wa |
U+0FBB | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྻ Subjoined Fixed-Form Ya |
U+0FBC | Mark [Mn] | CONSONANT_SUBJOINED | BOTTOM_POSITION | ྼ Subjoined Fixed-Form Ra |
U+0FBD | unassigned | | | |
U+0FBE | Symbol | SYMBOL | null | ྾ Ku Ru Kha |
U+0FBF | Symbol | SYMBOL | null | ྿ Ku Ru Kha Bzhi Mig Can |
U+0FC0 | Symbol | SYMBOL | null | ࿀ Cantillation Sign Heavy Beat |
U+0FC1 | Symbol | SYMBOL | null | ࿁ Cantillation Sign Light Beat |
U+0FC2 | Symbol | SYMBOL | null | ࿂ Cantillation Sign Cang Te-U |
U+0FC3 | Symbol | SYMBOL | null | ࿃ Cantillation Sign Sbub -Chal |
U+0FC4 | Symbol | SYMBOL | null | ࿄ Symbol Dril Bu |
U+0FC5 | Symbol | SYMBOL | null | ࿅ Symbol Rdo Rje |
U+0FC6 | Mark [Mn] | SYLLABLE_MODIFIER | BOTTOM_POSITION | ࿆ Symbol Padma Gdan |
U+0FC7 | Symbol | SYMBOL | null | ࿇ Symbol Rdo Rje Rgya Gram |
U+0FC8 | Symbol | SYMBOL | null | ࿈ Symbol Phur Pa |
U+0FC9 | Symbol | SYMBOL | null | ࿉ Symbol Nor Bu |
U+0FCA | Symbol | SYMBOL | null | ࿊ Symbol Nor Bu Nyis -Khyil |
U+0FCB | Symbol | SYMBOL | null | ࿋ Symbol Nor Bu Gsum -Khyil |
U+0FCC | Symbol | SYMBOL | null | ࿌ Symbol Nor Bu Bzhi -Khyil |
U+0FCD | unassigned | | | |
U+0FCE | Symbol | SYMBOL | null | ࿎ Sign Rdel Nag Rdel Dkar |
U+0FCF | Symbol | SYMBOL | null | ࿏ Sign Rdel Nag Gsum |
U+0FD0 | Punctuation | null | null | ࿐ Bska- Shog Gi Mgo Rgyan |
U+0FD1 | Punctuation | null | null | ࿑ Mnyam Yig Gi Mgo Rgyan |
U+0FD2 | Punctuation | null | null | ࿒ Nyis Tsheg |
U+0FD3 | Punctuation | null | null | ࿓ Initial Brda Rnying Yig Mgo Mdun |
U+0FD4 | Punctuation | null | null | ࿔ Closing Brda Rnying Yig Mgo Sgab |
U+0FD5 | Symbol | SYMBOL | null | ࿕ Right-Facing Svasti Sign |
U+0FD6 | Symbol | SYMBOL | null | ࿖ Left-Facing Svasti Sign |
U+0FD7 | Symbol | SYMBOL | null | ࿗ Right-Facing Svasti Sign With Dots |
U+0FD8 | Symbol | SYMBOL | null | ࿘ Left-Facing Svasti Sign With Dots |
U+0FD9 | Punctuation | null | null | ࿙ Leading Mchan Rtags |
U+0FDA | Punctuation | null | null | ࿚ Trailing Mchan Rtags |
U+0FDB | unassigned | | | |
U+0FDC | unassigned | | | |
U+0FDD | unassigned | | | |
U+0FDE | unassigned | | | |
U+0FDF | unassigned | | | |
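As an illustration only, the table above could be encoded as a range lookup. The following Python sketch covers a partial subset of the rows; the names `RANGES`, `SINGLES`, and `shaping_class` are hypothetical and not part of any shaping engine's API. In this sketch, `None` is returned for null-class codepoints, unassigned codepoints, and rows omitted from the subset alike.

```python
from typing import Optional

# Inclusive codepoint ranges and their Shaping class, transcribed from the
# table above. Assumption: this is a partial subset chosen for illustration.
RANGES = [
    (0x0F20, 0x0F33, "NUMBER"),
    (0x0F40, 0x0F47, "CONSONANT"),
    (0x0F49, 0x0F6C, "CONSONANT"),
    (0x0F71, 0x0F7D, "VOWEL_DEPENDENT"),
    (0x0F88, 0x0F8C, "CONSONANT_HEAD"),
    (0x0F8D, 0x0F97, "CONSONANT_SUBJOINED"),
    (0x0F99, 0x0FBC, "CONSONANT_SUBJOINED"),
]

# Individual codepoints that do not fall into a convenient range.
SINGLES = {
    0x0F39: "NUKTA",
    0x0F7E: "BINDU",
    0x0F7F: "VISARGA",
    0x0F80: "VOWEL_DEPENDENT",
    0x0F81: "VOWEL_DEPENDENT",
    0x0F82: "BINDU",
    0x0F83: "BINDU",
    0x0F84: "VIRAMA",
    0x0F85: "AVAGRAHA",
    0x0F86: "TONE_MARKER",
    0x0F87: "TONE_MARKER",
}


def shaping_class(cp: int) -> Optional[str]:
    """Return the Shaping class for a codepoint, or None if not covered."""
    if cp in SINGLES:
        return SINGLES[cp]
    for first, last, name in RANGES:
        if first <= cp <= last:
            return name
    return None
```

For example, `shaping_class(0x0F40)` yields `"CONSONANT"`, while `shaping_class(0x0F48)` (unassigned) and `shaping_class(0x0F0B)` (null Shaping class) both yield `None`.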
Other important characters that may be encountered when shaping runs of Tibetan text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).
The dotted-circle placeholder is frequently used when displaying a dependent vowel (matra) or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.
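One way to cope with placeholder characters other than the dotted circle is to accept a small set of bases when a cluster consists only of a base followed by marks. The Python sketch below is a minimal illustration under stated assumptions: treating any Unicode dash punctuation (general category `Pd`) as a placeholder is a choice made here, not a rule from this document, and the function names are hypothetical.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"
NBSP = "\u00A0"


def is_placeholder_base(ch: str) -> bool:
    """True for the dotted circle, NBSP, or any dash-like character.

    Assumption: dash punctuation (category Pd) stands in for the
    "hyphens or dashes" mentioned in the text.
    """
    return ch in (DOTTED_CIRCLE, NBSP) or unicodedata.category(ch) == "Pd"


def is_placeholder_cluster(text: str) -> bool:
    """True if text is one placeholder base followed only by marks."""
    if len(text) < 2 or not is_placeholder_base(text[0]):
        return False
    # Mn = non-spacing mark, Mc = spacing-combining mark.
    return all(unicodedata.category(ch) in ("Mn", "Mc") for ch in text[1:])
```

With this sketch, both a dotted circle followed by U+0F72 and a hyphen followed by U+0F74 are recognized as placeholder clusters, while a consonant followed by a vowel sign is not.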
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
U+00A0 | Separator | PLACEHOLDER | null | No-break space |
U+200C | Other | NON_JOINER | null | Zero-width non-joiner |
U+200D | Other | JOINER | null | Zero-width joiner |
U+25CC | Symbol | DOTTED_CIRCLE | null | ◌ Dotted circle |
U+2638 | Symbol | SYMBOL | null | ☸ Wheel of Dharma |
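To illustrate how the Shaping class can differ from, and take precedence over, the Unicode General Category, the following Python sketch pairs each codepoint in the table above with the category reported by the standard `unicodedata` module. The dictionary and function names are hypothetical.

```python
import unicodedata

# Shaping classes for the miscellaneous characters, transcribed from the
# table above.
MISC_SHAPING_CLASS = {
    0x00A0: "PLACEHOLDER",
    0x200C: "NON_JOINER",
    0x200D: "JOINER",
    0x25CC: "DOTTED_CIRCLE",
    0x2638: "SYMBOL",
}


def describe(cp: int):
    """Return (Unicode General Category, Shaping class or None)."""
    return unicodedata.category(chr(cp)), MISC_SHAPING_CLASS.get(cp)
```

For instance, `describe(0x200D)` reports the General Category `Cf` (a format character, listed as "Other" in the table) alongside the Shaping class `JOINER`; a shaping engine following this document would act on the latter.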
The zero-width joiner (ZWJ) is primarily used to prevent the formation of a conjunct from a "consonant,Halant,consonant" sequence. The sequence "consonant,Halant,ZWJ,consonant" blocks the formation of a conjunct between the two consonants.
Note, however, that the "consonant,Halant" subsequence in the above example may still trigger a half-forms feature. To prevent the application of the half-forms feature in addition to preventing the conjunct, the zero-width non-joiner (ZWNJ) must be used instead. The sequence "consonant,Halant,ZWNJ,consonant" should produce the first consonant in its standard form, followed by an explicit "Halant".
A secondary usage of the zero-width joiner is to prevent the formation of "Reph". An initial "Ra,Halant,ZWJ" sequence should not produce a "Reph", where an initial "Ra,Halant" sequence without the zero-width joiner otherwise would.
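The joiner rules described above can be summarized in a small decision function. This Python sketch is an illustration only: it inspects just the start of a string, uses U+0F84 as the "Halant" and U+0F62 as "Ra" (per the table above), and returns hypothetical labels rather than performing any real glyph substitution.

```python
HALANT = 0x0F84  # Halanta, Shaping class VIRAMA in the table above
RA = 0x0F62
ZWNJ = 0x200C
ZWJ = 0x200D


def is_consonant(cp: int) -> bool:
    # CONSONANT rows from the table above (U+0F48 is unassigned).
    return 0x0F40 <= cp <= 0x0F47 or 0x0F49 <= cp <= 0x0F6C


def conjunct_behavior(text: str) -> str:
    """Classify a leading consonant,Halant[,joiner],consonant sequence."""
    cps = [ord(ch) for ch in text]
    if len(cps) < 3 or not is_consonant(cps[0]) or cps[1] != HALANT:
        return "none"
    if len(cps) >= 4 and is_consonant(cps[3]):
        if cps[2] == ZWJ:
            return "half-form"        # conjunct blocked, half form allowed
        if cps[2] == ZWNJ:
            return "explicit-halant"  # conjunct and half form both blocked
    if is_consonant(cps[2]):
        return "conjunct"
    return "none"


def reph_allowed(text: str) -> bool:
    """True if an initial Ra,Halant is not followed by ZWJ."""
    cps = [ord(ch) for ch in text]
    if len(cps) < 2 or cps[0] != RA or cps[1] != HALANT:
        return False
    return not (len(cps) >= 3 and cps[2] == ZWJ)
```

A real shaping engine would apply these decisions per syllable, after syllable identification, rather than only at the start of a string.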
The no-break space (NBSP) is primarily used to display those codepoints that are defined as non-spacing (marks, dependent vowels (matras), below-base consonant forms, and post-base consonant forms) in an isolated context, as an alternative to displaying them superimposed on the dotted-circle placeholder. These sequences will match "NBSP,ZWJ,Halant,consonant", "NBSP,mark", or "NBSP,matra".
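The three NBSP patterns listed above can be matched with a short function. In this Python sketch, "matra" is taken to mean the VOWEL_DEPENDENT rows U+0F71 to U+0F7D plus U+0F80 and U+0F81 (a subset of that class), and "mark" is taken to mean any other character whose Unicode General Category is `Mn` or `Mc`. Both readings are assumptions made for illustration, and the function name is hypothetical.

```python
import unicodedata
from typing import Optional

NBSP = 0x00A0
ZWJ = 0x200D
HALANT = 0x0F84


def is_consonant(cp: int) -> bool:
    return 0x0F40 <= cp <= 0x0F47 or 0x0F49 <= cp <= 0x0F6C


def is_matra(cp: int) -> bool:
    # Assumption: a subset of the VOWEL_DEPENDENT rows in the table above.
    return 0x0F71 <= cp <= 0x0F7D or cp in (0x0F80, 0x0F81)


def is_mark(cp: int) -> bool:
    return unicodedata.category(chr(cp)) in ("Mn", "Mc")


def nbsp_sequence(text: str) -> Optional[str]:
    """Return the name of the NBSP pattern the text starts with, if any."""
    cps = [ord(ch) for ch in text]
    if len(cps) < 2 or cps[0] != NBSP:
        return None
    if len(cps) >= 4 and cps[1] == ZWJ and cps[2] == HALANT \
            and is_consonant(cps[3]):
        return "NBSP,ZWJ,Halant,consonant"
    if is_matra(cps[1]):
        return "NBSP,matra"
    if is_mark(cps[1]):
        return "NBSP,mark"
    return None
```

The matra test runs before the generic mark test because every matra is also a mark by General Category; checking in the other order would never report "NBSP,matra".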