This document lists the per-character shaping information needed to shape Lao text.
Lao glyphs should be classified as in the following table. Codepoints in the Lao block with no assigned meaning are designated as unassigned in the Unicode category column.
Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this does include some valid codepoints, such as currency marks, punctuation, and other symbols.
Note: the `NUMBER` and `SYMBOL` Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.
The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.
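As a rough illustration of the Unicode-derived data for marks, the sketch below reads the general category and canonical combining class from Python's standard `unicodedata` module. The helper name `mark_kind` is hypothetical, and the values returned depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

def mark_kind(ch):
    """Describe a mark as non-spacing or spacing-combining; None for non-marks."""
    category = unicodedata.category(ch)
    if category == "Mn":
        return "non-spacing"
    if category == "Mc":
        return "spacing-combining"
    return None

# U+0EB8 (Sign U) is tagged Mark [Mn] and has combining class 118.
sign_u = "\u0eb8"
kind = mark_kind(sign_u)
combining_class = unicodedata.combining(sign_u)
```

The Mark-placement subclass itself (for example `TOP_POSITION`) is not part of the Unicode Character Database and has to come from a table such as the one in this document.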
Some codepoints in the following table use a Shaping class that differs from the codepoint's Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
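One way to picture this precedence is a lookup that reports the Unicode General Category alongside the Shaping class. The following is only a sketch: `SHAPING_CLASS` is a hypothetical three-row excerpt of the table, and `describe` is an illustrative name, not part of any shaping engine.

```python
import unicodedata

# Hypothetical excerpt: codepoints that Unicode categorizes as letters (Lo)
# but that the shaping engine treats as vowel signs or medial consonants.
SHAPING_CLASS = {
    0x0EB0: "VOWEL_DEPENDENT",   # Sign A
    0x0EBD: "CONSONANT_MEDIAL",  # Semivowel Sign Nyo
    0x0EC0: "VOWEL_DEPENDENT",   # Sign E
}

def describe(cp):
    """Return (Unicode general category, Shaping class or None) for a codepoint."""
    return unicodedata.category(chr(cp)), SHAPING_CLASS.get(cp)

# The general category says "letter", but shaping should follow the second value.
sign_a = describe(0x0EB0)
```

A shaping engine following this document would act on the second element of the pair whenever it is present.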
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Combining class | PUA | Glyph |
---|---|---|---|---|---|---|
`U+0E80` | unassigned | | | | | |
`U+0E81` | Letter | CONSONANT | null | 0 | null | ກ Ko |
`U+0E82` | Letter | CONSONANT | null | 0 | null | ຂ Kho Sung |
`U+0E83` | unassigned | | | | | |
`U+0E84` | Letter | CONSONANT | null | 0 | null | ຄ Kho Tam |
`U+0E85` | unassigned | | | | | |
`U+0E86` | Letter | CONSONANT | null | 0 | null | ຆ Pali Gha |
`U+0E87` | Letter | CONSONANT | null | 0 | null | ງ Ngo |
`U+0E88` | Letter | CONSONANT | null | 0 | null | ຈ Co |
`U+0E89` | Letter | CONSONANT | null | 0 | null | ຉ Pali Cha |
`U+0E8A` | Letter | CONSONANT | null | 0 | null | ຊ So Tam |
`U+0E8B` | unassigned | | | | | |
`U+0E8C` | Letter | CONSONANT | null | 0 | null | ຌ Pali Jha |
`U+0E8D` | Letter | CONSONANT | null | 0 | null | ຍ Nyo |
`U+0E8E` | Letter | CONSONANT | null | 0 | null | ຎ Pali Nya |
`U+0E8F` | Letter | CONSONANT | null | 0 | null | ຏ Pali Tta |
`U+0E90` | Letter | CONSONANT | null | 0 | null | ຐ Pali Ttha |
`U+0E91` | Letter | CONSONANT | null | 0 | null | ຑ Pali Dda |
`U+0E92` | Letter | CONSONANT | null | 0 | null | ຒ Pali Ddha |
`U+0E93` | Letter | CONSONANT | null | 0 | null | ຓ Pali Nna |
`U+0E94` | Letter | CONSONANT | null | 0 | null | ດ Do |
`U+0E95` | Letter | CONSONANT | null | 0 | null | ຕ To |
`U+0E96` | Letter | CONSONANT | null | 0 | null | ຖ Tho Sung |
`U+0E97` | Letter | CONSONANT | null | 0 | null | ທ Tho Tam |
`U+0E98` | Letter | CONSONANT | null | 0 | null | ຘ Pali Dha |
`U+0E99` | Letter | CONSONANT | null | 0 | null | ນ No |
`U+0E9A` | Letter | CONSONANT | null | 0 | null | ບ Bo |
`U+0E9B` | Letter | CONSONANT | null | 0 | null | ປ Po |
`U+0E9C` | Letter | CONSONANT | null | 0 | null | ຜ Pho Sung |
`U+0E9D` | Letter | CONSONANT | null | 0 | null | ຝ Fo Tam |
`U+0E9E` | Letter | CONSONANT | null | 0 | null | ພ Pho Tam |
`U+0E9F` | Letter | CONSONANT | null | 0 | null | ຟ Fo Sung |
`U+0EA0` | Letter | CONSONANT | null | 0 | null | ຠ Pali Bha |
`U+0EA1` | Letter | CONSONANT | null | 0 | null | ມ Mo |
`U+0EA2` | Letter | CONSONANT | null | 0 | null | ຢ Yo |
`U+0EA3` | Letter | CONSONANT | null | 0 | null | ຣ Lo Ling |
`U+0EA4` | unassigned | | | | | |
`U+0EA5` | Letter | CONSONANT | null | 0 | null | ລ Lo Loot |
`U+0EA6` | unassigned | | | | | |
`U+0EA7` | Letter | CONSONANT | null | 0 | null | ວ Wo |
`U+0EA8` | Letter | CONSONANT | null | 0 | null | ຨ Sanskrit Sha |
`U+0EA9` | Letter | CONSONANT | null | 0 | null | ຩ Sanskrit Ssa |
`U+0EAA` | Letter | CONSONANT | null | 0 | null | ສ So Sung |
`U+0EAB` | Letter | CONSONANT | null | 0 | null | ຫ Ho Sung |
`U+0EAC` | Letter | CONSONANT | null | 0 | null | ຬ Pali Lla |
`U+0EAD` | Letter | CONSONANT | null | 0 | null | ອ O |
`U+0EAE` | Letter | CONSONANT | null | 0 | null | ຮ Ho Tam |
`U+0EAF` | Letter | null | null | 0 | null | ຯ Ellipsis |
`U+0EB0` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | null | ະ Sign A |
`U+0EB1` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | null | ັ Sign Mai Kan |
`U+0EB2` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | null | າ Sign Aa |
`U+0EB3` | Letter | VOWEL_DEPENDENT | RIGHT_POSITION | 0 | null | ຳ Sign Am |
`U+0EB4` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | null | ິ Sign I |
`U+0EB5` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | null | ີ Sign Ii |
`U+0EB6` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | null | ຶ Sign Y |
`U+0EB7` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | null | ື Sign Yy |
`U+0EB8` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 118 | null | ຸ Sign U |
`U+0EB9` | Mark [Mn] | VOWEL_DEPENDENT | BOTTOM_POSITION | 118 | null | ູ Sign Uu |
`U+0EBA` | Mark [Mn] | VIRAMA | BOTTOM_POSITION | 9 | null | ຺ Pali Virama |
`U+0EBB` | Mark [Mn] | VOWEL_DEPENDENT | TOP_POSITION | 0 | null | ົ Sign Mai Kon |
`U+0EBC` | Mark [Mn] | CONSONANT_MEDIAL | BOTTOM_POSITION | 0 | null | ຼ Semivowel Sign Lo |
`U+0EBD` | Letter | CONSONANT_MEDIAL | null | 0 | null | ຽ Semivowel Sign Nyo |
`U+0EBE` | unassigned | | | | | |
`U+0EBF` | unassigned | | | | | |
`U+0EC0` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | null | ເ Sign E |
`U+0EC1` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | null | ແ Sign Ei |
`U+0EC2` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | null | ໂ Sign O |
`U+0EC3` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | null | ໃ Sign Ay |
`U+0EC4` | Letter | VOWEL_DEPENDENT | VISUAL_ORDER_LEFT | 0 | null | ໄ Sign Ai |
`U+0EC5` | unassigned | | | | | |
`U+0EC6` | Letter Modifier | null | null | 0 | null | ໆ Ko La |
`U+0EC7` | unassigned | | | | | |
`U+0EC8` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 122 | null | ່ Tone Mai Ek |
`U+0EC9` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 122 | null | ້ Tone Mai Tho |
`U+0ECA` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 122 | null | ໊ Tone Mai Ti |
`U+0ECB` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 122 | null | ໋ Tone Mai Catawa |
`U+0ECC` | Mark [Mn] | null | TOP_POSITION | 0 | null | ໌ Cancellation mark |
`U+0ECD` | Mark [Mn] | BINDU | TOP_POSITION | 0 | null | ໍ Niggahita |
`U+0ECE` | Mark [Mn] | TONE_MARKER | TOP_POSITION | 0 | null | ໎ Yamakkan |
`U+0ECF` | unassigned | | | | | |
`U+0ED0` | Number | NUMBER | null | 0 | null | ໐ Digit Zero |
`U+0ED1` | Number | NUMBER | null | 0 | null | ໑ Digit One |
`U+0ED2` | Number | NUMBER | null | 0 | null | ໒ Digit Two |
`U+0ED3` | Number | NUMBER | null | 0 | null | ໓ Digit Three |
`U+0ED4` | Number | NUMBER | null | 0 | null | ໔ Digit Four |
`U+0ED5` | Number | NUMBER | null | 0 | null | ໕ Digit Five |
`U+0ED6` | Number | NUMBER | null | 0 | null | ໖ Digit Six |
`U+0ED7` | Number | NUMBER | null | 0 | null | ໗ Digit Seven |
`U+0ED8` | Number | NUMBER | null | 0 | null | ໘ Digit Eight |
`U+0ED9` | Number | NUMBER | null | 0 | null | ໙ Digit Nine |
`U+0EDA` | unassigned | | | | | |
`U+0EDB` | unassigned | | | | | |
`U+0EDC` | Letter | CONSONANT | null | 0 | null | ໜ Ho No |
`U+0EDD` | Letter | CONSONANT | null | 0 | null | ໝ Ho Mo |
`U+0EDE` | Letter | CONSONANT | null | 0 | null | ໞ Khmu Go |
`U+0EDF` | Letter | CONSONANT | null | 0 | null | ໟ Khmu Nyo |
`U+0EE0` | unassigned | | | | | |
`U+0EE1` | unassigned | | | | | |
`U+0EE2` | unassigned | | | | | |
`U+0EE3` | unassigned | | | | | |
`U+0EE4` | unassigned | | | | | |
`U+0EE5` | unassigned | | | | | |
`U+0EE6` | unassigned | | | | | |
`U+0EE7` | unassigned | | | | | |
`U+0EE8` | unassigned | | | | | |
`U+0EE9` | unassigned | | | | | |
`U+0EEA` | unassigned | | | | | |
`U+0EEB` | unassigned | | | | | |
`U+0EEC` | unassigned | | | | | |
`U+0EED` | unassigned | | | | | |
`U+0EEE` | unassigned | | | | | |
`U+0EEF` | unassigned | | | | | |
`U+0EF0` | unassigned | | | | | |
`U+0EF1` | unassigned | | | | | |
`U+0EF2` | unassigned | | | | | |
`U+0EF3` | unassigned | | | | | |
`U+0EF4` | unassigned | | | | | |
`U+0EF5` | unassigned | | | | | |
`U+0EF6` | unassigned | | | | | |
`U+0EF7` | unassigned | | | | | |
`U+0EF8` | unassigned | | | | | |
`U+0EF9` | unassigned | | | | | |
`U+0EFA` | unassigned | | | | | |
`U+0EFB` | unassigned | | | | | |
`U+0EFC` | unassigned | | | | | |
`U+0EFD` | unassigned | | | | | |
`U+0EFE` | unassigned | | | | | |
`U+0EFF` | unassigned | | | | | |
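
The Shaping class column above could be encoded compactly as a list of inclusive codepoint ranges. The following is a minimal sketch under that assumption; the names `LAO_SHAPING_RANGES` and `shaping_class` are hypothetical. Note that it returns `None` both for unassigned codepoints and for assigned codepoints with a null Shaping class, so a real implementation may need to tell those two cases apart.

```python
# Inclusive (first, last, Shaping class) ranges transcribed from the table above.
LAO_SHAPING_RANGES = [
    (0x0E81, 0x0E82, "CONSONANT"),
    (0x0E84, 0x0E84, "CONSONANT"),
    (0x0E86, 0x0E8A, "CONSONANT"),
    (0x0E8C, 0x0EA3, "CONSONANT"),
    (0x0EA5, 0x0EA5, "CONSONANT"),
    (0x0EA7, 0x0EAE, "CONSONANT"),
    (0x0EB0, 0x0EB9, "VOWEL_DEPENDENT"),
    (0x0EBA, 0x0EBA, "VIRAMA"),
    (0x0EBB, 0x0EBB, "VOWEL_DEPENDENT"),
    (0x0EBC, 0x0EBD, "CONSONANT_MEDIAL"),
    (0x0EC0, 0x0EC4, "VOWEL_DEPENDENT"),
    (0x0EC8, 0x0ECB, "TONE_MARKER"),
    (0x0ECD, 0x0ECD, "BINDU"),
    (0x0ECE, 0x0ECE, "TONE_MARKER"),
    (0x0ED0, 0x0ED9, "NUMBER"),
    (0x0EDC, 0x0EDF, "CONSONANT"),
]

def shaping_class(cp):
    """Return the Shaping class for a Lao-block codepoint, or None."""
    for first, last, cls in LAO_SHAPING_RANGES:
        if first <= cp <= last:
            return cls
    return None

# Classify each codepoint of a short run: Ko, Sign Aa, Tone Mai Ek.
classes = [shaping_class(ord(ch)) for ch in "\u0e81\u0eb2\u0ec8"]
```

A linear scan is enough for a block this small; a production shaping engine would more likely use a generated lookup table.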
Runs of Lao text typically do not insert spaces between words.
Consequently, in addition to general punctuation, the Zero-Width Space
(`U+200B`) character is often used to insert invisible break points that
may be converted to line breaks.
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
`U+200B` | Other | PLACEHOLDER | null | Zero-width space |
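
To illustrate how these invisible break points might be consumed, the sketch below splits a run at each `U+200B`. The function name `break_opportunities` is hypothetical, and real line breaking involves much more than this (for example, dictionary-based segmentation when no Zero-Width Space is present).

```python
ZWSP = "\u200b"  # Zero-Width Space

def break_opportunities(text):
    """Split a run into the segments delimited by U+200B.

    A line breaker may wrap between any two adjacent segments; empty
    segments produced by consecutive Zero-Width Spaces are dropped.
    """
    return [segment for segment in text.split(ZWSP) if segment]

# Two Lao syllable groups separated by a single Zero-Width Space.
run = "\u0eaa\u0eb0\u0e9a\u0eb2\u0e8d" + ZWSP + "\u0e94\u0eb5"
segments = break_opportunities(run)
```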
Other important characters that may be encountered when shaping runs
of Lao text include the dotted-circle placeholder (`U+25CC`), the
zero-width joiner (`U+200D`) and zero-width non-joiner (`U+200C`), and
the no-break space (`U+00A0`).
The dotted-circle placeholder is frequently used when displaying a dependent vowel or a combining mark in isolation. Real-world text syllables may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.
Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
---|---|---|---|---|
`U+00A0` | Separator | PLACEHOLDER | null | No-break space |
`U+200C` | Other | NON_JOINER | null | Zero-width non-joiner |
`U+200D` | Other | JOINER | null | Zero-width joiner |
`U+2010` | Punctuation | PLACEHOLDER | null | ‐ Hyphen |
`U+2011` | Punctuation | PLACEHOLDER | null | ‑ No-break hyphen |
`U+2012` | Punctuation | PLACEHOLDER | null | ‒ Figure dash |
`U+2013` | Punctuation | PLACEHOLDER | null | – En dash |
`U+2014` | Punctuation | PLACEHOLDER | null | — Em dash |
`U+25CC` | Symbol | DOTTED_CIRCLE | null | ◌ Dotted circle |
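
The miscellaneous classes above can be kept in a small lookup of their own. The sketch below is illustrative only: `MISC_SHAPING_CLASS` and `can_stand_in_for_base` are hypothetical names, and treating every `PLACEHOLDER` like the dotted circle is an assumption drawn from the description above rather than a rule that every shaping engine follows.

```python
# Shaping classes transcribed from the miscellaneous-character table above.
MISC_SHAPING_CLASS = {
    0x00A0: "PLACEHOLDER",    # No-break space
    0x200C: "NON_JOINER",     # Zero-width non-joiner
    0x200D: "JOINER",         # Zero-width joiner
    0x2010: "PLACEHOLDER",    # Hyphen
    0x2011: "PLACEHOLDER",    # No-break hyphen
    0x2012: "PLACEHOLDER",    # Figure dash
    0x2013: "PLACEHOLDER",    # En dash
    0x2014: "PLACEHOLDER",    # Em dash
    0x25CC: "DOTTED_CIRCLE",  # Dotted circle
}

def can_stand_in_for_base(cp):
    """True if the codepoint may act as a stand-in base for an isolated mark."""
    return MISC_SHAPING_CLASS.get(cp) in ("PLACEHOLDER", "DOTTED_CIRCLE")

# A dotted circle followed by Lao Sign I (U+0EB4) shown in isolation.
sample = "\u25cc\u0eb4"
base_ok = can_stand_in_for_base(ord(sample[0]))
```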