Lao character tables

This document lists the per-character shaping information needed to shape Lao text.

Lao character table

Lao codepoints should be classified as in the following table. Codepoints in the Lao block with no assigned meaning are designated as unassigned in the Unicode category column.

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine. Note that this does include some valid codepoints, such as currency marks, punctuation, and other symbols.

Note: the NUMBER and SYMBOL Shaping classes are important during syllable identification, but generally evoke no further special behavior during the rest of the shaping process.

The Mark-placement subclass column indicates mark-placement positioning for codepoints in the Mark category. Assigned, non-mark codepoints have a null in this column and evoke no special mark-placement behavior. Marks tagged with [Mn] in the Unicode category column are categorized as non-spacing; marks tagged with [Mc] are categorized as spacing-combining.

Some codepoints in the following table use a Shaping class that differs from the codepoint's Unicode General Category. The Shaping class takes precedence during OpenType shaping, as it captures more specific, script-aware behavior.
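
As a small illustration of that precedence, the following sketch compares the Unicode General Category and canonical combining class reported by Python's standard `unicodedata` module with the Shaping class given in the table for a few codepoints. The `SHAPING_CLASS_OVERRIDES` dictionary and the `describe` helper are hypothetical names introduced only for this example; they are not part of any shaping engine.

```python
import unicodedata

# Hypothetical excerpt of the table: codepoints whose Shaping class is more
# specific than their Unicode General Category (all three are Letters, "Lo").
SHAPING_CLASS_OVERRIDES = {
    0x0EB0: "VOWEL_DEPENDENT",   # Sign A
    0x0EBD: "CONSONANT_MEDIAL",  # Semivowel Sign Nyo
    0x0EC0: "VOWEL_DEPENDENT",   # Sign E
}

def describe(codepoint):
    """Return (General Category, combining class, Shaping class or None)."""
    char = chr(codepoint)
    return (
        unicodedata.category(char),
        unicodedata.combining(char),
        SHAPING_CLASS_OVERRIDES.get(codepoint),
    )

if __name__ == "__main__":
    for cp in (0x0EB0, 0x0EBD, 0x0EC0, 0x0EB8):
        print(f"U+{cp:04X}", describe(cp))
```

A shaping engine would consult the Shaping class first and fall back to the General Category only for codepoints with no entry, which is why `describe` returns `None` in the third position for U+0EB8 in this deliberately incomplete excerpt.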

Codepoint Unicode category Shaping class Mark-placement subclass Combining class PUA Glyph
U+0E80 unassigned
U+0E81 Letter CONSONANT null 0 null ກ Ko
U+0E82 Letter CONSONANT null 0 null ຂ Kho Sung
U+0E83 unassigned
U+0E84 Letter CONSONANT null 0 null ຄ Kho Tam
U+0E85 unassigned
U+0E86 Letter CONSONANT null 0 null ຆ Pali Gha
U+0E87 Letter CONSONANT null 0 null ງ Ngo
U+0E88 Letter CONSONANT null 0 null ຈ Co
U+0E89 Letter CONSONANT null 0 null ຉ Pali Cha
U+0E8A Letter CONSONANT null 0 null ຊ So Tam
U+0E8B unassigned
U+0E8C Letter CONSONANT null 0 null ຌ Pali Jha
U+0E8D Letter CONSONANT null 0 null ຍ Nyo
U+0E8E Letter CONSONANT null 0 null ຎ Pali Nya
U+0E8F Letter CONSONANT null 0 null ຏ Pali Tta
U+0E90 Letter CONSONANT null 0 null ຐ Pali Ttha
U+0E91 Letter CONSONANT null 0 null ຑ Pali Dda
U+0E92 Letter CONSONANT null 0 null ຒ Pali Ddha
U+0E93 Letter CONSONANT null 0 null ຓ Pali Nna
U+0E94 Letter CONSONANT null 0 null ດ Do
U+0E95 Letter CONSONANT null 0 null ຕ To
U+0E96 Letter CONSONANT null 0 null ຖ Tho Sung
U+0E97 Letter CONSONANT null 0 null ທ Tho Tam
U+0E98 Letter CONSONANT null 0 null ຘ Pali Dha
U+0E99 Letter CONSONANT null 0 null ນ No
U+0E9A Letter CONSONANT null 0 null ບ Bo
U+0E9B Letter CONSONANT null 0 null ປ Po
U+0E9C Letter CONSONANT null 0 null ຜ Pho Sung
U+0E9D Letter CONSONANT null 0 null ຝ Fo Tam
U+0E9E Letter CONSONANT null 0 null ພ Pho Tam
U+0E9F Letter CONSONANT null 0 null ຟ Fo Sung
U+0EA0 Letter CONSONANT null 0 null ຠ Pali Bha
U+0EA1 Letter CONSONANT null 0 null ມ Mo
U+0EA2 Letter CONSONANT null 0 null ຢ Yo
U+0EA3 Letter CONSONANT null 0 null ຣ Lo Ling
U+0EA4 unassigned
U+0EA5 Letter CONSONANT null 0 null ລ Lo Loot
U+0EA6 unassigned
U+0EA7 Letter CONSONANT null 0 null ວ Wo
U+0EA8 Letter CONSONANT null 0 null ຨ Sanskrit Sha
U+0EA9 Letter CONSONANT null 0 null ຩ Sanskrit Ssa
U+0EAA Letter CONSONANT null 0 null ສ So Sung
U+0EAB Letter CONSONANT null 0 null ຫ Ho Sung
U+0EAC Letter CONSONANT null 0 null ຬ Pali Lla
U+0EAD Letter CONSONANT null 0 null ອ O
U+0EAE Letter CONSONANT null 0 null ຮ Ho Tam
U+0EAF Letter null null 0 null ຯ Ellipsis
U+0EB0 Letter VOWEL_DEPENDENT RIGHT_POSITION 0 null ະ Sign A
U+0EB1 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION 0 null ັ Sign Mai Kan
U+0EB2 Letter VOWEL_DEPENDENT RIGHT_POSITION 0 null າ Sign Aa
U+0EB3 Letter VOWEL_DEPENDENT RIGHT_POSITION 0 null ຳ Sign Am
U+0EB4 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION 0 null ິ Sign I
U+0EB5 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION 0 null ີ Sign Ii
U+0EB6 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION 0 null ຶ Sign Y
U+0EB7 Mark [Mn] VOWEL_DEPENDENT TOP_POSITION 0 null ື Sign Yy
U+0EB8 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION 118 null ຸ Sign U
U+0EB9 Mark [Mn] VOWEL_DEPENDENT BOTTOM_POSITION 118 null ູ Sign Uu
U+0EBA Mark [Mn] VIRAMA BOTTOM_POSITION 9 null ຺ Pali Virama
U+0EBB Mark [Mn] VOWEL_DEPENDENT TOP_POSITION 0 null ົ Sign Mai Kon
U+0EBC Mark [Mn] CONSONANT_MEDIAL BOTTOM_POSITION 0 null ຼ Semivowel Sign Lo
U+0EBD Letter CONSONANT_MEDIAL null 0 null ຽ Semivowel Sign Nyo
U+0EBE unassigned
U+0EBF unassigned
U+0EC0 Letter VOWEL_DEPENDENT VISUAL_ORDER_LEFT 0 null ເ Sign E
U+0EC1 Letter VOWEL_DEPENDENT VISUAL_ORDER_LEFT 0 null ແ Sign Ei
U+0EC2 Letter VOWEL_DEPENDENT VISUAL_ORDER_LEFT 0 null ໂ Sign O
U+0EC3 Letter VOWEL_DEPENDENT VISUAL_ORDER_LEFT 0 null ໃ Sign Ay
U+0EC4 Letter VOWEL_DEPENDENT VISUAL_ORDER_LEFT 0 null ໄ Sign Ai
U+0EC5 unassigned
U+0EC6 Letter Modifier null null 0 null ໆ Ko La
U+0EC7 unassigned
U+0EC8 Mark [Mn] TONE_MARKER TOP_POSITION 122 null ່ Tone Mai Ek
U+0EC9 Mark [Mn] TONE_MARKER TOP_POSITION 122 null ້ Tone Mai Tho
U+0ECA Mark [Mn] TONE_MARKER TOP_POSITION 122 null ໊ Tone Mai Ti
U+0ECB Mark [Mn] TONE_MARKER TOP_POSITION 122 null ໋ Tone Mai Catawa
U+0ECC Mark [Mn] null TOP_POSITION 0 null ໌ Cancellation mark
U+0ECD Mark [Mn] BINDU TOP_POSITION 0 null ໍ Niggahita
U+0ECE Mark [Mn] TONE_MARKER TOP_POSITION 0 null ໎ Yamakkan
U+0ECF unassigned
U+0ED0 Number NUMBER null 0 null ໐ Digit Zero
U+0ED1 Number NUMBER null 0 null ໑ Digit One
U+0ED2 Number NUMBER null 0 null ໒ Digit Two
U+0ED3 Number NUMBER null 0 null ໓ Digit Three
U+0ED4 Number NUMBER null 0 null ໔ Digit Four
U+0ED5 Number NUMBER null 0 null ໕ Digit Five
U+0ED6 Number NUMBER null 0 null ໖ Digit Six
U+0ED7 Number NUMBER null 0 null ໗ Digit Seven
U+0ED8 Number NUMBER null 0 null ໘ Digit Eight
U+0ED9 Number NUMBER null 0 null ໙ Digit Nine
U+0EDA unassigned
U+0EDB unassigned
U+0EDC Letter CONSONANT null 0 null ໜ Ho No
U+0EDD Letter CONSONANT null 0 null ໝ Ho Mo
U+0EDE Letter CONSONANT null 0 null ໞ Khmu Go
U+0EDF Letter CONSONANT null 0 null ໟ Khmu Nyo
U+0EE0 unassigned
U+0EE1 unassigned
U+0EE2 unassigned
U+0EE3 unassigned
U+0EE4 unassigned
U+0EE5 unassigned
U+0EE6 unassigned
U+0EE7 unassigned
U+0EE8 unassigned
U+0EE9 unassigned
U+0EEA unassigned
U+0EEB unassigned
U+0EEC unassigned
U+0EED unassigned
U+0EEE unassigned
U+0EEF unassigned
U+0EF0 unassigned
U+0EF1 unassigned
U+0EF2 unassigned
U+0EF3 unassigned
U+0EF4 unassigned
U+0EF5 unassigned
U+0EF6 unassigned
U+0EF7 unassigned
U+0EF8 unassigned
U+0EF9 unassigned
U+0EFA unassigned
U+0EFB unassigned
U+0EFC unassigned
U+0EFD unassigned
U+0EFE unassigned
U+0EFF unassigned
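
The table above could be encoded as a simple lookup, sketched below under the assumption that a plain dictionary keyed by codepoint is sufficient. The names `LAO_SHAPING_CLASSES` and `lao_shaping_class` are hypothetical. Unassigned codepoints and assigned codepoints with a null Shaping class are both represented by absence from the mapping.

```python
def _build_lao_shaping_classes():
    """Transcribe the Shaping class column of the Lao character table."""
    classes = {}

    # U+0E81..U+0EAE are consonants, except for the unassigned gaps.
    unassigned = {0x0E83, 0x0E85, 0x0E8B, 0x0EA4, 0x0EA6}
    for cp in range(0x0E81, 0x0EAF):
        if cp not in unassigned:
            classes[cp] = "CONSONANT"
    # U+0EDC..U+0EDF: Ho No, Ho Mo, Khmu Go, Khmu Nyo.
    for cp in range(0x0EDC, 0x0EE0):
        classes[cp] = "CONSONANT"

    # Dependent vowels: U+0EB0..U+0EB9, U+0EBB, and U+0EC0..U+0EC4.
    for cp in list(range(0x0EB0, 0x0EBA)) + [0x0EBB] + list(range(0x0EC0, 0x0EC5)):
        classes[cp] = "VOWEL_DEPENDENT"

    classes[0x0EBA] = "VIRAMA"
    classes[0x0EBC] = "CONSONANT_MEDIAL"
    classes[0x0EBD] = "CONSONANT_MEDIAL"

    # Tone marks U+0EC8..U+0ECB, plus Yamakkan as listed in the table.
    for cp in list(range(0x0EC8, 0x0ECC)) + [0x0ECE]:
        classes[cp] = "TONE_MARKER"

    classes[0x0ECD] = "BINDU"

    # Digits U+0ED0..U+0ED9.
    for cp in range(0x0ED0, 0x0EDA):
        classes[cp] = "NUMBER"
    return classes

LAO_SHAPING_CLASSES = _build_lao_shaping_classes()

def lao_shaping_class(codepoint):
    """Return the Shaping class name, or None for null/unassigned codepoints."""
    return LAO_SHAPING_CLASSES.get(codepoint)
```

For example, `lao_shaping_class(0x0E81)` yields `"CONSONANT"`, while `lao_shaping_class(0x0EAF)` (Ellipsis, null Shaping class) and `lao_shaping_class(0x0E83)` (unassigned) both yield `None`.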

Miscellaneous character table

Runs of Lao text typically do not insert spaces between words. Consequently, in addition to general punctuation, the Zero-Width Space (U+200B) character is often used to insert invisible break points that may be converted to line breaks.
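
A minimal sketch of how such break points might be consumed is shown below. It assumes, purely for illustration, that line width can be approximated by counting codepoints; a real layout engine would measure glyph advances instead. The helper names `break_opportunities` and `wrap_at_zwsp` are hypothetical.

```python
ZWSP = "\u200b"  # Zero-width space

def break_opportunities(text):
    """Split text at zero-width spaces, dropping any empty segments."""
    return [segment for segment in text.split(ZWSP) if segment]

def wrap_at_zwsp(text, max_len):
    """Greedily pack segments into lines of at most max_len codepoints.

    A single segment longer than max_len is kept whole on its own line,
    because no break point exists inside it.
    """
    lines = []
    current = ""
    for segment in break_opportunities(text):
        if current and len(current) + len(segment) > max_len:
            lines.append(current)
            current = segment
        else:
            current += segment
    if current:
        lines.append(current)
    return lines
```

Note that the zero-width spaces themselves are discarded in the output, since they carry no visible glyph once the break decision has been made.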

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+200B Separator PLACEHOLDER null ​ Zero-width space

Other important characters that may be encountered when shaping runs of Lao text include the dotted-circle placeholder (U+25CC), the zero-width joiner (U+200D) and zero-width non-joiner (U+200C), and the no-break space (U+00A0).

The dotted-circle placeholder is frequently used when displaying a dependent vowel or a combining mark in isolation. Syllables in real-world text may also use other characters, such as hyphens or dashes, in a similar placeholder fashion; shaping engines should cope with this situation gracefully.

Codepoint Unicode category Shaping class Mark-placement subclass Glyph
U+00A0 Separator PLACEHOLDER null   No-break space
U+200C Other NON_JOINER null ‌ Zero-width non-joiner
U+200D Other JOINER null ‍ Zero-width joiner
U+2010 Punctuation PLACEHOLDER null ‐ Hyphen
U+2011 Punctuation PLACEHOLDER null ‑ No-break hyphen
U+2012 Punctuation PLACEHOLDER null ‒ Figure dash
U+2013 Punctuation PLACEHOLDER null – En dash
U+2014 Punctuation PLACEHOLDER null — Em dash
U+25CC Symbol DOTTED_CIRCLE null ◌ Dotted circle
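
The placeholder behavior described above can be sketched as follows. This is only an illustration under simplifying assumptions: it inspects just the first character of a run and treats only Unicode marks (General Category `Mn` or `Mc`) as needing a base, so spacing dependent vowels that Unicode classifies as Letters are not covered. The names `PLACEHOLDER_BASES`, `can_carry_mark`, and `add_base_for_isolated_mark` are hypothetical.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"

# Codepoints tagged PLACEHOLDER or DOTTED_CIRCLE in the tables above.
PLACEHOLDER_BASES = {
    0x00A0,  # No-break space
    0x200B,  # Zero-width space
    0x2010,  # Hyphen
    0x2011,  # No-break hyphen
    0x2012,  # Figure dash
    0x2013,  # En dash
    0x2014,  # Em dash
    0x25CC,  # Dotted circle
}

def can_carry_mark(char):
    """Return True if char may stand in for a base in placeholder fashion."""
    return ord(char) in PLACEHOLDER_BASES

def add_base_for_isolated_mark(run):
    """Prepend a dotted circle when a run starts with a combining mark."""
    if run and unicodedata.category(run[0]) in ("Mn", "Mc"):
        return DOTTED_CIRCLE + run
    return run
```

A run that already begins with a consonant or with one of the placeholder characters is returned unchanged, so an existing hyphen or dash used as a base is left alone.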