Myanmar: simplify reordering logic.
n8willis committed Aug 4, 2024
1 parent e9a2ad3 commit aee746e
Showing 1 changed file with 43 additions and 60 deletions.
103 changes: 43 additions & 60 deletions opentype-shaping-myanmar.md
@@ -340,10 +340,10 @@ Processing a run of `<mym2>` text involves five top-level stages:


As with other Brahmi-derived and Indic scripts, the initial reordering
stage and the final reordering stage each involve applying a set of several
script-specific rules. The basic substitution features must be applied
to the run in a specific order. The remaining substitution features in
stage four, however, do not have a mandatory order.
stage involves applying a set of several script-specific rules. The
basic substitution features must be applied to the run in a specific
order. The remaining substitution features in stage four, however, do
not have a mandatory order.


Myanmar exhibits many of the same shaping patterns found in Indic
@@ -738,37 +738,28 @@ The algorithm for determining the base consonant is
next consonant.
- The consonant stopped at will be the base consonant.

> Note: The algorithm considers only `CONSONANT` class consonants,
<!--- > Note: The algorithm considers only `CONSONANT` class consonants, --->


#### Stage 2, step 2: Tag matras ####
#### Stage 2, step 2: Kinzi ####

Second, all left-side dependent-vowel (matra) signs must be tagged to be
moved to the beginning of the syllable, with `POS_PREBASE_MATRA`.

All right-side and above-base dependent-vowel (matra)
signs are tagged `POS_AFTER_SUBJOINED`.

All below-base dependent-vowel (matra) signs are tagged
`POS_BELOWBASE_CONSONANT`.
Second, initial <samp>"Kinzi"</samp>-triggering sequences that will become <samp>"Kinzi"</samp>s
must be tagged with `POS_AFTER_MAIN`.

For simplicity, shaping engines may choose to tag matras
in an earlier text-processing step, using the information in the
_Mark-placement subclass_ column of the character tables. It is
critical at this step, however, that all matras correctly tagged
before proceeding to the next step.
The sequences are:

#### Stage 2, step 3: Anusvara ####
- <samp>"Ra,Asat,Halant"</samp>
- <samp>"Nga,Asat,Halant"</samp>
- <samp>"Mon Nga,Asat,Halant"</samp>

Third, any `ANUSVARA` marks appearing immediately after a below-base
vowel sign must be tagged with `POS_BEFORE_SUBJOINED`, so that the
marks are reordered to a position immediately before the below-base
vowel signs.
In the Myanmar (or Burmese) language, <samp>"Nga"</samp> is the only <samp>"Kinzi"</samp>-forming
consonant. <samp>"Mon Nga"</samp> can form a <samp>"Kinzi"</samp> in the Mon language, and <samp>"Ra"</samp>
can form a <samp>"Kinzi"</samp> in Sanskrit written with the Myanmar script.
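
The Kinzi check described above can be sketched as follows. This is only an illustration, not the shaping engine's actual code: the function name `tag_initial_kinzi` and the list-of-tags representation are hypothetical, and the sketch assumes that <samp>"Halant"</samp> refers to U+1039 and <samp>"Asat"</samp> to U+103A.

```python
# Codepoints from the Unicode Myanmar block.
KINZI_STARTERS = {
    0x101B,  # MYANMAR LETTER RA
    0x1004,  # MYANMAR LETTER NGA
    0x105A,  # MYANMAR LETTER MON NGA
}
ASAT = 0x103A    # MYANMAR SIGN ASAT
HALANT = 0x1039  # MYANMAR SIGN VIRAMA (assumed to be the "Halant" above)


def tag_initial_kinzi(syllable):
    """Return one tag per codepoint (hypothetical representation).

    Only a Kinzi-triggering sequence at the start of the syllable is
    tagged; every other codepoint is left as None for later steps.
    """
    cps = [ord(ch) for ch in syllable]
    tags = [None] * len(cps)
    if (
        len(cps) >= 3
        and cps[0] in KINZI_STARTERS
        and cps[1] == ASAT
        and cps[2] == HALANT
    ):
        tags[0:3] = ["POS_AFTER_MAIN"] * 3
    return tags
```

Calling `tag_initial_kinzi("\u1004\u103A\u1039\u1002")` tags the first three codepoints and leaves the fourth untagged.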


#### Stage 2, step 4: Pre-base-reordering consonants ####
#### Stage 2, step 3: Pre-base-reordering consonants ####

Fourth, all pre-base-reordering consonants must be tagged with
Third, all pre-base-reordering consonants must be tagged with
`POS_PREBASE_CONSONANT`.

Myanmar has one pre-base-reordering consonant: <samp>"Medial Ra"</samp>.
@@ -780,54 +771,46 @@ Pre-base-reordering Medial Ra
:::


#### Stage 2, step 5: Kinzi ####
#### Stage 2, step 4: Pre-base matras ####

Fifth, initial <samp>"Kinzi"</samp>-triggering sequences that will become <samp>"Kinzi"</samp>s
must be tagged with `POS_AFTER_MAIN`.

The sequences are:

- <samp>"Ra,Asat,Halant"</samp>
- <samp>"Nga,Asat,Halant"</samp>
- <samp>"Mon Nga,Asat,Halant"</samp>
Fourth, all left-side dependent-vowel (matra) signs must be tagged to be
moved to the beginning of the syllable, with `POS_PREBASE_MATRA`.

In the Myanmar (or Burmese) language, <samp>"Nga"</samp> is the only <samp>"Kinzi"</samp>-forming
consonant. <samp>"Mon Nga"</samp> can form a <samp>"Kinzi"</samp> in the Mon language, and <samp>"Ra"</samp>
can form a <samp>"Kinzi"</samp> in Sanskrit written with the Myanmar script.
For simplicity, shaping engines may choose to tag matras
in an earlier text-processing step, using the information in the
_Mark-placement subclass_ column of the character tables. It is
critical at this step, however, that all matras are correctly tagged
before proceeding to the next step.
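
A minimal sketch of this step, assuming the mark-placement information is available as a lookup table. The `MARK_PLACEMENT` dictionary, its `"LEFT_POSITION"` value, and the function name are hypothetical stand-ins for the character tables, and only one entry is shown for illustration.

```python
# Hypothetical excerpt of a mark-placement lookup; a real table would
# be built from the character tables and cover every matra.
MARK_PLACEMENT = {
    0x1031: "LEFT_POSITION",  # MYANMAR VOWEL SIGN E
}


def tag_prebase_matras(codepoints, tags):
    """Return a copy of `tags` with left-side matras tagged."""
    out = list(tags)
    for i, cp in enumerate(codepoints):
        if MARK_PLACEMENT.get(cp) == "LEFT_POSITION":
            out[i] = "POS_PREBASE_MATRA"
    return out
```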


#### Stage 2, step 6: Post-base consonants ####
#### Stage 2, step 5: Syllables without below-base matras ####

Sixth, any remaining non-base consonants that occur after the base
consonant must be tagged with `POS_AFTER_MAIN`. Full consonants (of
class `CONSONANT`) will be preceded by a <samp>"Halant"</samp> glyph. Medial
consonants (of class `CONSONANT_MEDIAL`) will not be preceded by a
<samp>"Halant"</samp> glyph.
Fifth, if the syllable contains no below-base dependent-vowel (matra)
signs, then all of the remaining codepoints can be tagged with
`POS_AFTER_MAIN`.

> Note: <samp>"Medial Ra"</samp> should have been tagged with
> `POS_PREBASE_CONSONANT` in stage 2, step four, and must not be
> re-tagged in this step.

#### Stage 2, step 6: Syllables with below-base matras ####

Sixth, if the syllable contains any below-base dependent-vowel (matra)
signs, then those below-base matra signs must be tagged with
`POS_BELOWBASE_CONSONANT`.

#### Stage 2, step 7: Mark tagging ####
All of the codepoints that precede the below-base dependent-vowel
signs, but which were not already tagged in steps 1 through 4, must
now be tagged with `POS_AFTER_MAIN`.

<!--- not sure this is done!!! --->
Any `ANUSVARA` marks that appear after the below-base dependent vowel
signs must be tagged with `POS_BEFORE_SUBJOINED`.

Seventh, all marks must be tagged with the same positioning tag as the
closest non-mark character the mark has affinity with, so that they move together
during the sorting step.
All remaining codepoints that appear after the below-base
dependent-vowel signs can be tagged with `POS_AFTER_SUBJOINED`.
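
Steps 5 and 6 can be read together as one pass over the codepoints left untagged by the earlier steps. The sketch below is one possible interpretation, not a definitive implementation: the `Item` class and the category names `"BELOW_MATRA"` and `"ANUSVARA"` are hypothetical, and it assumes that "precede" and "after" are measured against the first below-base matra in the syllable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    category: str              # hypothetical shaping class name
    tag: Optional[str] = None  # positioning tag, None if not yet tagged


def tag_remaining(items: List[Item]) -> None:
    """Apply steps 5 and 6 in place to one syllable."""
    below = [i for i, it in enumerate(items) if it.category == "BELOW_MATRA"]

    # Step 5: no below-base matras, so everything untagged goes after main.
    if not below:
        for it in items:
            if it.tag is None:
                it.tag = "POS_AFTER_MAIN"
        return

    # Step 6: tag relative to the first below-base matra (an assumption).
    first_below = below[0]
    for i, it in enumerate(items):
        if it.category == "BELOW_MATRA":
            it.tag = "POS_BELOWBASE_CONSONANT"
        elif it.tag is not None:
            continue  # already tagged in steps 1 through 4
        elif i < first_below:
            it.tag = "POS_AFTER_MAIN"
        elif it.category == "ANUSVARA":
            it.tag = "POS_BEFORE_SUBJOINED"
        else:
            it.tag = "POS_AFTER_SUBJOINED"
```

Tags assigned in steps 1 through 4 are left alone, so the function is safe to call once per syllable after those steps have run.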

For all marks preceding the base consonant, the mark must be tagged
with the same positioning tag as the closest preceding non-mark
consonant.

For all marks occurring after the base consonant, the mark must be
tagged with the same positioning tag as the closest subsequent consonant.
#### Stage 2, step 7: Variation selectors ####

> Note: In this step, joiner and non-joiner characters must also be
> tagged according to the same rules given for marks, even though
> these characters are not categorized as marks in Unicode.
Seventh, all Variation Selector codepoints must be tagged with the
same positioning tag as the immediately preceding character.
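
As an illustration, the variation-selector rule might look like the sketch below. The function names are hypothetical, and the sketch assumes that "Variation Selector" means the Unicode ranges U+FE00..U+FE0F and U+E0100..U+E01EF.

```python
def is_variation_selector(cp):
    """True for VS1..VS16 and VS17..VS256 (assumed definition)."""
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def tag_variation_selectors(codepoints, tags):
    """Return a copy of `tags` where each variation selector inherits
    the tag of the immediately preceding character."""
    out = list(tags)
    for i, cp in enumerate(codepoints):
        if i > 0 and is_variation_selector(cp):
            out[i] = out[i - 1]
    return out
```

Because the copy is updated left to right, the selector always inherits the final tag of the character immediately before it.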


With these steps completed, the syllable can be sorted into the final sort order.