Port figure-md and <samp> markup and 'stage-step' headings to all docs.
n8willis committed Apr 3, 2023
1 parent 2afafa7 commit 924b59e
Showing 28 changed files with 4,080 additions and 2,795 deletions.
52 changes: 26 additions & 26 deletions errata.md
@@ -48,30 +48,30 @@ characters. However, the standard does not explicitly say whether or
not the presence of a <abbr>ZWJ</abbr> or <abbr>ZWNJ</abbr> should influence the shaping
behavior of characters not adjacent to the <abbr>ZWJ</abbr> or <abbr>ZWNJ</abbr>.

For example, in the sequence "a,b,ZWNJ,c,d" the <abbr>ZWNJ</abbr> should prevent
the application of a ligature between "b" and "c" (if such a ligature
For example, in the sequence <samp>"a,b,ZWNJ,c,d"</samp> the <abbr>ZWNJ</abbr> should prevent
the application of a ligature between <samp>"b"</samp> and <samp>"c"</samp> (if such a ligature
lookup exists in the active font).

However, if the active font contains a contextual ligature lookup for
"c,d" when preceded by "b", it is not clear whether or not the <abbr>ZWNJ</abbr>
in the same "a,b,ZWNJ,c,d" sequence should inhibit the application of
the ligature between "c" and "d".
<samp>"c,d"</samp> when preceded by <samp>"b"</samp>, it is not clear whether or not the <abbr>ZWNJ</abbr>
in the same <samp>"a,b,ZWNJ,c,d"</samp> sequence should inhibit the application of
the ligature between <samp>"c"</samp> and <samp>"d"</samp>.
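The two readings of this ambiguity can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function name, the two policy names, and the single hard-coded rule (a `c,d` ligature with backtrack context `b`) are hypothetical and do not correspond to any real shaping engine or font.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def contextual_ligature_applies(glyphs, policy):
    """Return True if a hypothetical 'c,d' ligature preceded by 'b' would apply.

    policy="adjacent-only": ZWNJ is skipped when matching the backtrack
        context, so it only blocks ligatures between its direct neighbours.
    policy="whole-context": a ZWNJ anywhere inside the matched context
        blocks the rule.
    """
    for i in range(len(glyphs) - 1):
        if glyphs[i] != "c" or glyphs[i + 1] != "d":
            continue
        # Walk backwards over any ZWNJ to find the backtrack glyph.
        j = i - 1
        saw_zwnj = False
        while j >= 0 and glyphs[j] == ZWNJ:
            saw_zwnj = True
            j -= 1
        if j < 0 or glyphs[j] != "b":
            continue
        if saw_zwnj and policy == "whole-context":
            continue
        return True
    return False


sequence = ["a", "b", ZWNJ, "c", "d"]
for policy in ("adjacent-only", "whole-context"):
    print(policy, contextual_ligature_applies(sequence, policy))
```

Under the first policy the ligature between "c" and "d" still forms; under the second it does not. The standard does not say which reading is intended.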


#### <abbr>ZWJ</abbr> in redundant ligature lookups ####

An "Implementation Notes" section in chapter 23.2 of the Unicode
Standard says that font vendors should add <abbr>ZWJ</abbr> sequences to ligature
lookups. For example, if the sequence "f,i" triggers the "fi"
lookups. For example, if the sequence <samp>"f,i"</samp> triggers the <samp>"fi"</samp>
ligature, then the font should also include a lookup that triggers the
"fi" ligature for "f,ZWJ,i".
<samp>"fi"</samp> ligature for <samp>"f,ZWJ,i"</samp>.

However, the text of chapter 23.2 prior to the "Implementation Notes"
says that <abbr>ZWJ</abbr> and <abbr>ZWNJ</abbr> "are not to be used in all cases where
ligatures or cursive connections are desired; instead, they are meant
only for over-riding the normal behavior of the text." That logic
makes the suggested "f,ZWJ,i" ligature lookup superfluous, because it
duplicates the effects of the existing "f,i" ligature lookup.
makes the suggested <samp>"f,ZWJ,i"</samp> ligature lookup superfluous, because it
duplicates the effects of the existing <samp>"f,i"</samp> ligature lookup.

Using <abbr>ZWJ</abbr> within lookup patterns in the manner suggested by the
"Implementation Notes" is not common practice.
@@ -86,8 +86,8 @@ followed by a Fitzpatrick skin-tone modifier but other emoji in the
sequence are not followed by a Fitzpatrick skin-tone modifier.

For example, it is unclear whether the sequence
"Man,ZWJ,Handshake,Man,SkinTone-2" constitutes a valid <abbr>ZWJ</abbr> "Couple
holding hands" sequence.
<samp>"Man,ZWJ,Handshake,Man,SkinTone-2"</samp> constitutes a valid
<abbr>ZWJ</abbr> "Couple holding hands" sequence.


#### Gender permutations ####
@@ -98,15 +98,15 @@ are an explicit gender but other emoji in the sequence are not
explicit gender.

For example, it is unclear whether the sequence
"Man,ZWJ,Handshake,Person" constitutes a valid <abbr>ZWJ</abbr> "Couple
holding hands" sequence.
<samp>"Man,ZWJ,Handshake,Person"</samp> constitutes a valid
<abbr>ZWJ</abbr> "Couple holding hands" sequence.

It is also unclear whether the <abbr>ZWJ</abbr> multi-person family sequence must
have explicit gender-ordering for the adult humans depicted.

For example, it is unclear whether the sequence
"Man,ZWJ,Woman,ZWJ,Girl" should be rendered identically to the
sequence "Woman,ZWJ,Man,ZWJ,Girl".
<samp>"Man,ZWJ,Woman,ZWJ,Girl"</samp> should be rendered identically to the
sequence <samp>"Woman,ZWJ,Man,ZWJ,Girl"</samp>.
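For reference, the two family sequences can be written out as code points. The sketch below only shows that they are distinct strings built from the same characters; whether a renderer should draw them identically is exactly the open question. The variable and function names are illustrative.

```python
ZWJ = "\u200D"         # ZERO WIDTH JOINER
MAN = "\U0001F468"
WOMAN = "\U0001F469"
GIRL = "\U0001F467"

family_man_first = ZWJ.join([MAN, WOMAN, GIRL])
family_woman_first = ZWJ.join([WOMAN, MAN, GIRL])


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(family_man_first))
print(code_points(family_woman_first))
print(family_man_first == family_woman_first)
```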


## OpenType ##
@@ -169,12 +169,12 @@ parenthetically that "post-base forms have to follow below-base forms".
If this statement is taken to be a rule, it would affect the
base-consonant search algorithm.

For example, in the Bengali sequence "Ka,Halant,Ba,Halant,Ya"
(`U+0995`,`U+09CD`,`U+09AC`,`U+09CD`,`U+09AF`), "Ka" would be
identified as the syllable base, with "Ba" designated a below-base
form and "Ya" designated a post-base form. However, in the similar
sequence "Ka,Halant,Ya,Halant,Ba"
(`U+0995`,`U+09CD`,`U+09AF`,`U+09CD`,`U+09AC`), "Ya" would be
For example, in the Bengali sequence <samp>"Ka,Halant,Ba,Halant,Ya"</samp>
(`U+0995`,`U+09CD`,`U+09AC`,`U+09CD`,`U+09AF`), <samp>"Ka"</samp> would be
identified as the syllable base, with <samp>"Ba"</samp> designated a below-base
form and <samp>"Ya"</samp> designated a post-base form. However, in the similar
sequence <samp>"Ka,Halant,Ya,Halant,Ba"</samp>
(`U+0995`,`U+09CD`,`U+09AF`,`U+09CD`,`U+09AC`), <samp>"Ya"</samp> would be
identified as the base consonant.
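A deliberately simplified sketch of how the parenthetical rule would change the base-consonant search for these two sequences follows. It assumes, for illustration only, that "Ba" is the sole below-base-capable consonant and "Ya" the sole post-base-capable consonant, and it handles nothing but consonants and Halants; `find_base_index` and its `enforce_order` flag are hypothetical names, not part of any specification.

```python
import unicodedata

HALANT = "\u09CD"        # BENGALI SIGN VIRAMA
BELOW_BASE = {"\u09AC"}  # Ba (assumed below-base-capable for this example)
POST_BASE = {"\u09AF"}   # Ya (assumed post-base-capable for this example)


def find_base_index(syllable, enforce_order):
    """Scan backwards from the end of the syllable for the base consonant.

    With enforce_order=True, a consonant is not accepted as a post-base
    form once a below-base form has been seen later in the syllable
    ("post-base forms have to follow below-base forms").
    """
    seen_below_base = False
    i = len(syllable) - 1
    while i >= 0:
        ch = syllable[i]
        if ch == HALANT:
            i -= 1
            continue
        after_halant = i > 0 and syllable[i - 1] == HALANT
        if after_halant and ch in POST_BASE and not (enforce_order and seen_below_base):
            i -= 1  # treated as a post-base form; keep searching
            continue
        if after_halant and ch in BELOW_BASE:
            seen_below_base = True
            i -= 1  # treated as a below-base form; keep searching
            continue
        return i
    return None


ka_ba_ya = "\u0995\u09CD\u09AC\u09CD\u09AF"
ka_ya_ba = "\u0995\u09CD\u09AF\u09CD\u09AC"
for text in (ka_ba_ya, ka_ya_ba):
    base = text[find_base_index(text, enforce_order=True)]
    print([unicodedata.name(ch) for ch in text], "->", unicodedata.name(base))
```

With `enforce_order=False`, the same search selects "Ka" as the base in both sequences.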

Real-world Bengali texts provide counterexamples that contradict the
Expand Down Expand Up @@ -238,13 +238,13 @@ The Microsoft script-development specifications
that marks should be reordered "to canonical order" (step 3 in the
linked Devanagari document) in the reordering phase. However, the same
step also describes this step as "Adjacent nukta and halant or nukta
and vedic sign are always repositioned if necessary, so that the nukta
and Vedic sign are always repositioned if necessary, so that the nukta
is first."

Together, it is somewhat ambiguous as to whether only "Halant,Nukta"
and "_vedicsign_,Nukta" sequences should be reordered by moving the
"Nukta" to the beginning, or all sequences of marks require reordering
into Unicode canonical combining class order, with "Nukta" moving to
Together, it is somewhat ambiguous as to whether only <samp>"Halant,Nukta"</samp>
and <samp>"_Vedic_sign_,Nukta"</samp> sequences should be reordered by moving the
<samp>"Nukta"</samp> to the beginning, or all sequences of marks require reordering
into Unicode canonical combining class order, with <samp>"Nukta"</samp> moving to
the initial position as a special case.
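As a point of comparison, the Unicode canonical combining classes of the Devanagari nukta and halant can be inspected with Python's standard `unicodedata` module. The sketch shows only what Unicode canonical ordering does to a <samp>"Ka,Halant,Nukta"</samp> sequence; it makes no claim about which interpretation the Microsoft specification intends.

```python
import unicodedata

KA = "\u0915"      # DEVANAGARI LETTER KA
NUKTA = "\u093C"   # DEVANAGARI SIGN NUKTA
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA

# Canonical combining classes: marks with a lower non-zero class sort first.
for mark in (NUKTA, HALANT):
    print(unicodedata.name(mark), unicodedata.combining(mark))

# NFD applies canonical ordering, which moves the nukta ahead of the halant.
reordered = unicodedata.normalize("NFD", KA + HALANT + NUKTA)
print([f"U+{ord(ch):04X}" for ch in reordered])
```

Because the nukta has a lower combining class than the halant, canonical ordering alone already places the nukta first for this pair.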


44 changes: 22 additions & 22 deletions images/images-index.md
@@ -16,7 +16,7 @@ The font files used must be publicly and freely available, open-source
fonts. By default, the Noto fonts from Google are the starting point.

A list of the fonts used to generate the latest version of the images
is provided in the [example-fonts.txt](example-fonts.txt) file, with
is provided in the [example-fonts.txt](images/example-fonts.txt) file, with
URLs and SHA checksums for each file.

The image file names follow a simple, but important, pattern:
@@ -33,30 +33,30 @@ please also follow the file-name pattern. Patches to the image-generation log fo
each script are appreciated, in order to keep the log up-to-date.

- Indic
- [Devanagari](devanagari/devanagari-image-generation-log.md)
- [Bengali](bengali/bengali-image-generation-log.md)
- [Gujarati](gujarati/gujarati-image-generation-log.md)
- [Gurmukhi](gurmukhi/gurmukhi-image-generation-log.md)
- [Kannada](kannada/kannada-image-generation-log.md)
- [Malayalam](malayalam/malayalam-image-generation-log.md)
- [Oriya](oriya/oriya-image-generation-log.md)
- [Tamil](tamil/tamil-image-generation-log.md)
- [Telugu](telugu/telugu-image-generation-log.md)
- [Sinhala](sinhala/sinhala-image-generation-log.md)
- [Devanagari](images/devanagari/devanagari-image-generation-log.md)
- [Bengali](images/bengali/bengali-image-generation-log.md)
- [Gujarati](images/gujarati/gujarati-image-generation-log.md)
- [Gurmukhi](images/gurmukhi/gurmukhi-image-generation-log.md)
- [Kannada](images/kannada/kannada-image-generation-log.md)
- [Malayalam](images/malayalam/malayalam-image-generation-log.md)
- [Oriya](images/oriya/oriya-image-generation-log.md)
- [Tamil](images/tamil/tamil-image-generation-log.md)
- [Telugu](images/telugu/telugu-image-generation-log.md)
- [Sinhala](images/sinhala/sinhala-image-generation-log.md)
- Brahmi-derived
- [Khmer](khmer/khmer-image-generation-log.md)
- [Lao](thai-lao/thai-lao-image-generation-log.md)
- [Myanmar](myanmar/myanmar-image-generation-log.md)
- [Thai](thai-lao/thai-lao-image-generation-log.md)
- [Tibetan](tibetan/tibetan-image-generation-log.md)
- [Khmer](images/khmer/khmer-image-generation-log.md)
- [Lao](images/thai-lao/thai-lao-image-generation-log.md)
- [Myanmar](images/myanmar/myanmar-image-generation-log.md)
- [Thai](images/thai-lao/thai-lao-image-generation-log.md)
- [Tibetan](images/tibetan/tibetan-image-generation-log.md)
- Arabic
- [Arabic](arabic/arabic-image-generation-log.md)
- [Syriac](syriac/syriac-image-generation-log.md)
- [N'Ko](nko/nko-image-generation-log.md)
- [Mongolian](mongolian/mongolian-image-generation-log.md)
- [Arabic](images/arabic/arabic-image-generation-log.md)
- [Syriac](images/syriac/syriac-image-generation-log.md)
- [N'Ko](images/nko/nko-image-generation-log.md)
- [Mongolian](images/mongolian/mongolian-image-generation-log.md)
- Hangul
- [Hangul](hangul/hangul-image-generation-log.md)
- [Hangul](images/hangul/hangul-image-generation-log.md)
- Hebrew
- [Hebrew](hebrew/hebrew-image-generation-log.md)
- [Hebrew](images/hebrew/hebrew-image-generation-log.md)
- Emoji
- [Emoji](emoji/emoji-image-generation-log.md)
50 changes: 25 additions & 25 deletions notes/uniscribe-bug-compatibility.md
@@ -10,7 +10,7 @@ regarded as bugs by end users.
- [Indic standalone-syllable dotted circles](#indic-standalone-syllable-dotted-circles)
- [Indic syllable cluster merging](#indic-syllable-cluster-merging)
- [Indic fallback Reph reordering](#indic-fallback-reph-reordering)
- [Kannada legacy treatment of Ra,Halant,ZWJ](#kannada-legacy-treatment-of-ra-halant-zwj)
- [Kannada legacy treatment of "Ra,Halant,ZWJ"](#kannada-legacy-treatment-of-ra-halant-zwj)
- [Khmer kerning](#khmer-kerning)
- [Sinhala matra decomposition](#sinhala-matra-decomposition)
- [Miscellaneous](#miscellaneous)
@@ -36,8 +36,8 @@ syllable, Uniscribe ignores the glyph when processing the syllable.

For example, the dotted-circle glyph is not counted as a consonant
when locating the syllable's base consonant. Therefore, the sequence
"Ra,Halant,Dotted_Circle" does not trigger Reph formation (which would
result in the sequence "Reph,Dotted_Circle").
<samp>"Ra,Halant,Dotted_Circle"</samp> does not trigger Reph formation (which would
result in the sequence <samp>"Reph,Dotted_Circle"</samp>).


## Indic syllable cluster merging ##
@@ -72,20 +72,20 @@ syllable, Uniscribe's ultimate fallback behavior is to reorder the
Reph to the end of the syllable.

If the Reph is reordered to the end of the syllable and this final
position happens to occur immediately after a "Matra,Halant" sequence,
position happens to occur immediately after a <samp>"Matra,Halant"</samp> sequence,
Uniscribe leaves the Reph in this position.

Other shaping engines, in this situation, will reorder the Reph to a
position immediately before the "Matra,Halant" sequence. This allows
for any <abbr>GSUB</abbr> substitutions that match "Reph,Matra" sequences to be
position immediately before the <samp>"Matra,Halant"</samp> sequence. This allows
for any <abbr>GSUB</abbr> substitutions that match <samp>"Reph,Matra"</samp> sequences to be
activated, if any such substitution rules are present in the active
font.

## Kannada legacy treatment of Ra,Halant,ZWJ ##
## Kannada legacy treatment of "Ra,Halant,ZWJ" ##

In the `<knda>` shaping model (which was deprecated in 2005 in favor
of `<knd2>`), the sequence "Ra,Halant,ZWJ" was treated as equivalent
to the sequence "Ra,ZWJ,Halant".
of `<knd2>`), the sequence <samp>"Ra,Halant,ZWJ"</samp> was treated as equivalent
to the sequence <samp>"Ra,ZWJ,Halant"</samp>.
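One way a shaping engine might model this equivalence is to rewrite the first sequence into the second before further processing of a `<knda>` run. The sketch below assumes that approach; the function name is hypothetical, and Uniscribe's actual internal handling is not documented.

```python
RA = "\u0CB0"      # KANNADA LETTER RA
HALANT = "\u0CCD"  # KANNADA SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def fold_legacy_knda_ra_halant_zwj(text):
    """Rewrite "Ra,Halant,ZWJ" as "Ra,ZWJ,Halant" (assumed legacy <knda> runs only)."""
    return text.replace(RA + HALANT + ZWJ, RA + ZWJ + HALANT)


legacy_input = RA + HALANT + ZWJ + "\u0C95"  # followed by KANNADA LETTER KA
folded = fold_legacy_knda_ra_halant_zwj(legacy_input)
print([f"U+{ord(ch):04X}" for ch in folded])
```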

## Khmer kerning ##

@@ -131,7 +131,7 @@ rules for the right-side matra components, relying instead on the
## Miscellaneous ##


### Bengali init feature matching ###
### Bengali `init` feature matching ###

The `init` feature in Bengali is defined in the OpenType specification
as applying to word-initial left-side dependent vowels (matras).
@@ -156,7 +156,7 @@ range in the Unicode `General Category` property:
### Old-model post-base Halant reordering ###

In old-model (Indic1) script tags, Uniscribe treats some
scripts differently when reordering the first post-base Halant. This
scripts differently when reordering the first post-base <samp>"Halant"</samp>. This
Halant-reordering is done in Indic1 scripts in order to prepare the
syllable for Indic1's different post-base <abbr>GSUB</abbr> substitution rules.

@@ -170,13 +170,13 @@ would be reordered to

before features are applied.

In Malayalam, Uniscribe always reorders the first post-base Halant in
In Malayalam, Uniscribe always reorders the first post-base <samp>"Halant"</samp> in
a syllable to the position immediately after the syllable's last consonant.

#### Kannada final double Halants ####

In old-model Kannada (`<knda>`) runs, Uniscribe is known to reorder
the first post-base Halant only when there is not already a Halant
the first post-base <samp>"Halant"</samp> only when there is not already a <samp>"Halant"</samp>
after the last consonant.

For example, the old-model Indic syllable
@@ -185,43 +185,43 @@ For example, the old-model Indic syllable

would _not_ be reordered.

This behavior is an exception to the general Indic1 post-base Halant
This behavior is an exception to the general Indic1 post-base <samp>"Halant"</samp>
reordering operation. It is believed to be script-specific and has
only been observed for Kannada text runs. However, there may still be
undiscovered sequences in other Indic1-script text which trigger the
same behavior; implementers targeting full compatibility should
exercise caution.

If the standard post-base Halant reordering were performed, then the
If the standard post-base <samp>"Halant"</samp> reordering were performed, then the
likely result of the <abbr>GSUB</abbr> feature-application phase would be a
sequence of the form "BaseC,belowbaseC,Halant" which, in turn, might
sequence of the form <samp>"BaseC,belowbaseC,Halant"</samp> which, in turn, might
trigger mark-attachment issues for correctly positioning the final
Halant.
<samp>"Halant"</samp>.

This Uniscribe behavior is not documented, however; therefore the only
recommended workaround for maintaining compatibility is to define a
special-case exception for avoiding the creation of final double
Halants in `<knda>` text.
<samp>"Halant"</samp>s in `<knda>` text.


### Halants and left matras ###

When reordering left-side matras, when a Halant occurs immediately
after a left-side matra, Uniscribe does not move the Halant with the matra.
When reordering left-side matras, when a <samp>"Halant"</samp> occurs immediately
after a left-side matra, Uniscribe does not move the <samp>"Halant"</samp> with the matra.

Generally, marks (including "Halant") are tagged for reordering with
Generally, marks (including <samp>"Halant"</samp>) are tagged for reordering with
the same positioning tag as the closest non-mark character that the
mark has affinity with.

In post-base position, where a yet-to-be-reordered left-side matra
would be found, the closest non-mark character with affinity for the
mark might be a post-base consonant. Uniscribe appears to make a check
ensuring that the Halant after a left-side matra is not tagged for
ensuring that the <samp>"Halant"</samp> after a left-side matra is not tagged for
reordering with the matra.

This check is required for shaping Sinhala, because the `U+0DDA`
multi-part matra decomposes into the sequence "`U+0DD9`,Halant". The
decomposed Halant should remain where it is, serving as the right-side
multi-part matra decomposes into the sequence <samp>"`U+0DD9`,Halant"</samp>. The
decomposed <samp>"Halant"</samp> should remain where it is, serving as the right-side
matra component.
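The decomposition mentioned here is the canonical decomposition recorded in the Unicode Character Database, so it can be checked with Python's standard `unicodedata` module. The sketch confirms only the decomposition itself, not Uniscribe's reordering behavior.

```python
import unicodedata

DIGA_KOMBUVA = "\u0DDA"  # the multi-part matra

# NFD applies the canonical decomposition from the Unicode Character Database.
decomposed = unicodedata.normalize("NFD", DIGA_KOMBUVA)
for ch in decomposed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The second code point of the result is the Sinhala Halant (al-lakuna), which is the mark that should stay in place on the right side.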


@@ -239,7 +239,7 @@ can be positioned.
However, Uniscribe is known not to insert a dotted-circle before a
matra character when it is preceded by two sequential
explicit-half-form sequences (meaning two consecutive occurrences of
"_Consonant_,Halant,ZWJ") in Indic2 runs.
<samp>"_Consonant_,Halant,ZWJ"</samp>) in Indic2 runs.

Therefore, the sequence:
