Needing guidance for ambiguity about init feature scope in Bengali #104
…ee issue #104 and discussion at #101 (comment) for background re the intended scope of init.
I don’t think HarfBuzz or Uniscribe ever applied these features to other scripts, so the presence of such fonts is largely irrelevant.
Or Core Text, or probably anything other than InDesign.
I also believe the current patched wording correctly reflects both shapers’ ability and fonts’ usage/expectation of this OTL feature in reality. Shaping stuff other than Bangla vowel signs with automatically applied …
So, if I'm following both of you correctly:
(a) There's no value to worrying about any scripts other than …
(b) There's an outside chance somebody has had decorative or other funky fonts for …
(c) People probably didn't apply …
(Still trying to grasp your attitude towards this issue. I often find it difficult to switch from a font developer’s point of view to a shaper developer’s…)
Arabic and other Arabic-cursive-joining scripts, scripts handled by the USE, as well as
Yes, or just relaxing the script restriction in the spec. From a shaper’s point of view, the footnote, if any, should probably be based on whether there’s any known shaper out there actually applying the
@tiroj should be able to provide more confirmation on this.
Backing up a bit: the revised init (medi, fina, and isol) feature descriptions came about because the old ones didn't accurately describe what shaping engines were actually doing. The original feature descriptions talked about applying the features based on word position, but word-position analysis is not how Arabic, Syriac, etc. shaping is performed: scripts with normative cursive joining behaviour include both joining and non-joining characters mid-word, so what matters is the joining properties of adjacent letters, not their position in the word. So the feature descriptions were rewritten to specify that they relate directly to implementation of the Unicode ArabicShaping.txt joining properties, and not to word analysis. [I noted at the time that this left open the possibility of defining new features specifically to apply word-positional or e.g. line-positional forms, independent of joining behaviour.]

The Bengali case is the one-off exception to the general rule that init applies only to ArabicShaping.txt joining properties. This is because all beng and bng2 shaping engines apply the feature based on analysis of U+09C7 and U+09C8 occurring at the beginning of a word. This is the only standardised and specified case in which word-positional analysis is performed by shaping engines. When Indic shaping was being worked out at Microsoft, it was noted that Bengali writing and typography had this feature in which word-initial forms of these vowel signs did not have a spur on the left side. Rather than either requiring this to be handled with contextual substitutions (which wouldn't work within the broken context range of Indic shapers) or defining a specific Bengali feature, the init feature was specified for this purpose. The exceptional use for beng and bng2 is therefore retained; note that bng3 for USE processing would not be able to use init for this purpose, and would need to implement the substitution contextually in the GSUB lookup.
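To make the joining-versus-word-position distinction concrete, here is a minimal Python sketch (not any shaper's actual code) that picks init/medi/fina/isol for each character from joining types alone, with no notion of word boundaries. The JOINING_TYPE table is a tiny hand-copied subset for illustration only; a real implementation would load joining types from the Unicode data files, and the function name joining_forms is hypothetical.

```python
from typing import Optional

# Illustrative subset of Unicode joining types (real data: ArabicShaping.txt
# or DerivedJoiningType.txt). Characters missing here are treated as U.
# D = dual-joining, R = right-joining, L = left-joining,
# C = join-causing, T = transparent, U = non-joining.
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u062f": "R",  # ARABIC LETTER DAL
    "\u0644": "D",  # ARABIC LETTER LAM
    "\u064e": "T",  # ARABIC FATHA
}


def joining_forms(text: str) -> list[Optional[str]]:
    """Pick init/medi/fina/isol per character from joining types alone.

    Transparent, non-joining and join-causing characters get None.
    Only the neighbouring joining types matter, never word position.
    """
    types = [JOINING_TYPE.get(ch, "U") for ch in text]
    # Transparent marks are skipped when looking for neighbours.
    non_transparent = [i for i, t in enumerate(types) if t != "T"]
    forms: list[Optional[str]] = [None] * len(text)
    for pos, i in enumerate(non_transparent):
        t = types[i]
        if t in ("U", "C"):
            continue
        prev_t = types[non_transparent[pos - 1]] if pos > 0 else None
        has_next = pos + 1 < len(non_transparent)
        next_t = types[non_transparent[pos + 1]] if has_next else None
        # Joins backwards if this letter connects on its right side (R, D)
        # and the previous letter connects on its left side (D, L, C).
        joins_prev = t in ("R", "D") and prev_t in ("D", "L", "C")
        # Joins forwards if this letter connects on its left side (D, L)
        # and the next letter connects on its right side (R, D, C).
        joins_next = t in ("D", "L") and next_t in ("R", "D", "C")
        if joins_prev and joins_next:
            forms[i] = "medi"
        elif joins_prev:
            forms[i] = "fina"
        elif joins_next:
            forms[i] = "init"
        else:
            forms[i] = "isol"
    return forms
```

For example, beh-alef-beh-alef (U+0628 U+0627 U+0628 U+0627) comes out as init, fina, init, fina: the second beh takes init in the middle of the word simply because alef does not join to the following letter, which is exactly why word position cannot drive these features for cursive-joining scripts.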
Yes, I think there probably are some Latin cursive style fonts that tried to use init, medi, fina, and isol for letter shaping. To my knowledge, they wouldn't have worked very widely, and their makers probably would have instead chosen to use contextual substitutions if they cared about the fonts working across a range of platforms and applications.
PS. If Microsoft had decided to require the Bengali initial vowel sign substitution to be handled using contextual substitution, rather than via init, they would presumably have very quickly realised that their context range was broken for Indic scripts, and might have fixed it. So it is a great pity that they didn't.
I should note that HarfBuzz does the following, which I read as "if a left matra is not word-initial and the preceding character falls outside a range of General Category classes, apply the …" The two relevant commits (here and here) lead me to believe that this was done to imitate Uniscribe's behaviour. Allsorts now follows suit, so perhaps this should be formalised. (Apologies - this is not related to …)
Many thanks for the clarifications. If it's not beating-a-dead-tangent (and solely to put a tiny piece of my mind at ease), was the c.2017 allusion to
Check.
Okay, so it's basically saying consider it a "word start" if there's a non-letter-and-non-mark codepoint before it (plus related whatnot). That definitely makes sense; numerals and punctuation and so on. Would that be a situation that ought to already get handled before it gets to the shaper, though? As in, it's part of segmenting the text run. That doesn't mean the shaping engine shouldn't be aware of it, of course; it's just a question about what the standard MO is.
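As a rough illustration of the word-start heuristic discussed above, here is a Python sketch that reports which left-side (pre-base) Bengali matras would be marked for init. It is an approximation written for this discussion, not a port of HarfBuzz: the WORD_INTERNAL_CATEGORIES set is an assumed reading of the "range of General Category classes" test (format, unassigned, private-use and surrogate characters plus letters and marks), syllable detection is reduced to consonant (+nukta) chains linked by virama, split matras are ignored, and the helper names are hypothetical. I-kar (U+09BF) is included because the quoted logic keys on the left-matra position rather than on the specific vowel; whether a font provides an init form for it is a separate question.

```python
import unicodedata

# Left-side (pre-base) Bengali vowel signs: I, E, AI.
# Split matras such as U+09CB/U+09CC are ignored in this sketch.
PRE_BASE_MATRAS = {"\u09bf", "\u09c7", "\u09c8"}
VIRAMA = "\u09cd"
NUKTA = "\u09bc"

# Assumed approximation of the General Category range test: a preceding
# character in one of these categories means "not a word start".
WORD_INTERNAL_CATEGORIES = {
    "Cf", "Cn", "Co", "Cs",        # format, unassigned, private use, surrogate
    "Ll", "Lm", "Lo", "Lt", "Lu",  # letters
    "Mc", "Me", "Mn",              # marks
}


def is_consonant(ch: str) -> bool:
    # Rough range check (KA..HA); a real shaper uses Indic category data.
    return "\u0995" <= ch <= "\u09b9"


def syllable_start(text: str, matra: int) -> int:
    """Hypothetical helper: index where the syllable ending in `matra`
    starts, or `matra` itself if no base consonant precedes it."""
    j = matra - 1
    if j >= 0 and text[j] == NUKTA:
        j -= 1
    if j < 0 or not is_consonant(text[j]):
        return matra
    # Extend back over "consonant [nukta] virama" links (conjuncts).
    while j >= 2 and text[j - 1] == VIRAMA:
        k = j - 2
        if k >= 1 and text[k] == NUKTA:
            k -= 1
        if not is_consonant(text[k]):
            break
        j = k
    return j


def init_matra_indices(text: str) -> list[int]:
    """Indices of pre-base matras whose syllable starts a 'word'."""
    hits = []
    for i, ch in enumerate(text):
        if ch not in PRE_BASE_MATRAS:
            continue
        start = syllable_start(text, i)
        if start == i:
            continue  # stray matra with no base: ignored here
        if start == 0:
            hits.append(i)
        elif unicodedata.category(text[start - 1]) not in WORD_INTERNAL_CATEGORIES:
            hits.append(i)
    return hits
```

Because the matra follows its consonant in logical order, the test looks at the character before the whole consonant cluster, not the character before the matra. So in U+098F U+0995 U+09C7 the e-kar gets no init (the preceding U+098F is a letter), whereas after a space, punctuation, or a Bengali digit it would, which matches the "numerals and punctuation" reading above.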
It’s sort of typical of the OTL feature specifications that the init feature would state (at least before the rewrite, and still for beng and bng2) that the feature should be applied to a word-initial glyph without actually specifying how a word-initial glyph is to be determined. The HarfBuzz behaviour sounds sensible, and probably is what Uniscribe/DWrite does too, but so far as I know this is among the implementation details that are nowhere specified.
One last question: @tiroj, you say … I could imagine that the difference in shapes between the matras would make a distinction in everyday practice, but I'm a little wary of being so prescriptive.
I’ve not seen the ikar get a word-initial form. The ductus of the letter is different from the ekar shape, so it doesn't lend itself in the same way to a spurless head-line connection. That said, I am unsure whether shaping engines would make the distinction, or if they would simply process the init lookup for the first glyph in a word, regardless of what that glyph is. Something one does see in some Bengali fonts, notably in display and headline types, is word-final forms of iikar where the head line does not extend to the right of the letter. This needs to be handled using contextual substitutions, but support is hampered by the cluster boundary model that Microsoft and Adobe engines still apply to GSUB, even for the rclt feature.
At present, the Microsoft script-guidance page for Bengali (beng and bng2 script tags) states that the init GSUB feature should only apply to left-side matra glyphs when they appear in word-initial position, and should not apply to other letters even when those appear in word-initial position. The fact that init applies at all is something of an outlier, since otherwise the feature is primarily designed to work with Arabic and other cursive, joining scripts.

But the wording was different up through at least December 2017 (visible in this Wayback Machine link), saying instead "This feature takes nominal (full) forms of consonants and produces initial forms when the glyph is at the beginning of a word" even though the example image is a left-side matra.
The new wording comes from a change proposed by John Hudson in 2016, which followed a TypeDrawers discussion thread in which people indicated varying levels of expectation about whether or not init (and fina) should be implemented for other scripts.

It certainly seems possible that some fonts exist that exploit those features for, e.g., cursive styles of Latin. For shaping-engine authors, however, the more particular question is whether it's right or wrong to apply the feature to letters other than left-side matras.
Sticking strictly to the spec, the answer today would be "no", but there may be old fonts in the wild that expect it, so perhaps a note of guidance ought to go in somewhere.