halant + length in Kannada script #163
Comments
The Kannada script development spec allows one virama after the vowel signs. Separate syllables with ZWNJ to avoid a conjunct. Unicode recommends this convention for many Indic scripts, including Malayalam. For example, here is <U+0C95, U+0CCD, U+200C, U+0C95> ⟨ಕ್ಕ⟩.
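As a rough illustration of the sequence above, the following Python sketch builds the string with and without ZWNJ and lists the code points in each. The names `describe`, `stacked`, and `unstacked` are made up for this example, and the script only inspects the encoded text; it does not run a shaping engine, so it says nothing about how any particular font renders the result.

```python
import unicodedata

KA = "\u0C95"      # KANNADA LETTER KA
VIRAMA = "\u0CCD"  # KANNADA SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Without ZWNJ, a shaper is free to form a conjunct (consonant stack).
stacked = KA + VIRAMA + KA
# With ZWNJ after the virama, the request is for a visible virama instead.
unstacked = KA + VIRAMA + ZWNJ + KA

for label, text in (("stacked", stacked), ("unstacked", unstacked)):
    print(label)
    for line in describe(text):
        print("  " + line)
```

Pasting the two strings into an application that uses the font under test is then a quick way to compare the stacked and unstacked forms.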
Thanks for pointing out the code point order, which, as you say, is not the most obvious one. I had not realised that. I tested some other fonts: Noto Serif Kannada and Nirmala UI had a clash, while Tunga and Tiro Kannada displayed like your example of Noto Sans Kannada. So if this sequence gets used, I might file bug reports against the fonts that need fixing. I realised I can get the desired visual form using ZWNJ. However, as discussed in section 4 on page 4 of the Tulu document, ZWNJ is ignored by some text processes. On the other hand, Unicode section 23.2, Layout Controls - Cursive Connection and Ligatures, has a paragraph called Filtering Joiner and Non-joiner (which I am just reading now). While that paragraph starts with the same conclusion as the Tulu document, it goes on to say that, for Indic scripts in particular, ZWJ and ZWNJ should generally be considered for many text processes. So is Unicode 23.2 enough that I should use ZWJ and ZWNJ to denote orthographic differences that reflect different phonetics and therefore different words? Or should a variation sequence or other mechanism (such as a new code point) be proposed to handle this situation?
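To make the concern about text processes concrete, here is a minimal Python sketch. The function `strip_joiners` is a hypothetical stand-in for any process that filters out ZWJ and ZWNJ (it is not taken from any real library), and the KA + virama + KA strings are just the example sequence from earlier in this thread.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def strip_joiners(text):
    """Hypothetical filter: drop ZWJ and ZWNJ, as some text processes do."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


with_zwnj = "\u0C95\u0CCD\u200C\u0C95"  # KA, VIRAMA, ZWNJ, KA
without_zwnj = "\u0C95\u0CCD\u0C95"     # KA, VIRAMA, KA

# As encoded text, the two spellings are distinct strings.
print(with_zwnj == without_zwnj)

# Canonical normalization (NFC) keeps the ZWNJ in place.
print(unicodedata.normalize("NFC", with_zwnj) == with_zwnj)

# A process that filters joiners can no longer tell the spellings apart.
print(strip_joiners(with_zwnj) == without_zwnj)
```

So the distinction survives plain comparison and NFC, but it is lost in any process that ignores the joiners, which is exactly the case where two different words would collide.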
Besides section 23.2, section 12.1 “Devanagari” subsection “Alternative Forms of Cluster-Initial RA” gives an example where ZWJ is orthographically significant in an Indic script, and UAX #31 section 2.3 “Layout and Format Control Characters” gives examples of significant ZWJ and ZWNJ, including a Malayalam example which is similar to the case of this Tulu vowel. It would therefore be consistent and reasonable for Unicode to recommend ZWNJ for Tulu in Kannada script, instead of a new code point or other mechanism. Unicode isn’t always consistent between scripts, though, so you can’t know for sure till the standard specifically says so.
Some smaller languages written in Kannada script use a visible U+0CCD KANNADA SIGN VIRAMA as a vowel sign instead of a vowel killer (which normally is either visible or becomes invisible and causes consonant stacking).
One problem occurs when U+0CD5 KANNADA LENGTH MARK is used to distinguish vowel lengths. The difficulty comes when a language has different vowels than the Kannada language (written in Kannada script). The issue is that U+0CCD followed by U+0CD5 produces a dotted circle in OpenType shaping engines. That makes sense for the Kannada language: why would you use a vowel killer after a vowel sign? But for other languages, the dotted circle is not desired.
A second problem occurs when the virama should represent a vowel: if the virama occurs between two consonants, it becomes invisible and a consonant stack results. Both of these problems are discussed in a document on the Tulu language written in Kannada script.
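For anyone who wants to reproduce the dotted circle, this Python sketch builds a test string of a consonant followed by U+0CCD and U+0CD5 and lists each code point with its name and general category. The choice of KA as the base consonant and the helper name `properties` are assumptions made for illustration; the script only prepares and inspects the text, and the dotted circle itself appears only when the string is rendered by a shaping engine.

```python
import unicodedata

KA = "\u0C95"           # KANNADA LETTER KA (arbitrary base consonant)
VIRAMA = "\u0CCD"       # KANNADA SIGN VIRAMA
LENGTH_MARK = "\u0CD5"  # KANNADA LENGTH MARK

sequence = KA + VIRAMA + LENGTH_MARK


def properties(text):
    """Return (code point, name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


for code_point, name, category in properties(sequence):
    print(code_point, name, category)
```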
A similar issue occurs with another Dravidian language, Malayalam. When writing the Malayalam language in the Malayalam script, a virama can be used as a vowel killer, or as a vowel (samvrittokarama). The discussion of the half-u and the chandrakkala (virama) gives more details.
While the above issues might be bugs in the OpenType shaping engines, I thought it would be best to reach a consensus on how to handle them before filing multiple bug reports. Please accept my apologies if I could have posted this in a better location and/or copied different people.
@behdad @dscorbett @jfkthame @xadxura @PeterCon @LornaSIL