halant + length in Kannada script #163

Open
devosb opened this issue Nov 28, 2023 · 3 comments

@devosb

devosb commented Nov 28, 2023

Some smaller languages written in Kannada script use a visible U+0CCD KANNADA SIGN VIRAMA as a vowel sign rather than as a vowel killer. (As a vowel killer, the virama normally is either visible or becomes invisible and causes consonant stacking.)

One problem occurs when U+0CD5 KANNADA LENGTH MARK is used to distinguish vowel lengths in a language whose vowels differ from those of the Kannada language (written in Kannada script). The issue is that U+0CCD followed by U+0CD5 produces a dotted circle in OpenType shaping engines. That makes sense for the Kannada language: why would you combine a vowel killer with a vowel sign? But for other languages, the dotted circle is not desired.
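To make the sequence concrete, here is a minimal Python sketch that spells out the code points involved using the standard `unicodedata` module. It does not run a shaping engine, so it cannot reproduce the dotted circle itself. The base letter U+0C95 KANNADA LETTER KA and the helper name `describe` are chosen only for illustration.

```python
import unicodedata

KA = "\u0C95"           # illustrative base consonant (an assumption)
VIRAMA = "\u0CCD"       # KANNADA SIGN VIRAMA
LENGTH_MARK = "\u0CD5"  # KANNADA LENGTH MARK

def describe(text):
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# The order reported above as producing a dotted circle.
virama_then_length = KA + VIRAMA + LENGTH_MARK

for line in describe(virama_then_length):
    print(line)
```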

A second problem occurs when the virama should represent a vowel: if the virama occurs between two consonants, it becomes invisible and a consonant stack results. Both of these problems are discussed in a document on the Tulu language written in Kannada script.

A similar issue occurs with another Dravidian language, Malayalam. When writing the Malayalam language in the Malayalam script, a virama can be used as a vowel killer, or as a vowel (samvrittokarama). The discussion of the half-u and the chandrakkala (virama) gives more details.

While the above issues might be bugs in the OpenType shaping engines, I thought it would be best to reach a consensus on how to handle them before filing multiple bug reports. Please accept my apologies if I could have posted this in a better location and/or copied different people.

@behdad @dscorbett @jfkthame @xadxura @PeterCon @LornaSIL

@dscorbett

The Kannada script development spec allows one virama after the vowel signs: {M}+[N]+[H]. For example, here is <U+0C95, U+0CD5, U+0CCD> ⟨ಕೕ್⟩ in Noto Sans Kannada in HarfBuzz:
ಕೕ್
It’s not the most obvious code point order for Kannada, but it is consistent with how other Indic scripts work in OpenType.
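If existing text was typed with the virama before the length mark, the order described above could be produced with a small rewrite step. The following is only a sketch: `length_mark_before_virama` is a hypothetical helper, and whether such reordering is appropriate for a given orthography is an assumption, not something the spec or this thread prescribes.

```python
VIRAMA = "\u0CCD"       # KANNADA SIGN VIRAMA
LENGTH_MARK = "\u0CD5"  # KANNADA LENGTH MARK

def length_mark_before_virama(text: str) -> str:
    """Rewrite each <virama, length mark> pair as <length mark, virama>."""
    return text.replace(VIRAMA + LENGTH_MARK, LENGTH_MARK + VIRAMA)

# <U+0C95, U+0CCD, U+0CD5> becomes <U+0C95, U+0CD5, U+0CCD>
reordered = length_mark_before_virama("\u0C95\u0CCD\u0CD5")
print([f"U+{ord(ch):04X}" for ch in reordered])
```

Text that is already in the length-mark-first order passes through unchanged.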

Separate syllables with ZWNJ to avoid a conjunct. Unicode recommends this convention for many Indic scripts, including Malayalam. For example, here is <U+0C95, U+0CCD, U+200C, U+0C95> ⟨ಕ್‌ಕ⟩:
ಕ್‌ಕ
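As a rough illustration of the ZWNJ convention, the sketch below inserts U+200C after a virama that is directly followed by a consonant. The helper name `keep_virama_visible` is hypothetical, and the consonant range U+0C95 to U+0CB9 is a simplification. The function treats every virama the same way, so it only suits text in which all such viramas are meant to stay visible; text that also needs real conjuncts would have to mark them some other way.

```python
import re

VIRAMA = "\u0CCD"  # KANNADA SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

# Virama immediately followed by a Kannada consonant
# (simplified range U+0C95..U+0CB9, an assumption).
_VIRAMA_BEFORE_CONSONANT = re.compile("\u0CCD(?=[\u0C95-\u0CB9])")

def keep_virama_visible(text: str) -> str:
    """Insert ZWNJ after each virama that precedes a consonant."""
    return _VIRAMA_BEFORE_CONSONANT.sub(VIRAMA + ZWNJ, text)

# <U+0C95, U+0CCD, U+0C95> becomes <U+0C95, U+0CCD, U+200C, U+0C95>
separated = keep_virama_visible("\u0C95\u0CCD\u0C95")
print([f"U+{ord(ch):04X}" for ch in separated])
```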

@devosb

devosb commented Dec 5, 2023

Thanks for pointing out what is, as you say, not the most obvious code point order. I had not realised that. I tested some other fonts: Noto Serif Kannada and Nirmala UI had a clash, while Tunga and Tiro Kannada displayed like your example of Noto Sans Kannada. So if this sequence gets used, I might file bug reports on the fonts that need fixing.

I realised I can get the desired visual form using ZWNJ. However, as discussed in section 4 on page 4 of the Tulu document, ZWNJ is ignored by some text processes. On the other hand, Unicode section 23.2, Layout Controls (Cursive Connection and Ligatures), has a paragraph called Filtering Joiner and Non-joiner (which I am just reading now). While that paragraph starts with the same conclusion as the Tulu document, it goes on to say that, for Indic scripts in particular, ZWJ and ZWNJ should generally be taken into account in many text processes.
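The concern about filtering can be shown with a small sketch. Here `strip_joiners` is a hypothetical stand-in for any text process that ignores ZWJ and ZWNJ; the two sample strings reuse the illustrative consonant U+0C95 and are not real Tulu words.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER

def strip_joiners(text: str) -> str:
    """Mimic a process that filters out ZWJ and ZWNJ before comparing."""
    return "".join(ch for ch in text if ch not in (ZWNJ, ZWJ))

with_zwnj = "\u0C95\u0CCD\u200C\u0C95"  # visible virama, no conjunct
without_zwnj = "\u0C95\u0CCD\u0C95"     # conjunct (consonant stack)

# Kept as-is, the two spellings are different strings.
distinct_when_kept = with_zwnj != without_zwnj

# Once the joiners are filtered, the distinction is lost.
equal_when_filtered = strip_joiners(with_zwnj) == strip_joiners(without_zwnj)

print(distinct_when_kept, equal_when_filtered)
```

If ZWNJ carries a difference between words, a process that behaves like `strip_joiners` would treat those words as identical.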

So is Unicode 23.2 enough that I should use ZWJ and ZWNJ to denote orthographic differences that reflect different phonetics and therefore different words? Or should a variation sequence or other mechanism (such as a new code point) be proposed to handle this situation?

@dscorbett

Besides section 23.2, section 12.1 “Devanagari” subsection “Alternative Forms of Cluster-Initial RA” gives an example where ZWJ is orthographically significant in an Indic script, and UAX #31 section 2.3 “Layout and Format Control Characters” gives examples of significant ZWJ and ZWNJ, including a Malayalam example which is similar to the case of this Tulu vowel. It would therefore be consistent and reasonable for Unicode to recommend ZWNJ for Tulu in Kannada script, instead of a new code point or other mechanism. Unicode isn’t always consistent between scripts, though, so you can’t know for sure till the standard specifically says so.
