Replies: 22 comments 61 replies
-
There was recently a thread on the Glyphs forum that I found clarifying. The Variation Sequence is a mechanism for specifying variant shapes of Unicode-encoded characters, similar in function to OpenType Stylistic Sets and Character Variants (though these offer a good bit more flexibility). It is supported by Glyphs, the font editor I use. The Microsoft document describes the technical details of how Variation Sequences must be implemented in a font. There are two methods, Default and Non-Default, which appear to me to be functionally the same. I suppose Default simply means that that is the method that font editors like Glyphs are expected to use unless there's some reason to do it differently (but I can't guess what that reason would be). More interesting, from my point of view, are the FAQ and the associated lists of variants. From these I take two points relevant to Junicode:
So there can be no variations of this kind in Junicode unless they are defined by Unicode at some point; and they seem parsimonious about handing them out for the Latin script. I have no idea whether applications support Variation Sequences, but support has to be supplied at that level too for them to work.
-
If I understand correctly, there are no limitations on the use of variation sequences for private-use characters. What about making some MUFI characters accessible with variation sequences as well? It would allow us to test how software handles this (in particular Emacs and XeLaTeX).
-
These days I like to stick very close to the specification, the reason being that, even if all applications handle a non-standard feature correctly now (and I haven't tested Variation Sequences to determine if this is the case), there is no guarantee that some major app won't come along that refuses to recognize them. Right now, some Adobe apps disable features they judge are not needed for a particular language: will they support VSs for languages that use the Latin script? I don't know: maybe I'll test.
As I understand the matter, the VS is similar in function to Character Variants (which I use liberally). The one thing the VS gets you that the CV doesn't is the ability to indicate "in plain text" (rather than in the markup) that a particular variant is needed. I wonder why it would be worthwhile implementing VSs when CVs seem to be functioning well?
-
I think the prospect of getting VS characters added to Unicode is faint and distant, given that there are currently precisely zero Latin-script characters there. I don't know exactly what the thinking is among the Unicoders, but from the outside it looks as if they're thinking that it is simply not a feature for Latin script. I personally think that's a mistake, probably based on a misconception about Latin script (that it is always and everywhere a "simple" script), but I doubt that anyone is going to budge them without first opening a wider discussion about the character of Latin script and the purposes of VS.
I understand the concern about font-specific features. For what it's worth, very much in the front of my mind as I've been working on Junicode 2 is to come up with an OpenType feature scheme that is rational enough to be standardized, and (once the Junicode feature set is in a stable place) to come up with a feature file that can be easily applied to any MUFI font. I've had a few preliminary words with Tarrin on this subject, but I haven't had time (so far) to write up my thoughts about this in detail.
Janusz--your reply came in as I was writing this. I'll try to find Karl Pentzlin's proposal and have a look.
-
Just going to chime in and say that Unicode tag characters (any number of Like variation selectors, tag characters are “default ignorable”, and they are unlikely to be used for anything outside of emoji anytime soon. So it seems to me like it might be preferable to an unsanctioned use of a variation selector, which would introduce conflicts if Unicode ever changes their mind and starts designating official variation sequences for Latin characters. |
-
I've just been looking at the Noto Color Emoji and BabelStone Flags fonts, which use the tag characters. These are presently used in only one way, for variants on U+1F3F4 WAVING BLACK FLAG. The tags correspond to the ASCII character set, so you can use them to spell out region tags. So (using friendly naming instead of Unicodes), the sequence
Whereas the Variation Selectors depend on a specialized lookup type in the font, Noto Color Emoji and BabelStone Flags implement tags as ligature lookups. They put them in ccmp (so they're always on), but it seems to me they might (in theory anyway) work in any feature, say a Stylistic Set, so they could be switched off if they caused a performance hit. Further, since the tags are used to spell out "words" of arbitrary length, the mechanism would seem to be almost infinitely flexible. The tag characters are ignorable, so they shouldn't trip up search engines, but who knows? Things to think about.
So, a mixed reading from my point of view. Plusses and minuses to such an approach.
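For anyone who wants to poke at this outside a font editor, here is a minimal Python sketch of how a tag sequence is assembled. It relies only on the fact that tag characters mirror printable ASCII at an offset of U+E0000 and that U+E007F is the cancel tag. The function names are my own, not from any library, and the England flag (`gbeng`) is used simply because it is a well-known emoji tag sequence. `strip_tags` is just one guess at how a search tool might normalize text containing tags.

```python
TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"     # U+E007F CANCEL TAG
BLACK_FLAG = "\U0001F3F4"     # U+1F3F4 WAVING BLACK FLAG


def to_tags(ascii_text: str) -> str:
    """Map printable ASCII (U+0020..U+007E) to the matching tag characters."""
    for ch in ascii_text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"no tag character for {ch!r}")
    return "".join(chr(TAG_OFFSET + ord(ch)) for ch in ascii_text)


def tag_sequence(base: str, tags: str, terminate: bool = True) -> str:
    """Base character, then the tags, then (optionally) the cancel tag."""
    return base + to_tags(tags) + (CANCEL_TAG if terminate else "")


def strip_tags(text: str) -> str:
    """Drop everything in the Tags block (U+E0000..U+E007F)."""
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)


england = tag_sequence(BLACK_FLAG, "gbeng")
print(" ".join(f"U+{ord(ch):04X}" for ch in england))
```

Stripping the tags from `england` leaves only the base flag character, which is roughly what "ignorable" should mean for searching, though real applications may behave differently.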
-
An advantage of tags over VS is that you can use them mnemonically; see below. BTW, even Emacs 28 has problems generating correct PostScript for emojis, so I had to settle for a screenshot.
-
I looked at the StackExchange thread. Very complicated, but mostly that was just Lisp scripting in Emacs, which is always a pain. I tested in FontGoggles, where it was pretty straightforward. I'd be interested in trying out some characters experimentally. Any input as to which characters to experiment with and what the sequences of tags might look like?
-
I'd like to keep tag sequences as short as possible for reasons of performance and file size. It occurs to me now that one place to look for inspiration might be the MUFI entity references, which are built out of a standardized collection of abbreviations, e.g. "ins" for insular, "lig" for ligature, placed in a particular order. The tag doesn't need to specify the base character, because that's there in the sequence. So if we think of U+F000F as U+A749 with flourish (not sure this is right, but just illustrating), the tag sequence would be
Which is a long sequence for a ligature-type lookup, but I don't think there would be much of a performance hit.
-
@jsbien: I'll probably be looking at XeTeX and LuaTeX today. I expect it to go well, since XeTeX uses HarfBuzz for a layout engine, like LibreOffice and Firefox.
@TheKnightWho: I wondered that too. It's probably there because the tag sequence is indeterminate in length (the two-character Regional Indicator sequence, which functions similarly, doesn't need a terminating character), but it's not strictly necessary in OpenType because the compiler reorders ligature rules so that longer ones come first. Still, it's there in the Unicode docs (and you know I like adhering to the spec), and it may provide some clarity for users, like the semicolon that ends a character entity reference.
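To illustrate the point about longer rules coming first, here is a toy Python model of greedy ligature matching. It is only a sketch of the idea, not how HarfBuzz or any feature compiler actually works, and the glyph names (`tag_s`, `x.variantA`, and so on) are hypothetical.

```python
def apply_ligatures(glyphs, rules):
    """Greedy left-to-right substitution that tries longer rules first."""
    ordered = sorted(rules.items(), key=lambda kv: len(kv[0]), reverse=True)
    out, i = [], 0
    while i < len(glyphs):
        for seq, ligature in ordered:
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(ligature)
                i += len(seq)
                break
        else:
            # No rule starts at this position: pass the glyph through unchanged.
            out.append(glyphs[i])
            i += 1
    return out


# Hypothetical rules: one sequence is a prefix of the other.
rules = {
    ("x", "tag_s"): "x.variantA",
    ("x", "tag_s", "tag_f"): "x.variantB",
}

print(apply_ligatures(["x", "tag_s", "tag_f"], rules))
print(apply_ligatures(["x", "tag_s", "a"], rules))
```

Because the three-glyph rule is tried before the two-glyph one, the longer sequence is not cut short by its own prefix, which is why a terminator is not strictly required for matching in this simplified picture.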
-
BTW, the performance hit from these complex lookups is pretty obvious in LibreOffice. A page with lots of them could be painful. I'm thinking of making all those MUFI entity elements (which can be up to seven letters long) into two-letter sequences. That should be mnemonic enough, and it will be rare to need a sequence of more than two tags (three with the cancel at the end).
-
Here's a list of proposed tag sequences. They correspond to the (usually longer) MUFI abbreviations used to build character entity references. Most of the time only one two-letter sequence will be needed, but they can be concatenated when necessary. These are not for users building composite characters (I don't think that will be possible), but for me to use in making tag sequences. I hope they're mnemonic enough. A number of these will, I think, not be needed (and I've left off a couple that I'm sure won't be needed). Comments?
-
One concern that I've had is the application of multiple tags to the same character, and how any implementation would handle it. Please do let me know if I have the wrong idea (because I hope that I do), but the only solution seems to be to encode compound ligatures. However:
a) Even if you limit combinations to a maximum of two, you're looking at 1,431 compound tags. Let's say 200 realistic possibilities. There are certainly situations where you might want three, though (e.g. small caps uncial m with macron).
b) Even that assumes that m + __s + __c + __u + __n would be automatically processed identically to m + __u + __n + __s + __c. I don't think it will be, because that would cause issues for flag country codes (and, incidentally, any combinations of tag letters that are anagrams). That doubles the number. If we wanted combinations like m + __s + __c + __u + __n + __m + __a (i.e. with the macron), it becomes completely out of control (nearly 25,000 theoretical combinations, each with 6 ways of being entered).
I would be surprised if the same issues applied to VS, because they're designed with this sort of usage in mind, whereas tags are more of a happy accident for certain use cases that seem to lack scalability.
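The arithmetic here can be checked quickly. The figure 1,431 matches the number of unordered pairs drawn from 54 items, so this sketch assumes the proposed list has 54 tag pairs; that count is my inference from the numbers quoted, not something stated in the thread.

```python
from math import comb, factorial

# Assumption: 54 tag pairs in the proposed list, inferred from 1,431 == C(54, 2).
N_TAG_PAIRS = 54

pairs = comb(N_TAG_PAIRS, 2)        # unordered combinations of two tag pairs
ordered_pairs = pairs * factorial(2)  # if entry order matters ("doubles the number")
triples = comb(N_TAG_PAIRS, 3)      # unordered combinations of three tag pairs
orderings = factorial(3)            # ways of entering any one triple

print(pairs, ordered_pairs, triples, orderings, triples * orderings)
```

Under that assumption the pair count and the "nearly 25,000" triple count both fall out directly, and treating each entry order as a separate ligature multiplies the triples by six.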
-
I explained myself badly. When I had made up my list of tag-pairs, it struck me that it looked like a set of building blocks, but it's not: it's just meant to describe what the font contains. For example, MUFI has an x with slash across the right lower leg. That is produced by the sequence
Similarly, although there are tag-pairs describing diacritics in the list, that doesn't mean you can use them to build character + diacritic combinations. Unicode provides a much better way to accomplish that--plus, as @TheKnightWho points out, that kind of thing would lead to an impossible number of combinations to cover with tags. You can just use the sequences that are defined in the font.
There's a tentative list of tag sequences (covering only the same ground as cv01-cv52, but that's just a start) in the document tag_key.pdf, and I will shortly have a font for people to test with.
-
I ran into difficulties with the scheme I devised before. Without getting into details about it, the problem turned out to be having tag sequences of varying lengths. I had to either (1) make them all the same length or (2) terminate each one with cancel.tag. I chose (1) and decided that each sequence would contain a base character and two tags. That meant that the scheme of tags would be less expressive than before--a bit less mnemonic. It is laid out here. (The good thing about revising the scheme was that I could eliminate tag sequences that weren't needed.) A list of characters covered by the new tag scheme is here. It contains two or three errors that will come right when the font is rebuilt, probably some time in the next few days. Until then, the fonts currently posted use the old scheme and aren't usable.
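A rough way to see what the fixed-length rule buys: with exactly two tags per base character, text can be checked mechanically without relying on a terminator. The Python sketch below groups each base character with the tags that follow it and tests the "zero or two tags" rule. The function names are hypothetical, and the sample uses l followed by the tags s and f purely for illustration.

```python
TAG_OFFSET = 0xE0000
CANCEL_TAG = 0xE007F


def split_clusters(text):
    """Group each base character with the tag letters that follow it."""
    clusters = []
    for ch in text:
        cp = ord(ch)
        if cp == CANCEL_TAG:
            continue  # a terminator carries no letter of its own
        if 0xE0020 <= cp <= 0xE007E and clusters:
            clusters[-1][1] += chr(cp - TAG_OFFSET)  # record the ASCII equivalent
        else:
            clusters.append([ch, ""])
    return [tuple(c) for c in clusters]


def follows_fixed_scheme(text, n_tags=2):
    """True if every base character carries either no tags or exactly n_tags."""
    return all(len(tags) in (0, n_tags) for _, tags in split_clusters(text))


sample = "a" + "l\U000E0073\U000E0066" + "b"
print(split_clusters(sample))
print(follows_fixed_scheme(sample), follows_fixed_scheme("l\U000E0073"))
```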
-
Just curious: how do you create the tag symbols in tag_key.pdf? The document properties say it was created with LibreOffice, but tag_key.odt represents the tag characters differently.
-
Your XeTeX/LuaTeX minimal example doesn't work for me. I changed it slightly, in particular to account for the changed tags.
-
As for F0011, F0012, F0013, F0014 and F0021, I would like to point out that Unicode now has 'LATIN SMALL LETTER OLD POLISH O' (U+A7C1). So you can use it as the base character of tag sequences instead of, or in addition to, plain "o".
-
The \lhighstrokeflourish command should start with plain l (U+006C), then \char"E0073\char"E0066.
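As a cross-check on those code points, this small Python sketch decodes the `\char"…` tokens and maps the tags back to their ASCII letters. The helper names are my own; only the two code points and the base letter come from the comment above.

```python
import re

TAG_OFFSET = 0xE0000


def tex_chars_to_text(tex: str) -> str:
    """Turn a run of \\char"XXXX tokens into the characters they denote."""
    return "".join(chr(int(h, 16)) for h in re.findall(r'\\char"([0-9A-Fa-f]+)', tex))


def tags_to_ascii(tags: str) -> str:
    """Map tag characters back to the ASCII letters they mirror."""
    return "".join(chr(ord(ch) - TAG_OFFSET) for ch in tags)


tags = tex_chars_to_text(r'\char"E0073\char"E0066')
sequence = "l" + tags  # plain l (U+006C) followed by the two tags

print(tags_to_ascii(tags))  # sf
print(" ".join(f"U+{ord(ch):04X}" for ch in sequence))
```

So the two tags spell "sf", and the whole sequence is one base character plus two tags.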
-
I have posted a new version of tag_key.pdf with instructions and descriptions.
-
Please see http://unicode.org/faq/vs.html for basic information about variation sequences.
I'm curious how it looks from the point of view of a font designer. I just found https://docs.microsoft.com/en-us/typography/opentype/spec/cmap#format-14-unicode-variation-sequences. However, its practical consequences are not clear to me. In particular, the difference between the Default and Non-Default UVS tables is not clear to me.