IR handling of cmap #145

rsheeter · 2023-03-14T19:04:27Z

@anthrotype had an interesting suggestion in #134 (comment), copied for convenience:

Actually, I propose that for our IR we completely do away with the flawed approach of storing a list of Unicode codepoints at the Glyph level, and instead have a global CharacterMap (which you can store in the static metadata) that is simply a HashMap<Codepoint, GlyphName>.
Associating codepoints with a glyph is wrong, not just because it obscures the fact that a cmap (which this piece of data is designed to build) actually maps from codepoints to glyphs rather than the other way around, but, most importantly, because it opens the door to ambiguous, invalid, or impossible-to-encode mappings in which more than one glyph ends up mapped to the same Unicode codepoint (e.g. two glyphs "foo" and "bar" both having unicodes list = [0x0061]: if one types "a", which one will one get, foo or bar?!)

See discussion on the proposal for storing a global cmap.txt and removing GLIF-level unicode elements in UFO4: unified-font-object/ufo-spec#77
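To make the ambiguity argument concrete, here is a minimal Rust sketch of inverting per-glyph codepoint lists into one global codepoint-to-glyph map. The `Glyph` struct, the `CharacterMap` alias and `build_cmap` are hypothetical illustrations, not fontc's actual IR types; codepoints are assumed to be plain `u32` and glyph names plain `String`. Instead of silently picking a winner, the sketch reports every codepoint that more than one glyph claims.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical stand-in for an IR glyph: a name plus the per-glyph
/// codepoint list that sources store today.
struct Glyph {
    name: String,
    codepoints: Vec<u32>,
}

/// The proposed global map: each codepoint maps to exactly one glyph name.
type CharacterMap = HashMap<u32, String>;

/// A codepoint claimed by two different glyphs: (codepoint, first, second).
type Conflict = (u32, String, String);

/// Inverts per-glyph codepoint lists into one global map, collecting every
/// codepoint that more than one glyph claims instead of picking a winner.
fn build_cmap(glyphs: &[Glyph]) -> Result<CharacterMap, Vec<Conflict>> {
    let mut cmap = CharacterMap::new();
    let mut conflicts = Vec::new();
    for glyph in glyphs {
        for &cp in &glyph.codepoints {
            match cmap.entry(cp) {
                Entry::Vacant(slot) => {
                    slot.insert(glyph.name.clone());
                }
                // Another glyph already owns this codepoint: record the clash.
                Entry::Occupied(slot) if slot.get() != &glyph.name => {
                    conflicts.push((cp, slot.get().clone(), glyph.name.clone()));
                }
                // The same glyph listing the same codepoint twice is harmless.
                Entry::Occupied(_) => {}
            }
        }
    }
    if conflicts.is_empty() {
        Ok(cmap)
    } else {
        Err(conflicts)
    }
}

fn main() {
    // The example from the comment: "foo" and "bar" both claim U+0061.
    let glyphs = vec![
        Glyph { name: "foo".into(), codepoints: vec![0x61] },
        Glyph { name: "bar".into(), codepoints: vec![0x61] },
        Glyph { name: "b".into(), codepoints: vec![0x62] },
    ];
    match build_cmap(&glyphs) {
        Ok(cmap) => println!("{} mappings", cmap.len()),
        Err(conflicts) => {
            for (cp, first, second) in conflicts {
                println!("U+{cp:04X} is claimed by both {first} and {second}");
            }
        }
    }
}
```

Run as-is, this prints a single line, `U+0061 is claimed by both foo and bar`. With a global map as the stored representation, that clash could not be expressed in the first place; with per-glyph lists it can only be detected after all glyphs have been read.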

rsheeter · 2023-03-14T19:08:48Z

I believe the counterarguments would be:

- This moves the IR too far from the source representation, which does store this information on glyphs.
- It doesn't really save any complexity:
  - Because the information lives on glyphs, we'd end up adding an IR step that depended on the completion of all IR glyphs so it could emit the combined map.
  - The only place we need the cmap is the BE cmap task, so it can just do that job.

Looking at my own bullets, it strikes me that if places other than the BE cmap task need to see the cmap, this has appeal.

anthrotype · 2023-03-14T19:17:28Z

if there are other places than the BE cmap task that need to see the cmap

I don't think there are, cmap is for... cmap :)
It's true that you'll have to read all glyphs to get to the codepoints, because that's where they currently live in the sources.
Let's think about it, and maybe revisit if the source formats do change.

khaledhosny · 2023-03-14T19:28:48Z

Handling variation selectors should be kept in mind as well. I don't know how UFO handles them, but the Glyphs way is awkward.

anthrotype · 2023-03-15T11:14:22Z

I don't know how UFO handles it

UFO has "public.unicodeVariationSequences", implemented in ufo2ft here.

https://unifiedfontobject.org/versions/ufo3/lib.plist/#publicunicodevariationsequences
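Unlike the per-glyph `unicodes` list, this UFO key stores variation sequences globally. As we read the UFO 3 spec, its value is a dictionary keyed by variation-selector hex strings, each mapping base-codepoint hex strings to glyph names; treat that shape as an assumption to check against the spec. The sketch below uses a hypothetical `parse_uvs` function, with plain `HashMap`s standing in for the parsed plist, and flattens the data into (base, selector) to glyph pairs, the input a cmap format 14 subtable is built from.

```rust
use std::collections::{BTreeMap, HashMap};

/// True for the Unicode variation selectors VS1..VS16 (U+FE00..U+FE0F)
/// and VS17..VS256 (U+E0100..U+E01EF).
fn is_variation_selector(cp: u32) -> bool {
    (0xFE00..=0xFE0F).contains(&cp) || (0xE0100..=0xE01EF).contains(&cp)
}

fn parse_hex(s: &str) -> Result<u32, String> {
    u32::from_str_radix(s, 16).map_err(|e| format!("invalid hex {s:?}: {e}"))
}

/// Flattens the assumed nested shape of `public.unicodeVariationSequences`
/// (selector hex -> base codepoint hex -> glyph name) into a map keyed by
/// (base, selector). A sorted BTreeMap keeps the output deterministic.
fn parse_uvs(
    lib_value: &HashMap<String, HashMap<String, String>>,
) -> Result<BTreeMap<(u32, u32), String>, String> {
    let mut sequences = BTreeMap::new();
    for (selector, bases) in lib_value {
        let vs = parse_hex(selector)?;
        if !is_variation_selector(vs) {
            return Err(format!("U+{vs:04X} is not a variation selector"));
        }
        for (base, glyph_name) in bases {
            sequences.insert((parse_hex(base)?, vs), glyph_name.clone());
        }
    }
    Ok(sequences)
}

fn main() {
    // Made-up example: U+2229 followed by VS1 should use "uni2229.alt".
    let bases = HashMap::from([("2229".to_string(), "uni2229.alt".to_string())]);
    let lib_value = HashMap::from([("FE00".to_string(), bases)]);
    match parse_uvs(&lib_value) {
        Ok(uvs) => {
            for ((base, vs), glyph) in &uvs {
                println!("U+{base:04X} U+{vs:04X} -> {glyph}");
            }
        }
        Err(e) => eprintln!("{e}"),
    }
}
```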

anthrotype · 2023-03-15T11:20:05Z

the Glyphs way is awkward.

OMG... "You should just add a “.uv002” (the number is the index of the selector starting at FE00 = 1 …) suffix to the alternate glyph."
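The quoted Glyphs convention encodes the selector in the glyph name itself. A minimal sketch, assuming the suffix is exactly `.uv` followed by a decimal index with FE00 = 1 (so `.uv002` means U+FE01). It handles only VS1..VS16, because the quote does not say how the supplementary selectors (U+E0100 onward) are numbered. `parse_uv_suffix` is a hypothetical helper; the base codepoint of the sequence would still have to be looked up from the base glyph's own Unicode value.

```rust
/// Splits a Glyphs-style alternate name such as "foo.uv002" into the base
/// glyph name and the variation selector it implies (FE00 = index 1).
/// Returns None for names without a valid ".uvNNN" suffix, and for indices
/// outside 1..=16, since only VS1..VS16 (U+FE00..U+FE0F) are assumed here.
fn parse_uv_suffix(glyph_name: &str) -> Option<(&str, u32)> {
    let (base_name, suffix) = glyph_name.rsplit_once(".uv")?;
    if base_name.is_empty() || suffix.is_empty() || !suffix.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Overlong digit strings fail to parse as u32 and yield None here.
    let index: u32 = suffix.parse().ok()?;
    if (1..=16).contains(&index) {
        Some((base_name, 0xFE00 + index - 1))
    } else {
        None
    }
}

fn main() {
    for name in ["foo.uv001", "foo.uv002", "foo.uv016", "foo.uv017", "foo.alt"] {
        match parse_uv_suffix(name) {
            Some((base, vs)) => println!("{name}: base glyph {base}, selector U+{vs:04X}"),
            None => println!("{name}: no variation selector suffix"),
        }
    }
}
```

Compared with the UFO lib key, this convention forces the compiler to scan every glyph name to recover the sequences, which is part of why it reads as awkward.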

rsheeter · 2023-03-16T22:21:02Z

Filed #157 to track variation selectors. I think that means there is nothing immediately actionable for cmap, so closing. As ever, please reopen if I'm wrong.

rsheeter mentioned this issue Mar 14, 2023

Build a simple cmap #134

Merged

rsheeter mentioned this issue Mar 16, 2023

Support variation selectors #157

Open

rsheeter closed this as completed Mar 16, 2023
