IR handling of cmap #145

Closed
rsheeter opened this issue Mar 14, 2023 · 6 comments

Comments

@rsheeter
Contributor

@anthrotype had an interesting suggestion in #134 (comment), copied for convenience:

actually -- I propose that for our IR we completely do away with the flawed approach of storing a list of unicode codepoints at the Glyph level, and instead have a global CharacterMap (which you can store in the static metadata) that is simply a HashMap<Codepoint, GlyphName>.
Associating codepoints to a glyph is wrong, not just because it obfuscates the fact that a cmap -- which this piece of data is designed to build -- actually maps from codepoints to glyphs instead of the other way around; but most importantly, it opens the door for ambiguous/invalid/impossible to encode mappings whereby more than one glyph ends up being mapped to the same unicode codepoint (e.g. two glyphs "foo" and "bar" both having unicodes list = [0x0061]; if one types "a", which one will one get, foo or bar?!)

See discussion on the proposal for storing a global cmap.txt and removing GLIF-level unicode elements in UFO4: unified-font-object/ufo-spec#77
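
The ambiguity described above can be made concrete with a minimal sketch. It assumes `Codepoint` is a `u32` and `GlyphName` a `String`, and the helper `build_character_map` is a hypothetical name, not something from the project. It inverts per-glyph codepoint lists into one global map and reports any codepoint claimed by more than one glyph instead of silently picking a winner.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical aliases: the thread names these types but does not define them.
type Codepoint = u32;
type GlyphName = String;

/// Inverts per-glyph codepoint lists into a single codepoint -> glyph map.
/// On conflict, returns every codepoint that more than one glyph claims.
fn build_character_map(
    glyphs: &[(GlyphName, Vec<Codepoint>)],
) -> Result<HashMap<Codepoint, GlyphName>, BTreeMap<Codepoint, Vec<GlyphName>>> {
    // Collect, per codepoint, the distinct glyphs that list it.
    let mut claims: BTreeMap<Codepoint, Vec<GlyphName>> = BTreeMap::new();
    for (name, codepoints) in glyphs {
        for cp in codepoints {
            let names = claims.entry(*cp).or_default();
            if !names.contains(name) {
                names.push(name.clone());
            }
        }
    }

    // A codepoint with more than one claimant cannot be encoded in a cmap.
    let conflicts: BTreeMap<_, _> = claims
        .iter()
        .filter(|(_, names)| names.len() > 1)
        .map(|(cp, names)| (*cp, names.clone()))
        .collect();
    if !conflicts.is_empty() {
        return Err(conflicts);
    }

    // Every remaining entry has exactly one glyph name.
    Ok(claims
        .into_iter()
        .map(|(cp, mut names)| (cp, names.pop().unwrap()))
        .collect())
}
```

With the example from the comment, glyphs "foo" and "bar" both listing 0x0061 would produce an `Err` naming both glyphs, whereas a map keyed by codepoint cannot represent that state in the first place.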

@rsheeter
Contributor Author

I believe the counterarguments would be:

  1. This moves the IR too far from the source representation, which does store this information on glyphs.
  2. It doesn't really save any complexity.
    • Because the information lives on glyphs, we'd end up adding an IR step that depends on the completion of all IR glyphs so it can emit the combined map.
    • The only place we need the cmap is the BE cmap task, so it can just do that job.

Looking at my own bullets, it strikes me that if places other than the BE cmap task need to see the cmap, this has appeal.

@anthrotype
Member

if there are other places than the BE cmap task that need to see the cmap

I don't think there are, cmap is for... cmap :)
It's true that you'll have to read all glyphs to be able to get to the codepoints, because that's where they currently live in the sources.
Let's think about it, and maybe revise if the source formats do get changed.

@khaledhosny
Contributor

Handling variation selectors should be kept in mind as well. I don't know how UFO handles it, but the Glyphs way is awkward.

@anthrotype
Member

anthrotype commented Mar 15, 2023

I don't know how UFO handles it

UFO has "public.unicodeVariationSequences", implemented in ufo2ft here.

https://unifiedfontobject.org/versions/ufo3/lib.plist/#publicunicodevariationsequences
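
As I read the UFO3 spec linked above, this lib key is a dict of dicts: the outer keys are variation selectors, the inner keys are base code points (both as hex strings), and the values are glyph names. A rough sketch of flattening that shape into (base, selector, glyph name) triples follows. It assumes the plist has already been parsed into nested maps; `flatten_variation_sequences` is a hypothetical helper, not ufo2ft's or this project's API.

```rust
use std::collections::BTreeMap;

/// Flattens selector -> (base -> glyph name) into (base, selector, glyph name)
/// triples. Returns None if any key is not a valid hexadecimal number.
fn flatten_variation_sequences(
    lib: &BTreeMap<String, BTreeMap<String, String>>,
) -> Option<Vec<(u32, u32, String)>> {
    let mut out = Vec::new();
    for (selector_hex, bases) in lib {
        let selector = u32::from_str_radix(selector_hex, 16).ok()?;
        for (base_hex, glyph_name) in bases {
            let base = u32::from_str_radix(base_hex, 16).ok()?;
            out.push((base, selector, glyph_name.clone()));
        }
    }
    // Sort by base code point, then selector, for a stable order.
    out.sort();
    Some(out)
}
```

Note that this data is keyed by code points and lives outside the glyphs, which is the same direction as the global map proposed earlier in the thread.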

@anthrotype
Member

the Glyphs way is awkward.

OMG.. "You should just add a “.uv002” (the number is the index of the selector starting at FE00 = 1 …) suffix to the alternate glyph."
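
To illustrate the naming convention quoted above, here is a small sketch that recovers the selector from such a suffix. It covers only the stated rule (index 1 = U+FE00) for the sixteen selectors U+FE00..U+FE0F; how larger indices are meant to map is elided in the quote, so they are rejected here. `selector_from_glyph_name` is a hypothetical helper.

```rust
/// Splits a name like "a.uv002" into its prefix and the variation selector
/// implied by the numeric suffix, where index 1 corresponds to U+FE00.
/// Returns None for names without a ".uvNNN" suffix or with an index
/// outside 1..=16 (the thread does not say how larger indices map).
fn selector_from_glyph_name(name: &str) -> Option<(&str, u32)> {
    let (prefix, digits) = name.rsplit_once(".uv")?;
    if prefix.is_empty() || digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let index: u32 = digits.parse().ok()?;
    if (1..=16).contains(&index) {
        Some((prefix, 0xFE00 + index - 1))
    } else {
        None
    }
}
```

Under this reading, "a.uv002" yields the prefix "a" and selector U+FE01. The off-by-one between the suffix and the selector value is part of what makes the convention awkward.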

@rsheeter
Contributor Author

Filed #157 to track variation selectors. I think that means there is nothing immediately actionable for cmap, so I'm closing this. As ever, please reopen if I'm wrong.
