Provide support for all cmap table formats #105
Comments
I'd somewhat advocate not bothering with this: the format is so old that nothing makes these fonts anymore (the format 0 cmap is horrendously inadequate for anything but toy fonts =). Adding support for more complex or newer formats like 13/14 would be worth doing, but format 0 would add support for something we shouldn't even be using anymore. |
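For context on why format 0 is so limited, here is a minimal sketch of reading one. The subtable is just a 6-byte header followed by 256 one-byte glyph IDs, so it can address at most 256 characters and 256 glyphs. `parseCmapFormat0` and the test buffer are hypothetical names for illustration, not part of the opentype.js API.

```javascript
// Sketch only: parseCmapFormat0 is a hypothetical helper, not opentype.js API.
// Format 0 layout (big-endian): uint16 format, uint16 length,
// uint16 language, then uint8 glyphIdArray[256].
function parseCmapFormat0(view, offset) {
    const format = view.getUint16(offset);
    if (format !== 0) {
        throw new Error('Expected cmap format 0, got ' + format);
    }
    const glyphIndexMap = {};
    for (let code = 0; code < 256; code += 1) {
        const glyphId = view.getUint8(offset + 6 + code);
        // Glyph 0 is .notdef, so treat it as "no mapping".
        if (glyphId !== 0) {
            glyphIndexMap[code] = glyphId;
        }
    }
    return glyphIndexMap;
}

// Build a tiny in-memory subtable to exercise the parser.
const format0Bytes = new Uint8Array(6 + 256);
const format0View = new DataView(format0Bytes.buffer);
format0View.setUint16(0, 0);   // format
format0View.setUint16(2, 262); // length in bytes
format0View.setUint16(4, 0);   // language
format0Bytes[6 + 0x41] = 36;   // 'A' maps to glyph 36
const format0Map = parseCmapFormat0(format0View, 0);
```

Here `format0Map` contains the single entry for code 0x41; every other code falls through to the missing glyph.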
Looks like Apple just decided to use platformID = 0 for their default system font, see #139 |
cmap 12 read support was just added with PR #207 😉 |
Any other important formats we should support? |
@fdb Format 4 is limited to 16-bit code points (the Basic Multilingual Plane) and format 12 covers 32-bit code points (all Unicode planes). They follow the same specification, and it looks like they're the most common cmap tables. I decompiled some fonts with FontTools and found that format 6 is also common. |
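Since format 6 comes up here, a rough sketch of how small it is to read: a header, a first code, an entry count, and a dense array of 16-bit glyph IDs. `parseCmapFormat6` and the sample buffer are hypothetical, for illustration only, and not how opentype.js does or would implement it.

```javascript
// Sketch only: parseCmapFormat6 is a hypothetical helper, not opentype.js API.
// Format 6 layout (big-endian): uint16 format, uint16 length, uint16 language,
// uint16 firstCode, uint16 entryCount, uint16 glyphIdArray[entryCount].
function parseCmapFormat6(view, offset) {
    const format = view.getUint16(offset);
    if (format !== 6) {
        throw new Error('Expected cmap format 6, got ' + format);
    }
    const firstCode = view.getUint16(offset + 6);
    const entryCount = view.getUint16(offset + 8);
    const glyphIndexMap = {};
    for (let i = 0; i < entryCount; i += 1) {
        const glyphId = view.getUint16(offset + 10 + i * 2);
        // Glyph 0 is .notdef, so skip it.
        if (glyphId !== 0) {
            glyphIndexMap[firstCode + i] = glyphId;
        }
    }
    return glyphIndexMap;
}

// Tiny in-memory subtable covering codes 0x30..0x32.
const format6View = new DataView(new ArrayBuffer(10 + 3 * 2));
format6View.setUint16(0, 6);     // format
format6View.setUint16(2, 16);    // length in bytes
format6View.setUint16(4, 0);     // language
format6View.setUint16(6, 0x30);  // firstCode
format6View.setUint16(8, 3);     // entryCount
format6View.setUint16(10, 17);   // 0x30 -> glyph 17
format6View.setUint16(12, 0);    // 0x31 -> .notdef (unmapped)
format6View.setUint16(14, 19);   // 0x32 -> glyph 19
const format6Map = parseCmapFormat6(format6View, 0);
```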
For proper opentype support, I'd consider cmap 4, 12, 13 and 14 essential: cmap 4 and 12 for "proper plain old unicode" support (4 mapping to UCS-2, and 12 mapping to UCS-4), and the (recently introduced) cmap 13 and 14 because opentype needs them for properly supporting many-to-one mapping and variation selection mapping, respectively. That said, many of the other formats are almost trivial to implement compared to subtables 4 and 12, so... I'd honestly just say "implement them all". If effort is already going into proper cmap handling, handling all of them is a good target. |
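To make "variation selection mapping" a bit more concrete, here is a rough sketch of a format 14 lookup over already-parsed data. The field names (`varSelector`, `startUnicodeValue`, `additionalCount`, `unicodeValue`, `glyphID`) follow the OpenType spec, but the object shapes, `lookupVariation`, and the sample data are assumptions made up for illustration. A "default" sequence falls back to whatever the regular Unicode subtable says, while a "non-default" sequence names its own glyph.

```javascript
// Sketch only: hypothetical in-memory shapes, not opentype.js API.
// baseLookup is any function mapping a code point to a glyph ID via the
// font's regular Unicode subtable (format 4 or 12).
function lookupVariation(varSelectorRecords, baseLookup, base, selector) {
    const record = varSelectorRecords.find((r) => r.varSelector === selector);
    if (!record) {
        return undefined; // selector not listed at all
    }
    // Default UVS: the sequence is supported and uses the normal glyph.
    const isDefault = (record.defaultRanges || []).some(
        (r) => base >= r.startUnicodeValue &&
               base <= r.startUnicodeValue + r.additionalCount
    );
    if (isDefault) {
        return baseLookup(base);
    }
    // Non-default UVS: the sequence maps to its own glyph.
    const mapping = (record.nonDefaultMappings || []).find(
        (m) => m.unicodeValue === base
    );
    return mapping ? mapping.glyphID : undefined;
}

// Made-up sample data for one variation selector (U+FE00).
const sampleVarRecords = [{
    varSelector: 0xFE00,
    defaultRanges: [{ startUnicodeValue: 0x2260, additionalCount: 0 }],
    nonDefaultMappings: [{ unicodeValue: 0x2268, glyphID: 77 }]
}];
const sampleBaseGlyphs = { 0x2260: 10, 0x2268: 11 };
const sampleBaseLookup = (code) => sampleBaseGlyphs[code] || 0;

const defaultVariantGlyph = lookupVariation(sampleVarRecords, sampleBaseLookup, 0x2260, 0xFE00);
const nonDefaultVariantGlyph = lookupVariation(sampleVarRecords, sampleBaseLookup, 0x2268, 0xFE00);
const unsupportedVariantGlyph = lookupVariation(sampleVarRecords, sampleBaseLookup, 0x41, 0xFE00);
```

Returning `undefined` for an unlisted sequence lets the caller decide how to handle it, for example by ignoring the selector and rendering the base character on its own.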
@Pomax Nice to know! But before that we will need to change how the cmap tables are handled, because right now if the cmap 12 is found the cmap 4 is not read (this is not a problem as 12 is a superset including 4) but we can't do that if we're adding more formats. By the way are the 13 & 14 well implemented now? |
They're getting there. I'm not sure why you'd skip 4 if 12 is found, though, but then I've not read the code in quite a while. Keeping the UCS-2 and UCS-4 sets separate is generally a good idea, sometimes even with a cmap 0 for the 256 ANSI block, so the cmap parsing procedure is that you check which cmap subtables are available, then run through each of those to find your character index. The "does this character have an index according to this subtable" check is generally fast, so you might "waste some time" looking in tables, but it will be negligible compared to the time necessary to render the glyph outline. Also note that cmap 13 uses the exact same data structures and information coding as 12, except that the "start glyph" for a character range as used in 12 is simply considered "the only glyph" in 13. So if you have an implementation for 12 already, adding support for 13 (barring a rewrite of how characters are mapped through multiple cmap subtables, of course) is virtually no extra work. |
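The two points above (walk every available subtable, and 13 being 12 with "the only glyph" instead of "the start glyph") can be sketched roughly as follows. The `{ format, groups }` subtable shape, the `glyphId` field name, and both function names are hypothetical and only for illustration; this is not the current opentype.js code.

```javascript
// Sketch only: hypothetical in-memory shapes, not opentype.js API.
// Each group is { startCharCode, endCharCode, glyphId }, sorted by
// startCharCode. In format 12 glyphId is the glyph for startCharCode and
// later codes count up from it; in format 13 it is the glyph for the
// whole range.
function lookupGroups(groups, codePoint, format) {
    let lo = 0;
    let hi = groups.length - 1;
    while (lo <= hi) {
        const mid = (lo + hi) >>> 1;
        const group = groups[mid];
        if (codePoint < group.startCharCode) {
            hi = mid - 1;
        } else if (codePoint > group.endCharCode) {
            lo = mid + 1;
        } else if (format === 13) {
            return group.glyphId; // many-to-one
        } else {
            return group.glyphId + (codePoint - group.startCharCode);
        }
    }
    return 0; // .notdef
}

// Try each subtable in turn and return the first real glyph found.
function lookupInSubtables(subtables, codePoint) {
    for (const subtable of subtables) {
        const glyphId = lookupGroups(subtable.groups, codePoint, subtable.format);
        if (glyphId !== 0) {
            return glyphId;
        }
    }
    return 0;
}

// Made-up sample data.
const sampleFormat12 = {
    format: 12,
    groups: [{ startCharCode: 0x41, endCharCode: 0x5A, glyphId: 36 }]
};
const sampleFormat13 = {
    format: 13,
    groups: [{ startCharCode: 0x4E00, endCharCode: 0x9FFF, glyphId: 5 }]
};
const sampleSubtables = [sampleFormat12, sampleFormat13];

const glyphForC = lookupInSubtables(sampleSubtables, 0x43);     // in the format 12 range
const glyphForCjk = lookupInSubtables(sampleSubtables, 0x4E8C); // in the format 13 range
```

The only format-specific line is the return inside the matching group, which is why 13 comes almost for free once 12 works.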
@Pomax The cmap 12 support was recently added by @Vildan & I think it was just easier to skip 4 if 12 was found. If not, it will need a rewrite. For now, it's easier & performance-wise faster, but not future-proof! Thanks for the details though! |
Skipping 4 when 12 is found is a great way to not find characters that are definitely in the font, so filing an issue to make sure all subtables are checked would be a good idea =) As for contributing: I run an insane number of projects already, so writing comments or just talking about how the OpenType spec wants things done is a quick and easy job I am happy to do. Reviewing code for whether an approach is sound is a bit more work, but typically still doable in a few 15-minute stretches here or there. Writing code, though, is way more work than I have free time for at the moment =) |
Hey @Pomax, thanks for clearing that up. It sounds like it'll be a good idea to keep all of them and do a lookup through them. Do you know if the spec says anything about the order in which they should be looked up? |
Because there are only formats 4 and 12 now, and 12 is a superset of 4, there is no need to read format 4 if a font has format 12 in it. And because cmap tables are placed in ascending order, we can find format 12 before format 4. @Pomax, do you have an example where we would miss characters by reading only format 12? I ran this test on 4000+ fonts and didn't find a single font where format 4 gives any extra characters compared with format 12. |
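For anyone who wants to repeat that kind of check, a comparison along these lines would do it. This is only a sketch of the idea, not the script that was run on the 4000+ fonts; `missingFromFormat12` and the sample maps are hypothetical, and it assumes each subtable has already been parsed into a plain object keyed by code point.

```javascript
// Sketch only: hypothetical helper, not opentype.js API.
// Both arguments are plain objects mapping code point -> glyph ID.
// Returns the code points that format 4 maps but format 12 does not.
function missingFromFormat12(format4Map, format12Map) {
    return Object.keys(format4Map)
        .map(Number)
        .filter((code) => !Object.prototype.hasOwnProperty.call(format12Map, code))
        .sort((a, b) => a - b);
}

// Made-up sample data: format 12 covers everything format 4 does.
const sampleMap12 = { 0x41: 36, 0x42: 37, 0x1F600: 900 };
const sampleMap4 = { 0x41: 36, 0x42: 37 };
const supersetResult = missingFromFormat12(sampleMap4, sampleMap12);

// Made-up counterexample: a private-use code point only in format 4.
const sampleMap4Extra = { 0x41: 36, 0xF000: 12 };
const extraResult = missingFromFormat12(sampleMap4Extra, sampleMap12);
```

An empty result for every font supports the "12 is a superset of 4" assumption; any non-empty result identifies a font where skipping format 4 would lose characters.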
Rereading the spec, you're right; it quite literally says |
Here’s some test cases for cmap subtables; see README for how to run the test suite. |
We created an online font subsetting DEMO that compares some of the differences between opentype.js and fonttools subsets; it may be helpful. |
Technically by supporting format 12, you get format 13 for free right? |
I have a TON of PDFs that use 14. Just throwing my vote in for this- I have no idea what it's all about :-) |
Format 13 will be supported via #647, which will close this issue. As discussed before, it's not worth the time to support obscure formats that will probably never be encountered in the wild. Anyone providing a real font with an unsupported format is still welcome to open a new issue for that, of course! |
E.g. platformID = 1, encodingID = 0 as used in http://www.ivank.net/BRUSHSTP.ttf.