Provide support for all cmap table formats #105

fdb · 2015-04-03T06:05:16Z

E.g. platformID = 1, encodingID = 0 as used in http://www.ivank.net/BRUSHSTP.ttf.
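To see which platformID / encodingID / format combinations a font actually carries, the cmap header can be walked directly. The following is a minimal sketch, assuming the raw bytes of the `cmap` table have already been extracted from the font; `read_encoding_records` is a hypothetical helper name and not part of opentype.js, and the sample table is hand-built for illustration.

```python
import struct

def read_encoding_records(cmap: bytes):
    """Return (platformID, encodingID, format) for each cmap subtable."""
    # cmap header: uint16 version, uint16 numTables
    _version, num_tables = struct.unpack_from(">HH", cmap, 0)
    records = []
    for i in range(num_tables):
        # Encoding record: uint16 platformID, uint16 encodingID, uint32 offset
        platform_id, encoding_id, offset = struct.unpack_from(">HHI", cmap, 4 + 8 * i)
        # Every subtable starts with a uint16 format field
        (fmt,) = struct.unpack_from(">H", cmap, offset)
        records.append((platform_id, encoding_id, fmt))
    return records

# Hand-built table (an assumption for illustration): one encoding record
# (platform 1, encoding 0) pointing at a format 0 subtable.
subtable = struct.pack(">HHH", 0, 262, 0) + bytes(256)
cmap = struct.pack(">HH", 0, 1) + struct.pack(">HHI", 1, 0, 12) + subtable
print(read_encoding_records(cmap))
```

Offsets in the encoding records are measured from the start of the cmap table, which is why the single subtable here sits at offset 12 (4-byte header plus one 8-byte record).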

Pomax · 2015-06-25T04:55:42Z

I'd somewhat advocate not bothering with this: the format is so old that nothing makes these fonts anymore (the format 0 cmap is horrendously inadequate for anything but toy fonts =). Adding support for more complex or new formats like 13/14 would be worth doing, but format 0 would add support for something we shouldn't even be using anymore.
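For context on why format 0 is so limited: the subtable is a 6-byte header followed by exactly 256 one-byte glyph IDs, so it can only map character codes 0 to 255 to glyph IDs 0 to 255. A rough sketch of reading it, assuming the subtable bytes are already isolated (`parse_format0` is a hypothetical name, and the sample data is made up):

```python
import struct

def parse_format0(subtable: bytes):
    """Decode a format 0 (byte encoding) cmap subtable into {code: glyph_id}."""
    # Header: uint16 format, uint16 length, uint16 language
    fmt, _length, _language = struct.unpack_from(">HHH", subtable, 0)
    if fmt != 0:
        raise ValueError(f"expected format 0, got format {fmt}")
    glyph_ids = subtable[6:6 + 256]
    if len(glyph_ids) != 256:
        raise ValueError("format 0 glyphIdArray must hold 256 entries")
    # Glyph ID 0 is the missing glyph, so leave those codes out of the map
    return {code: gid for code, gid in enumerate(glyph_ids) if gid != 0}

# Made-up subtable that maps only 'A' (0x41) to glyph 3
glyph_array = bytearray(256)
glyph_array[0x41] = 3
sample = struct.pack(">HHH", 0, 262, 0) + bytes(glyph_array)
print(parse_format0(sample))
```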

bitinn · 2015-06-25T09:26:15Z

Looks like Apple just decided to use platformID = 0 for their default system font, see #139

Jolg42 · 2016-07-29T15:56:00Z

cmap 12 read support was just added with PR #207 😉

fdb · 2016-07-31T09:19:08Z

Any other important formats we should support?

Jolg42 · 2016-07-31T09:49:10Z

@fdb Format 4 is limited to 16 bits (the Unicode Basic Multilingual Plane) and format 12 covers 32 bits (all Unicode planes). They follow the same specification, and it looks like they're the most common cmap tables.
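The 16-bit versus 32-bit split can be stated as a one-line rule. This is only an illustrative sketch; `required_cmap_format` is a hypothetical helper, not an opentype.js function.

```python
def required_cmap_format(code_points):
    """Return 12 if any code point lies outside the BMP, otherwise 4."""
    # Format 4 stores character codes in 16 bits, so U+FFFF is its ceiling
    return 12 if any(cp > 0xFFFF for cp in code_points) else 4

print(required_cmap_format([0x41, 0x20AC]))   # Latin A and the euro sign
print(required_cmap_format([0x41, 0x1F600]))  # includes an emoji from plane 1
```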

I decompiled some fonts with FontTools & found that format 6 is also common.
So maybe the next step will be reading format 6 but if nobody is having a problem now, maybe we can wait before implementing it 😉

Pomax · 2016-07-31T22:42:23Z

For proper opentype support, I'd consider cmap 4, 12, 13 and 14 essential: cmap 4 and 12 for "proper plain old unicode" support—4 mapping to UCS2, and 12 mapping to UCS4—and the (recently introduced) cmap 13 and 14 because opentype needs them for properly supporting many-to-one mapping, and variation selection mapping, respectively.

Although that said, many of the other formats are almost trivial to implement compared to subtables 4 and 12, so... I'd honestly just say "implement them all". If effort is already going into proper cmap handling, handling all of them is a good target.

Jolg42 · 2016-08-01T10:25:20Z

@Pomax Nice to know!
I think that CMAP 12 writing is the most important right now but one day maybe we will support every format ;)

But before that we will need to change how the cmap tables are handled, because right now if the cmap 12 is found the cmap 4 is not read (this is not a problem as 12 is a superset including 4) but we can't do that if we're adding more formats.

By the way are the 13 & 14 well implemented now?

Pomax · 2016-08-01T17:00:00Z

They're getting there.

I'm not sure why you'd skip 4 if 12 is found, though, but then I've not read the code in quite a while. Keeping the UCS-2 and UCS-4 sets separate is generally a good idea, sometimes even with a cmap 0 for the 256-character ANSI block. The cmap parsing procedure is then that you check which cmap subtables are available and run through each of those to find your glyph index. The "does this character have an index according to this subtable" check is generally fast, so you might "waste some time" looking in tables, but it will be negligible compared to the time necessary to render the glyph outline.
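The procedure described here can be sketched as follows, assuming each available subtable has already been decoded into a plain dict from character code to glyph index. Both `lookup_glyph` and the sample tables are hypothetical and only illustrate the idea; this is not how opentype.js is structured.

```python
def lookup_glyph(code_point, subtables):
    """Try each decoded cmap subtable in turn; return 0 (.notdef) if none match."""
    for table in subtables:
        glyph_id = table.get(code_point, 0)
        if glyph_id:
            return glyph_id
    return 0

# Made-up decoded subtables: a UCS-4 one and a UCS-2 one
ucs4_table = {0x1F600: 7}
ucs2_table = {0x41: 3}
tables = [ucs4_table, ucs2_table]

print(lookup_glyph(0x41, tables))     # found in the second table
print(lookup_glyph(0x1F600, tables))  # found in the first table
print(lookup_glyph(0x42, tables))     # in neither table
```

Each lookup is a dictionary probe, so checking an extra subtable or two costs very little next to rendering an outline.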

Also note that cmap 13 uses the exact same data structures and information coding as 12, except that the "start glyph" for a character range as used in 12 is simply considered "the only glyph" in 13, so if you have an implementation for 12 already, adding support for 13 (barring needing a rewrite on how characters are mapped through multiple cmap subtables of course) is virtually no extra work.
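To make the difference concrete, here is a small sketch assuming the groups have already been read into sorted `(startCharCode, endCharCode, glyphID)` tuples. The function name and the `many_to_one` flag are assumptions made for illustration.

```python
from bisect import bisect_right

def lookup_group(code_point, groups, many_to_one=False):
    """Look up a code point in format 12 style groups.

    With many_to_one=True the groups are read the format 13 way:
    every character in the range maps to the same glyph.
    """
    starts = [start for start, _end, _gid in groups]
    i = bisect_right(starts, code_point) - 1
    if i < 0:
        return 0
    start, end, glyph_id = groups[i]
    if code_point > end:
        return 0
    if many_to_one:
        return glyph_id                      # format 13: "the only glyph"
    return glyph_id + (code_point - start)   # format 12: "start glyph" + offset

# Made-up groups for illustration
groups = [(0x41, 0x5A, 10), (0x1F600, 0x1F64F, 100)]
print(lookup_group(0x43, groups))                    # format 12 reading
print(lookup_group(0x43, groups, many_to_one=True))  # format 13 reading
```

The only line that differs between the two formats is the final return, which matches the point that format 13 is nearly free once format 12 works.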

Jolg42 · 2016-08-01T17:33:14Z

@Pomax The cmap 12 support was recently added by @Vildan & I think it was just easier to skip 4 if 12 was found. If not, it will need a rewrite. For now, it's easier & performance-wise faster, but not future-proof!

Thanks for the details though!
Personally, I'm already busy with a lot of other things so feel free to contribute if you need to 😉

Pomax · 2016-08-01T17:38:25Z

skipping 4 when 12 is found is a great way to not find characters that are definitely in the font, so filing an issue to make sure all sub tables are checked will be a good idea =)

As for contributing: I run an insane amount of projects already, so writing comments or just talking about how the OpenType spec wants things done is a quick and easy job I am happy to do. Reviewing code for whether an approach is sound is a bit more work, but typically still doable in a few 15-minute slots here and there. Writing code is way more work than I have free time for at the moment =)

fdb · 2016-08-01T17:40:25Z

Hey @Pomax, thanks for clearing that up. It sounds like it'll be a good idea to keep all of them and do a lookup through them. Do you know if the spec says something about the order in which they should be looked up?

Vildan · 2016-08-01T17:41:55Z

Because only formats 4 and 12 are supported now, and 12 is a superset of 4, there is no need to read format 4 if a font has format 12 in it. And because cmap tables are placed in ascending order, we can find format 12 before format 4. @Pomax, do you have an example where we would miss characters if we read only format 12? I ran this test on 4000+ fonts and didn't find a single font where format 4 gives any extra characters compared with format 12.

Pomax · 2016-08-01T18:15:04Z

Rereading the spec, you're right; it quite literally says "Please note, that the content of format 12 subtable, needs to be a super set of the content in the format 4 subtable. The format 4 subtable needs to be in the cmap table to enable backward compatibility needs.". I'm curious if the OpenType spec revisions will remove this need for a cmap_4 in the future, but it does indeed fully justify not bothering with reading the subtable 4 format if format 12 is present.
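The superset requirement quoted above is easy to check for a given font once both subtables are decoded. A minimal sketch, assuming each subtable is available as a dict from character code to glyph index (the function name and sample data are hypothetical, and this is not the test that was run on the 4000+ fonts):

```python
def missing_from_format12(format4_map, format12_map):
    """Return format 4 entries that format 12 lacks or maps to a different glyph."""
    return {
        code: glyph_id
        for code, glyph_id in format4_map.items()
        if format12_map.get(code) != glyph_id
    }

# Made-up decoded subtables
format4_map = {0x41: 3, 0x42: 4}
format12_map = {0x41: 3, 0x42: 4, 0x1F600: 7}
print(missing_from_format12(format4_map, format12_map))  # empty when 12 is a superset
```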

brawer · 2017-05-01T13:07:23Z

Here are some test cases for cmap subtables; see README for how to run the test suite.

laoshu133 · 2018-07-26T06:15:15Z

We created an online font-subsetting demo that compares some of the differences between opentype.js and fonttools subsets; it may be helpful.

http://fonter.dancf.com/examples/subset/

mooman219 · 2019-08-01T09:17:47Z

Technically by supporting format 12, you get format 13 for free right?

jdimeo · 2020-01-14T12:34:13Z

I have a TON of PDFs that use 14. Just throwing my vote in for this- I have no idea what it's all about :-)
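For anyone else wondering what format 14 is about: it maps a base character followed by a variation selector to a glyph. For each selector the subtable lists "default" sequences, which fall back to the glyph from the regular Unicode cmap subtable, and "non-default" sequences, which name a specific glyph. The sketch below assumes those lists have already been decoded into a set and a dict; all names and glyph IDs are made up for illustration.

```python
def lookup_variation(base, selector, base_cmap, default_uvs, non_default_uvs):
    """Resolve a (base character, variation selector) pair to a glyph ID.

    Returns None when the font does not support the sequence at all.
    """
    # Non-default sequences carry their own glyph ID
    glyph_id = non_default_uvs.get((base, selector))
    if glyph_id is not None:
        return glyph_id
    # Default sequences reuse the glyph from the regular cmap subtable
    if (base, selector) in default_uvs:
        return base_cmap.get(base, 0)
    return None

# Made-up data: one base character with two variation selectors
base_cmap = {0x8FBB: 500}
default_uvs = {(0x8FBB, 0xE0100)}
non_default_uvs = {(0x8FBB, 0xE0101): 501}

print(lookup_variation(0x8FBB, 0xE0100, base_cmap, default_uvs, non_default_uvs))
print(lookup_variation(0x8FBB, 0xE0101, base_cmap, default_uvs, non_default_uvs))
```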

Connum · 2023-11-23T07:41:57Z

In the meantime, we have added support for format 14 (via #581) as well as format/encoding 0 for platform 1 (via #634), which the issue was originally about. The provided example BRUSHSTP.ttf will load fine with the current master.

If anyone could provide a font using format 13, that would be great.

brawer · 2023-11-23T15:46:38Z

"If anyone could provide a font using format 13, that would be great."

Added a test case using this font.

Connum · 2023-11-24T11:52:41Z

Format 13 will be supported via #647, which will close this issue. As discussed before, it's not worth the time to support obscure formats that will probably never be encountered in the wild. Anyone providing a real font with an unsupported format is still welcome to open a new issue for that, of course!

fdb mentioned this issue Apr 3, 2015

Can't open my font #104

Closed

bitinn mentioned this issue Jun 25, 2015

opentype.js having problem processing ttf font extracted from ttc #139

Open

Jolg42 mentioned this issue Dec 2, 2017

Make cmap format 12 if needed #315

Merged

Connum self-assigned this Nov 23, 2023

Connum added the enhancement and Spec labels Nov 23, 2023

Connum mentioned this issue Nov 24, 2023

implement support and tests for cmap format 13 #647

Merged

yne closed this as completed in #647 Nov 24, 2023
