I'm debugging a simple test here in preparation for adding OS/2 support for browsers. The browser no longer complains about a "missing OS/2" table when using SubsetProfile::Web, but it now complains about a broken cmap table. The cause is that subsetting always uses the MacRoman encoding (subtable format 0, the old Apple format), which modern browsers don't support; they expect the cmap table to be in a Windows/Unicode format (format 4 or 12):
// tests/cff.rs
#[test]
fn test_subset_with_os2_table() {
    use allsorts::font::Font;
    use allsorts::tables::cmap::CmapSubtable;
    use base64::{Engine as _, engine::general_purpose};
    use allsorts::tables::FontTableProvider;
    use std::fs::File;
    use std::io::Write;
    use std::collections::HashSet;

    // Test string to use for the font subset
    let test_string = "hello world";

    // Load the font
    let buffer = read_fixture("tests/fonts/opentype/Klei.otf");
    let opentype_file = ReadScope::new(&buffer).read::<OpenTypeFont<'_>>().unwrap();
    let provider = opentype_file.table_provider(0).unwrap();
    let provider2 = opentype_file.table_provider(0).unwrap();

    // Create a font instance to access cmap
    let font = Font::new(provider).unwrap();

    // Get the cmap subtable for unicode mapping
    let cmap_data = font.cmap_subtable_data();
    let cmap_subtable = ReadScope::new(cmap_data).read::<CmapSubtable>().unwrap();

    // Map characters to glyph IDs
    let mut glyph_ids = vec![0]; // Always include glyph 0 (.notdef)
    for c in test_string.chars() {
        if let Ok(Some(glyph_id)) = cmap_subtable.map_glyph(c as u32) {
            if !glyph_ids.contains(&glyph_id) {
                glyph_ids.push(glyph_id);
            }
        }
    }

    // Sort and deduplicate glyph IDs
    glyph_ids.sort();
    glyph_ids.dedup();
    println!("Using glyph IDs: {:?}", glyph_ids);

    // Subset the font
    let subset_buffer = subset(&provider2, &glyph_ids, &SubsetProfile::Full).unwrap();

    // Validate that the OS/2 table is present in the subsetted font
    let subset_otf = ReadScope::new(&subset_buffer).read::<OpenTypeFont<'_>>().unwrap();
    let subset_provider = subset_otf.table_provider(0).unwrap();

    // Check that OS/2 table exists
    assert!(
        subset_provider.has_table(tag::OS_2),
        "Subset font is missing the OS/2 table. Use Profile::Web for web compatibility."
    );

    // Compare tables in original and subset fonts
    let original_tables: HashSet<_> = opentype_file
        .table_provider(0)
        .unwrap()
        .table_tags()
        .unwrap_or_default()
        .into_iter()
        .collect();
    let subset_tables: HashSet<_> = subset_provider
        .table_tags()
        .unwrap_or_default()
        .into_iter()
        .collect();
    println!("Original font tables: {:?}", original_tables);
    println!("Subset font tables: {:?}", subset_tables);

    std::fs::write("./Klei.otf", &buffer);
    std::fs::write("./Klei-Subset.otf", &subset_buffer);

    // Output an HTML file with the test string using the subsetted font
    let base64_font = base64::prelude::BASE64_STANDARD.encode(&subset_buffer);
    let html = format!(
        r#"<!DOCTYPE html>
<html>
<head>
    <title>Font Subset Test</title>
    <style>
        @font-face {{
            font-family: 'SubsetFont';
            src: url('data:font/otf;base64,{}') format('opentype');
        }}
        .test-text {{
            font-family: 'SubsetFont', sans-serif;
            font-size: 24px;
        }}
        .fallback {{
            font-family: sans-serif;
            font-size: 24px;
        }}
    </style>
</head>
<body>
    <h1>Font Subset Test</h1>
    <p>The text below should display in the subsetted font:</p>
    <p class="test-text">{}</p>
    <p>This is fallback text:</p>
    <p class="fallback">{}</p>
</body>
</html>"#,
        base64_font, test_string, test_string
    );

    // Write the HTML to a file
    let output_path = "./subset_font_test.html";
    let mut file = File::create(output_path).unwrap();
    file.write_all(html.as_bytes()).unwrap();
    println!("Created {} - open in a browser to verify the font works", output_path);
}
In tables/cmap/subset.rs, I found this block:
impl owned::EncodingRecord {
    pub fn from_mappings(mappings: &MappingsToKeep<NewIds>) -> Result<Self, ParseError> {
        match mappings.plane() {
            CharExistence::MacRoman => {
                // The language field must be set to zero for all 'cmap' subtables whose platform
                // IDs are other than Macintosh (platform ID 1). For 'cmap' subtables whose
                // platform IDs are Macintosh, set this field to the Macintosh language ID of the
                // 'cmap' subtable plus one, or to zero if the 'cmap' subtable is not
                // language-specific. For example, a Mac OS Turkish 'cmap' subtable must set this
                // field to 18, since the Macintosh language ID for Turkish is 17. A Mac OS Roman
                // 'cmap' subtable must set this field to 0, since Mac OS Roman is not a
                // language-specific encoding.
                //
                // -- https://docs.microsoft.com/en-us/typography/opentype/spec/cmap#use-of-the-language-field-in-cmap-subtables
                let mut glyph_id_array = [0; 256];
                for (ch, gid) in mappings.iter() {
                    println!("encoding with MacRoman {ch:?} {gid}"); // <------------------
                    let ch_mac = match ch {
                        // NOTE(unwrap): Safe as we verified all chars with `is_macroman` earlier
                        Character::Unicode(unicode) => {
                            usize::from(char_to_macroman(unicode).unwrap())
                        }
                        Character::Symbol(_) => unreachable!("symbol in mac roman"),
                    };
                    // Cast is safe as we determined that all chars are valid in Mac Roman
                    glyph_id_array[ch_mac] = gid as u8;
                }
                let sub_table = owned::CmapSubtable::Format0 {
                    language: 0,
                    glyph_id_array: Box::new(glyph_id_array),
                };
                Ok(owned::EncodingRecord {
                    platform_id: PlatformId::MACINTOSH,
                    encoding_id: EncodingId::MACINTOSH_APPLE_ROMAN,
                    sub_table,
                })
            }
            CharExistence::BasicMultilingualPlane => {
                let sub_table = cmap::owned::CmapSubtable::Format4(
                    owned::CmapSubtableFormat4::from_mappings(mappings)?,
                );
                Ok(owned::EncodingRecord {
                    platform_id: PlatformId::UNICODE,
                    encoding_id: EncodingId::UNICODE_BMP,
                    sub_table,
                })
            }
            CharExistence::AstralPlane => {
                let sub_table = cmap::owned::CmapSubtable::Format12(
                    owned::CmapSubtableFormat12::from_mappings(mappings),
                );
                Ok(owned::EncodingRecord {
                    platform_id: PlatformId::UNICODE,
                    encoding_id: EncodingId::UNICODE_FULL,
                    sub_table,
                })
            }
            CharExistence::DivinePlane => {
                let sub_table = cmap::owned::CmapSubtable::Format4(
                    owned::CmapSubtableFormat4::from_mappings(mappings)?,
                );
                Ok(owned::EncodingRecord {
                    platform_id: PlatformId::WINDOWS,
                    encoding_id: EncodingId::WINDOWS_SYMBOL,
                    sub_table,
                })
            }
        }
    }
}
and running it prints:
---- test_subset_with_os2_table stdout ----
Using glyph IDs: [0, 1, 69, 70, 73, 77, 80, 83, 88]
encoding with MacRoman Unicode(' ') 1
encoding with MacRoman Unicode('d') 2
encoding with MacRoman Unicode('e') 3
encoding with MacRoman Unicode('h') 4
encoding with MacRoman Unicode('l') 5
encoding with MacRoman Unicode('o') 6
encoding with MacRoman Unicode('r') 7
encoding with MacRoman Unicode('w') 8
Now, this is wrong for the web: whenever every character in the subset fits into Mac Roman (as is the case here), it ALWAYS picks the MacRoman encoding, which browsers don't support.
I can fix this and get a properly subsetted font without errors, if I merge the code paths for CharExistence::MacRoman and CharExistence::BasicMultilingualPlane.
CC @wezm - can this be merged, or do any tools you know of require the MacRoman encoding? Chrome simply refuses to load fonts whose only cmap subtable is format 0, but it works with format 4.
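To check which cmap subtables a subset actually ends up with, independent of the browser, the table directory and the cmap encoding records can be read directly. The following is a minimal, std-only sketch and not part of allsorts: `cmap_encodings`, `has_unicode_cmap` and `demo_font` are hypothetical helpers. It assumes a plain single-font sfnt file (no TTC or WOFF wrapper), and "format 4 or 12 means browser-friendly" only encodes the behaviour reported in this thread, not a verified rule of any font sanitizer.

```rust
/// Read a big-endian u16 at `pos`, or `None` if it is out of bounds.
fn be_u16(data: &[u8], pos: usize) -> Option<u16> {
    let b = data.get(pos..pos.checked_add(2)?)?;
    Some(u16::from_be_bytes([b[0], b[1]]))
}

/// Read a big-endian u32 at `pos`, or `None` if it is out of bounds.
fn be_u32(data: &[u8], pos: usize) -> Option<u32> {
    let b = data.get(pos..pos.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Return (platform ID, encoding ID, subtable format) for every cmap encoding record.
fn cmap_encodings(font: &[u8]) -> Option<Vec<(u16, u16, u16)>> {
    // Table directory: 12-byte header, then 16-byte records (tag, checksum, offset, length).
    let num_tables = usize::from(be_u16(font, 4)?);
    let cmap_offset = (0..num_tables).find_map(|i| {
        let rec = 12 + i * 16;
        if font.get(rec..rec + 4)? == &b"cmap"[..] {
            be_u32(font, rec + 8).map(|o| o as usize)
        } else {
            None
        }
    })?;
    // cmap header: version (u16), numTables (u16), then 8-byte encoding records.
    let num_records = usize::from(be_u16(font, cmap_offset + 2)?);
    (0..num_records)
        .map(|i| {
            let rec = cmap_offset + 4 + i * 8;
            let platform = be_u16(font, rec)?;
            let encoding = be_u16(font, rec + 2)?;
            // Subtable offsets are relative to the start of the cmap table.
            let subtable = cmap_offset + be_u32(font, rec + 4)? as usize;
            Some((platform, encoding, be_u16(font, subtable)?))
        })
        .collect()
}

/// True if at least one subtable is format 4 or 12 (assumption taken from this thread).
fn has_unicode_cmap(encodings: &[(u16, u16, u16)]) -> bool {
    encodings.iter().any(|&(_, _, format)| format == 4 || format == 12)
}

/// Hypothetical test input: a truncated font holding only a cmap table with one record.
fn demo_font(platform: u16, encoding: u16, format: u16) -> Vec<u8> {
    let mut f = Vec::new();
    f.extend_from_slice(b"OTTO"); // sfntVersion
    f.extend_from_slice(&1u16.to_be_bytes()); // numTables
    f.extend_from_slice(&[0; 6]); // searchRange, entrySelector, rangeShift
    f.extend_from_slice(b"cmap"); // table tag
    f.extend_from_slice(&0u32.to_be_bytes()); // checksum (not checked here)
    f.extend_from_slice(&28u32.to_be_bytes()); // table offset
    f.extend_from_slice(&14u32.to_be_bytes()); // table length
    f.extend_from_slice(&0u16.to_be_bytes()); // cmap version
    f.extend_from_slice(&1u16.to_be_bytes()); // number of encoding records
    f.extend_from_slice(&platform.to_be_bytes());
    f.extend_from_slice(&encoding.to_be_bytes());
    f.extend_from_slice(&12u32.to_be_bytes()); // subtable offset within cmap
    f.extend_from_slice(&format.to_be_bytes()); // first field of the subtable
    f
}

fn main() {
    for font in [demo_font(1, 0, 0), demo_font(0, 3, 4)] {
        let encodings = cmap_encodings(&font).expect("malformed font");
        println!("{encodings:?} unicode cmap: {}", has_unicode_cmap(&encodings));
    }
}
```

To inspect the file written by the test above, the bytes from `std::fs::read("./Klei-Subset.otf")` could be passed to `cmap_encodings` instead of the demo buffers.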
fschutt added a commit to fschutt/allsorts that referenced this issue on Mar 12, 2025.
We use/prefer MacRoman wherever possible in Prince, and thus when embedding in PDFs, so I don't think it can be merged with CharExistence::BasicMultilingualPlane. The reason is that in a PDF, if we can use MacRoman, the text in the PDF can use an 8-bit encoding. If the font is embedded with one of the Unicode cmaps, then the text in the PDF requires 16 bits per character, which can significantly increase the size of a PDF with a lot of text.
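As a rough illustration of that size difference, here is a small sketch that counts only the bytes used for character codes, assuming exactly one byte per character with an 8-bit encoding and two bytes per character with a 16-bit one. `text_bytes` is a hypothetical helper, and the calculation ignores PDF operators and stream compression.

```rust
/// Bytes needed for the character codes of `text`,
/// assuming a fixed number of bytes per character code.
fn text_bytes(text: &str, bytes_per_code: usize) -> usize {
    text.chars().count() * bytes_per_code
}

fn main() {
    let text = "hello world";
    let eight_bit = text_bytes(text, 1); // 11 characters -> 11 bytes
    let sixteen_bit = text_bytes(text, 2); // 11 characters -> 22 bytes
    println!("8-bit codes: {eight_bit} bytes, 16-bit codes: {sixteen_bit} bytes");
}
```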
We could use the SubsetProfile enum you introduced in the PR to drive whether MacRoman is used or not. It would also be good to include a link to the code in Chrome, or in the Chrome font sanitizer, that rejects fonts with only a MacRoman cmap, so that we can point to that in the code.
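One way the profile-driven choice could look is sketched below. It is self-contained and does not use the allsorts types: `CharExistence` and `SubsetProfile` are stand-ins that only mirror the names used in this thread, `CmapChoice` and `choose_cmap` are hypothetical, and the rule that only the Web profile opts out of MacRoman is an assumption rather than a decision made here.

```rust
/// Stand-in for the value returned by `mappings.plane()` in the block quoted above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CharExistence {
    MacRoman,
    BasicMultilingualPlane,
    AstralPlane,
    DivinePlane,
}

/// Stand-in for the profile enum; only the variants named in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SubsetProfile {
    Full,
    Web,
}

/// Hypothetical summary of which encoding record would be written.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CmapChoice {
    MacRomanFormat0,
    UnicodeBmpFormat4,
    UnicodeFullFormat12,
    WindowsSymbolFormat4,
}

/// Pick the cmap encoding from the character range and the profile.
/// Assumption: only the Web profile avoids MacRoman, so PDF output keeps 8-bit text.
fn choose_cmap(plane: CharExistence, profile: SubsetProfile) -> CmapChoice {
    match (plane, profile) {
        // Mac Roman characters all lie in the BMP, so format 4 can represent them.
        (CharExistence::MacRoman, SubsetProfile::Web) => CmapChoice::UnicodeBmpFormat4,
        (CharExistence::MacRoman, _) => CmapChoice::MacRomanFormat0,
        (CharExistence::BasicMultilingualPlane, _) => CmapChoice::UnicodeBmpFormat4,
        (CharExistence::AstralPlane, _) => CmapChoice::UnicodeFullFormat12,
        (CharExistence::DivinePlane, _) => CmapChoice::WindowsSymbolFormat4,
    }
}

fn main() {
    for profile in [SubsetProfile::Full, SubsetProfile::Web] {
        let choice = choose_cmap(CharExistence::MacRoman, profile);
        println!("{profile:?}: {choice:?}");
    }
}
```

With this shape, the existing MacRoman branch stays untouched for non-web output, and the web case reuses the same path as CharExistence::BasicMultilingualPlane.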