Support multi-character emoji #511

Closed
forresto opened this issue May 2, 2022 · 7 comments · Fixed by #688

Comments

@forresto
Contributor

forresto commented May 2, 2022

This issue with monochrome Noto Emoji is distinct from the color emoji issue (#193).

#338 added support for non-Basic-Multilingual-Plane (BMP) characters, but uses Array.from, which doesn't account for combined emoji.
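
To make the problem concrete, here is a small sketch (plain JavaScript, no Opentype.js involved) showing what Array.from does with the family emoji. The variable names are only illustrative, and the string is written with escapes so the zero-width joiners are visible:

```javascript
// The family emoji: four people joined by three zero-width joiners (U+200D).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

// Array.from iterates by code point, so surrogate pairs stay intact,
// but the joined sequence is still split into its parts.
const codePoints = Array.from(family);
const hex = codePoints.map(
  (c) => "U+" + c.codePointAt(0).toString(16).toUpperCase()
);

console.log(codePoints.length); // 7
console.log(hex.join(" "));
// U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466
```

Each of those seven code points is then mapped to its own glyph, which is why the combined glyph is never reached.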

It seems that Opentype.js has the glyph information needed, but the initial text-to-glyph translation is the issue:

image
https://opentype.js.org/glyph-inspector.html

Expected Behavior

Calling notoEmojiFont.draw(context, "👨‍👩‍👧‍👦") should render
image

Current Behavior

Calling notoEmojiFont.draw(context, "👨‍👩‍👧‍👦") renders
image

Possible Solution

  1. If "ccmp" is not supported yet and would cover this, this issue can be closed as a duplicate of "Consider adding support to more GSUB tags?" (#443).

  2. Intl.Segmenter is a native solution, but isn't supported by Firefox yet.

     const splitSegmentArray = (string) => Array.from(new Intl.Segmenter().segment(string)).map(x => x.segment);
     console.log(splitSegmentArray("😅👨‍👩‍👧‍👦💖👩‍💻💔👩‍🌾🧡👨🏽‍🌾💜🖖🏾🌈"))

  3. graphemer is a library-based solution. (It is a fairly big library.)

  4. twemoji-parser is focused on parsing emoji sequences, so it's smaller than graphemer.
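
Since Intl.Segmenter may be missing in some browsers, one possible approach is to feature-detect it and fall back to the current code-point splitting. This is only a sketch; splitGraphemes is a hypothetical helper name, not part of Opentype.js:

```javascript
// Hypothetical helper: split text into user-perceived characters when
// Intl.Segmenter is available, otherwise fall back to code points.
function splitGraphemes(text) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    return Array.from(segmenter.segment(text), (s) => s.segment);
  }
  // Fallback: same behavior as today, combined emoji are split apart.
  return Array.from(text);
}

console.log(splitGraphemes("abc")); // [ 'a', 'b', 'c' ]
```

With the fallback, environments without Intl.Segmenter behave exactly as they do now, so nothing regresses there.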

Steps to Reproduce (for bugs)

Live demo: https://gm69qn.csb.app

  1. Call notoEmojiFont.stringToGlyphs("👨‍👩‍👧‍👦") and get glyphs for "👨👩👧👦" interspersed with the combiner ("uni200D") instead of the one glyph for the combined family.

image

  2. Same for other combined emoji, like 👩‍💻, 👩‍🌾, 👨🏽‍🌾, 🖖🏾

Context

We're adding support for emoji to Cuttle CAD, which can render various fonts as vectors for laser cutting, etc.

Your Environment

  • Version used: 1.3.4
  • Font used: Noto Emoji (ttf)
  • Browser Name and version: Various tested
  • Operating System and version (desktop or mobile): Mac OS desktop
  • Link to your project: https://gm69qn.csb.app
@forresto
Contributor Author

forresto commented May 3, 2022

It seems like font.tables.gsub has the ligatureSets info needed to combine these. Is that something that I can enable with an option?
image

notoEmojiFont.substitution.getFeature("ccmp") // Array(3640)

The feature tag is "ccmp" ... I'm not seeing that called with defaults via getFeature or getMultiple, though there are some tests. 🤔

If "ccmp" is not supported yet, this can be closed as a duplicate of #443.

@forresto
Contributor Author

forresto commented May 9, 2022

Looking at #443 I thought this was worth a try:

notoEmojiFont.substitution.add(
  "ccmp", 
  notoEmojiFont.substitution.getFeature('ccmp')
);

but got:

Error: Ligature: unable to modify coverage table format 2

@forresto
Contributor Author

In addition to the ccmp substitutions, variation selectors (https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)) need to be taken into account. For example, "☠" vs "☠️".
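
As a quick illustration of the difference (plain JavaScript; toHex and stripVariationSelectors are hypothetical helper names), the second string carries an extra U+FE0F code point. Stripping the selectors is only one possible way to handle them, and it covers just the FE00 to FE0F block:

```javascript
// List the code points of a string as lowercase hex.
const toHex = (text) =>
  Array.from(text, (c) => c.codePointAt(0).toString(16));

const plain = "\u2620"; // skull and crossbones, no selector
const withSelector = "\u2620\uFE0F"; // same, followed by VARIATION SELECTOR-16

// Remove variation selectors U+FE00..U+FE0F before any glyph lookup.
const stripVariationSelectors = (text) => text.replace(/[\uFE00-\uFE0F]/g, "");

console.log(toHex(plain)); // [ '2620' ]
console.log(toHex(withSelector)); // [ '2620', 'fe0f' ]
console.log(stripVariationSelectors(withSelector) === plain); // true
```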

@jamesjoung

I'm also looking for a workaround for this.
It would be nice to have this supported, or at least to have a workaround.

@forresto
Contributor Author

forresto commented May 16, 2022

Here's my workaround.

// Opentype.js doesn't actually support these substitutions, so we'll have to
// search them manually
const substitutions = font.substitution.getFeature("ccmp");

function emojiToGlyph(emojiString) {
  const glyphs = font
    .stringToGlyphs(emojiString)
    // Discarding these makes the substitution search work for emoji sequences
    // with variation selectors
    // https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)
    // NOTE: 1850 is a glyph index cutoff, so it presumably only holds for the
    // font tested here.
    .filter((glyph) => glyph.index <= 1850);

  let glyph;
  if (glyphs.length === 1) {
    glyph = glyphs[0];
  } else if (glyphs.length > 1) {
    // Look for a ccmp substitution whose input sequence matches exactly
    const indexes = glyphs.map((glyph) => glyph.index);
    const sub = substitutions.find((substitution) => equals(substitution.sub, indexes));
    if (sub) {
      glyph = font.glyphs.get(sub.by);
    }
  }

  if (glyph) {
    return glyph;
  } else {
    throw new Error(`${emojiString} - couldn't find a glyph :(`);
  }
}

emojiToGlyph("👨‍👩‍👧‍👦");

/** Custom equals function that can also check lists. */
function equals(a, b) {
  if (a === b) {
    return true;
  } else if (Array.isArray(a) && Array.isArray(b)) {
    if (a.length !== b.length) {
      return false;
    }
    for (let i = 0; i < a.length; i += 1) {
      if (!equals(a[i], b[i])) {
        return false;
      }
    }
    return true;
  } else {
    return false;
  }
}

Caveats:

This only works for one emoji. To replace the glyphs in an arbitrary string, we would also need tokenizer logic.

Only tested with Noto Emoji.
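
For the tokenizer part, a rough sketch could look like the following. It is a deliberately simplified model, not a full implementation of Unicode grapheme segmentation: it only glues zero-width joiners, variation selectors U+FE00..U+FE0F, and skin-tone modifiers to the preceding cluster. The names splitEmojiClusters and isExtender are hypothetical.

```javascript
const ZWJ = 0x200d;

// True for code points that extend the previous cluster in this simplified model.
function isExtender(cp) {
  return (
    cp === ZWJ ||
    (cp >= 0xfe00 && cp <= 0xfe0f) || // variation selectors
    (cp >= 0x1f3fb && cp <= 0x1f3ff) // skin-tone modifiers
  );
}

// Split a string into clusters that each correspond to one emoji sequence.
function splitEmojiClusters(text) {
  const clusters = [];
  let previous = null;
  for (const char of text) {
    const cp = char.codePointAt(0);
    // Join when this code point is an extender, or when it follows a ZWJ.
    const joins = clusters.length > 0 && (isExtender(cp) || previous === ZWJ);
    if (joins) {
      clusters[clusters.length - 1] += char;
    } else {
      clusters.push(char);
    }
    previous = cp;
  }
  return clusters;
}

// Woman technologist (a ZWJ sequence) followed by a broken heart.
const sample = "\u{1F469}\u200D\u{1F4BB}\u{1F494}";
console.log(splitEmojiClusters(sample).length); // 2
```

Each resulting cluster could then be passed to emojiToGlyph from the workaround above. Flags, keycaps, and tag sequences are not handled by this sketch.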

image

@ILOVEPIE
Contributor

ILOVEPIE commented Nov 20, 2022

Here are the different options: https://medium.com/making-faces-and-other-emoji/emoji-fonts-technically-40f3fdc0869e
I'd recommend at least supporting COLR/CPAL as it's probably the most widely supported one and one of the most implemented in fonts. It would also probably be a good idea to implement CBDT/CBLC support as well.

@TonyJR
Contributor

TonyJR commented Mar 18, 2024

ccmp looks like a feature that is always applied. It isn't displayed in the feature list, but it always runs before the text is processed.
image
https://learn.microsoft.com/en-us/typography/script-development/standard
Maybe we can add a preprocessing step in Font.stringToGlyphs()?
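
As a rough idea of what such a preprocessing step might do, here is a sketch that works on plain arrays of glyph indexes. It assumes ligature entries shaped like { sub: [...], by: n }, as used in the workaround earlier in this thread. applyLigatures is a hypothetical name and not an existing Opentype.js function, and preferring the longest match is a simplification rather than how GSUB lookups are ordered in a font.

```javascript
// Hypothetical preprocessing: replace runs of glyph indexes with ligature glyphs.
function applyLigatures(glyphIndexes, ligatures) {
  // Try longer sequences first so a full sequence wins over a shorter prefix.
  const sorted = [...ligatures].sort((a, b) => b.sub.length - a.sub.length);
  const out = [];
  let i = 0;
  while (i < glyphIndexes.length) {
    const match = sorted.find(
      ({ sub }) =>
        sub.length > 0 && sub.every((g, k) => glyphIndexes[i + k] === g)
    );
    if (match) {
      out.push(match.by);
      i += match.sub.length;
    } else {
      out.push(glyphIndexes[i]);
      i += 1;
    }
  }
  return out;
}

// Made-up glyph indexes, for illustration only.
const ligatures = [
  { sub: [1, 2], by: 11 },
  { sub: [1, 2, 3], by: 10 },
];
console.log(applyLigatures([1, 2, 3, 4, 1, 2], ligatures)); // [ 10, 4, 11 ]
```

Running this on the glyph indexes before they are turned into glyph objects would let the combined glyph replace the joined sequence without changing the caller's API.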

@TonyJR TonyJR mentioned this issue Mar 22, 2024
@Connum Connum linked a pull request Apr 10, 2024 that will close this issue