[csswg-drafts] [css-fonts-4] Suggestion: Support Unicode Character Sequences in unicode-range (#10651)

brianjlacy has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-fonts-4] Suggestion: Support Unicode Character Sequences in unicode-range ==
# Request

I propose that the `unicode-range` descriptor in the @font-face rule should support matching specific Unicode character sequences. This would allow more precise control over which characters get rendered by a particular font, especially useful for cases where emoji sequences and text characters need to be handled differently.

## Background and Rationale

Currently, `unicode-range` only supports individual code points and code point ranges, which can be limiting. For example, an emoji font may include digits as parts of sequences (e.g., keycap emojis like 1️⃣) without providing visible standalone glyphs for them. This causes problems when such a font is used alongside a text font: plain digits may not render correctly if the emoji font is selected for them but has no visible standalone versions.
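
To make the problem concrete, the short Python sketch below lists the code points of a standalone "1" and of the keycap emoji 1️⃣. Python is used only for illustration, and the `code_points` helper is hypothetical. The keycap is a three-code-point sequence that begins with the same U+0031 as the plain digit. That shared first code point is why a per-code-point `unicode-range` cannot tell the two apart.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Return each code point of `text` formatted as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


standalone_one = "1"
keycap_one = "1\uFE0F\u20E3"  # the keycap emoji 1️⃣, written with escapes

for label, text in [("standalone", standalone_one), ("keycap", keycap_one)]:
    print(label, code_points(text))
```

The keycap line lists DIGIT ONE, VARIATION SELECTOR-16 and COMBINING ENCLOSING KEYCAP, in that order.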


### Consider this scenario:

```html
<div id="container">
  <p class="emojified">
    This paragraph should use a custom font, "Nifty Emoji," to render standard emojis like 😁 and also emoji sequences like 1️⃣, while letting numbers (1 2 3) and punctuation (*, #, etc.) use a fallback font.
  </p>
</div>
```
```css
@font-face {
  font-family: "Nifty Emoji";
  src: url("path/to/nifty-emoji.woff2") format("woff2");
  unicode-range: U+0023, U+002A, U+0030-0039, U+FE0F, /* ...etc. */;
}


p.emojified {
  font-family: 'Nifty Emoji', sans-serif;
}
```

In this example, the `unicode-range` includes `U+0030-0039`, which covers the digits zero through nine, so "Nifty Emoji" can also be selected for the ordinary digits in the paragraph. If that font doesn't have visible standalone glyphs for these digits, they may not render properly.

*But wait...*

Yes, we could simply exclude these characters from the `unicode-range`. But then the _sequences_ that depend on them (in this case, the "keycap" emojis) fall back as well. **There is no way to treat SEQUENCES differently from single characters.**
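
The dilemma can be sketched in a few lines of Python. This is a simplified model with some assumptions:

- It treats today's `unicode-range` as a plain per-code-point filter.
- It ignores the font's actual glyph coverage.
- It does not handle `?` wildcards.
- The helper names are hypothetical.

The ranges come from the scenario's `@font-face` rule, with and without the digits. Either way, the U+0031 inside 1️⃣ gets exactly the same answer as a standalone "1".

```python
def parse_current_range(value: str) -> list[tuple[int, int]]:
    """Parse a present-day unicode-range value such as 'U+0023, U+0030-0039'.

    Simplified sketch: wildcard forms such as 'U+4??' are not handled.
    """
    ranges = []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        lo, _, hi = token[2:].partition("-")  # drop the 'U+' prefix
        ranges.append((int(lo, 16), int(hi or lo, 16)))
    return ranges


def uses_font(ch: str, ranges: list[tuple[int, int]]) -> bool:
    """Per-code-point check: it never looks at neighboring characters."""
    return any(lo <= ord(ch) <= hi for lo, hi in ranges)


keycap = "1\uFE0F\u20E3"  # 1️⃣: DIGIT ONE + VS16 + COMBINING ENCLOSING KEYCAP
with_digits = parse_current_range("U+0023, U+002A, U+0030-0039, U+FE0F")
without_digits = parse_current_range("U+0023, U+002A, U+FE0F")

for label, ranges in [("with digits", with_digits), ("without digits", without_digits)]:
    # The '1' inside the keycap always gets the same answer as a standalone '1'.
    print(label, uses_font("1", ranges), uses_font(keycap[0], ranges))
```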

## Proposed Syntax and Examples

I propose a modification to the `unicode-range` syntax in which `+` joins code points (or code point ranges) into a _sequence_, so that a range matches Unicode characters _only_ when they appear within that sequence:
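
One way to read this syntax is as one code point range per position in the sequence. The Python sketch below parses tokens that way, under these assumptions:

- The function names are hypothetical.
- `?` wildcards are not handled.
- A range end shorter than its start, as in `FE0E-F` or `1F3FB-F` in the examples below, replaces the start's trailing hex digits. So `FE0E-F` means `FE0E-FE0F`.

```python
def parse_hex_range(part: str) -> tuple[int, int]:
    """'0030' -> (0x30, 0x30); '0030-0039' -> (0x30, 0x39).

    Assumption: a shorter end such as 'FE0E-F' replaces the start's trailing
    hex digits, so 'FE0E-F' is read as FE0E-FE0F.
    """
    start, _, end = part.partition("-")
    if not end:
        end = start
    elif len(end) < len(start):
        end = start[: len(start) - len(end)] + end
    lo, hi = int(start, 16), int(end, 16)
    if lo > hi:
        raise ValueError(f"empty range: {part}")
    return lo, hi


def parse_sequence(token: str) -> list[tuple[int, int]]:
    """Parse one token like 'U+0030-0039+FE0F' into one (lo, hi) range per position."""
    token = token.strip()
    if token[:2].upper() != "U+":
        raise ValueError(f"expected a 'U+' prefix: {token}")
    return [parse_hex_range(p) for p in token[2:].split("+")]


def parse_descriptor(value: str) -> list[list[tuple[int, int]]]:
    """Split a comma-separated unicode-range value into sequence patterns."""
    return [parse_sequence(t) for t in value.split(",") if t.strip()]


print(parse_descriptor("U+0023+FE0E, U+0023+FE0F, U+0030-0039+FE0F"))
```

With this reading, a plain token such as `U+0030-0039` parses to a one-position pattern. That keeps existing range values meaning what they mean today.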

1. Individual Sequences:
   ```css
    @font-face {
      font-family: "Nifty Emoji";
      src: url("path/to/nifty-emoji.woff2") format("woff2");
      /*
        Supports the text-style and emoji-style '#' symbols,
        but a bare '#' and the ordinary digits (U+0030-0039) are allowed to fall back!
      */
      unicode-range: U+0023+FE0E, U+0023+FE0F, /* ... */;
    }
    ```
    
    
2. Sequence Ranges:
   ```css
    @font-face {
      font-family: "Nifty Emoji";
      src: url("path/to/nifty-emoji.woff2") format("woff2");
      /*
        Supports:
        - emoji-style digits for the "keycap" emojis (digit + U+FE0F)
        - text-style and emoji-style '#' (U+0023 + U+FE0E or U+FE0F)
        - the handshake emojis in various skin-tone combinations

        Does NOT support (these fall back to another font):
        - the ordinary digits '0' through '9'
        - the ordinary '#' symbol
      */
      unicode-range: U+0030-0039+FE0F, U+0023+FE0E-F, U+1FAF1+1F3FB-F+200D+1FAF2+1F3FB-F, /* ... */;
    }
    ```
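
Read as one range per position, a sequence range matches every combination of one code point from each position. It is a cross product, so its size is the product of the range sizes. The Python sketch below counts and enumerates the concrete sequences behind the three tokens in the last example. The patterns are written out by hand, `1F3FB-F` is read as `1F3FB-1F3FF`, and the helper names are hypothetical.

```python
from itertools import product
from math import prod

# One inclusive (lo, hi) code point range per position, written out by hand.
PATTERNS = {
    "U+0030-0039+FE0F": [(0x30, 0x39), (0xFE0F, 0xFE0F)],
    "U+0023+FE0E-F": [(0x23, 0x23), (0xFE0E, 0xFE0F)],
    "U+1FAF1+1F3FB-F+200D+1FAF2+1F3FB-F": [
        (0x1FAF1, 0x1FAF1), (0x1F3FB, 0x1F3FF), (0x200D, 0x200D),
        (0x1FAF2, 0x1FAF2), (0x1F3FB, 0x1F3FF),
    ],
}


def count_sequences(pattern: list[tuple[int, int]]) -> int:
    """Number of concrete sequences matched: the product of the range sizes."""
    return prod(hi - lo + 1 for lo, hi in pattern)


def expand(pattern: list[tuple[int, int]]):
    """Yield every concrete code point sequence the pattern matches, as a str."""
    ranges = [range(lo, hi + 1) for lo, hi in pattern]
    for cps in product(*ranges):
        yield "".join(map(chr, cps))


for token, pattern in PATTERNS.items():
    print(token, count_sequences(pattern))  # 10, 2 and 25 respectively
```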

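*Keycap Emojis:* The sequence U+0030-0039+FE0F matches each digit followed by U+FE0F (VARIATION SELECTOR-16), which is how the keycap emojis 0️⃣ through 9️⃣ begin. These sequences are therefore displayed using the "Nifty Emoji" font, while standalone digits (U+0030-0039 without U+FE0F) can fall back to a standard text font. A complete keycap emoji also ends with U+20E3 COMBINING ENCLOSING KEYCAP, so an exact match would be written U+0030-0039+FE0F+20E3.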

*Emoji Variants:* Using +FE0F and +FE0E allows specifying emoji or text presentation styles, respectively. For example, U+0030+FE0F for emoji-style zero and U+0030+FE0E for text-style zero.

*Complex Sequences:* The sequence U+1FAF1+1F3FB-F+200D+1FAF2+1F3FB-F covers the handshake emoji, built from U+1FAF1 RIGHTWARDS HAND and U+1FAF2 LEFTWARDS HAND. Each hand can carry any of the five skin-tone modifiers U+1F3FB-1F3FF. This ensures that these combinations are rendered according to the emoji font's design. Note that the two hands are joined by U+200D ZERO WIDTH JOINER (ZWJ).
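
The proposal does not say how a browser would scan text against these patterns. One plausible rule is greedy longest match: at each position, the font takes the longest pattern that matches, and code points matched by no pattern fall back. The Python sketch below implements that rule, which is an assumption, and the helper names are hypothetical.

The keycap pattern here includes the final U+20E3 COMBINING ENCLOSING KEYCAP, so a full keycap emoji is matched as a whole. Plain digits and a bare '#' are left to the fallback font.

```python
Pattern = list[tuple[int, int]]  # one inclusive (lo, hi) code point range per position


def match_len(text: str, i: int, pattern: Pattern) -> int:
    """Return len(pattern) if it matches text starting at index i, else 0."""
    if i + len(pattern) > len(text):
        return 0
    for offset, (lo, hi) in enumerate(pattern):
        if not lo <= ord(text[i + offset]) <= hi:
            return 0
    return len(pattern)


def assign(text: str, patterns: list[Pattern]) -> list[bool]:
    """For each code point of text, return whether the font would be used for it.

    Hypothetical rule: scan left to right and, at each position, consume the
    longest matching pattern; code points matched by no pattern fall back.
    """
    result = [False] * len(text)
    i = 0
    while i < len(text):
        best = max((match_len(text, i, p) for p in patterns), default=0)
        if best:
            result[i:i + best] = [True] * best
            i += best
        else:
            i += 1
    return result


# Hand-written patterns for the proposal's examples (1F3FB-F read as 1F3FB-1F3FF).
KEYCAP_DIGITS = [(0x30, 0x39), (0xFE0F, 0xFE0F), (0x20E3, 0x20E3)]  # U+0030-0039+FE0F+20E3
HASH_TEXT_OR_EMOJI = [(0x23, 0x23), (0xFE0E, 0xFE0F)]  # U+0023+FE0E-F
HANDSHAKE = [(0x1FAF1, 0x1FAF1), (0x1F3FB, 0x1F3FF), (0x200D, 0x200D),
             (0x1FAF2, 0x1FAF2), (0x1F3FB, 0x1F3FF)]
FONT_PATTERNS = [KEYCAP_DIGITS, HASH_TEXT_OR_EMOJI, HANDSHAKE]

sample = "1 1\uFE0F\u20E3 # #\uFE0F \U0001FAF1\U0001F3FB\u200D\U0001FAF2\U0001F3FC"
for ch, used in zip(sample, assign(sample, FONT_PATTERNS)):
    print(f"U+{ord(ch):04X}", "Nifty Emoji" if used else "fallback")
```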

## Considerations
I believe this approach is intuitive: using `+` to join the code points of a sequence matches a notation commonly used elsewhere on the web to describe such sequences, including in [official Unicode documents](https://www.unicode.org/emoji/charts/emoji-variants.html). It extends naturally to complex sequences, such as those involving ZWJ, and accommodates ranges within sequences.

It should be noted that the `font-variant-emoji` property does not address this issue, because it is applied at the element level. The approach I'm proposing allows fine-grained control, within the `@font-face` rule itself, over which characters and sequences are rendered with a particular font.

# Conclusion
Supporting Unicode sequences in `unicode-range` offers a precise, flexible way to manage font rendering, particularly for mixed content involving both text and emojis. It would give authors finer control over which font renders which characters and avoid the fallback problems described above.

As I am, again, not an expert here, I welcome critique and alternate viewpoints.

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/10651 using your GitHub account



Received on Wednesday, 31 July 2024 21:26:29 UTC