[csswg-drafts] [css-fonts] Font fallback for (Unicode) decomposable characters is browser-dependent (#10565)

spencer246 has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-fonts] Font fallback for (Unicode) decomposable characters is browser-dependent ==
It seems that the [CSS Fonts Module](https://www.w3.org/TR/css-fonts-4) does not fully specify how browsers should select a font for a grapheme when (1) the grapheme consists of a **single** Unicode codepoint X, (2) X canonically decomposes into a codepoint Y, and (3) the font can render Y but not X.

Note that the condition that the grapheme consists of a **single** codepoint is important here, because [Section 5.3](https://www.w3.org/TR/css-fonts-4/#cluster-matching) of the spec mandates that, if a grapheme is a multi-codepoint sequence whose NFC normalization is Y, browsers must check whether the font can render Y before they move on to the next font in the `font-family` list.

However, it remains unclear whether the rule in Section 5.3 should be applied as well in the case where a codepoint does not belong to a multi-codepoint grapheme cluster or a Unicode variation sequence. In fact, Chrome and Firefox do not agree on this issue; the two browsers render the following simple HTML+CSS snippet differently.

<https://codepen.io/spencer246/pen/bGPdqdQ>

The above page uses Noto Sans TC to render `U+F992`, a CJK Compatibility Ideograph that canonically decomposes into `U+6F23`. Many fonts cover `U+6F23` but not `U+F992`, and [Noto Sans TC](https://fonts.google.com/noto/specimen/Noto+Sans+TC) is one such font.
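The decomposition itself can be checked independently of any browser. The following is a minimal sketch using Python's standard `unicodedata` module; it only confirms the Unicode property discussed above and says nothing about font coverage.

```python
import unicodedata

compat = "\uF992"   # CJK COMPATIBILITY IDEOGRAPH-F992
unified = "\u6F23"  # the unified ideograph it decomposes into

# The canonical decomposition mapping, as a string of hex code points.
mapping = unicodedata.decomposition(compat)

# CJK Compatibility Ideographs have singleton canonical decompositions,
# so both NFC and NFD replace U+F992 with U+6F23.
nfc = unicodedata.normalize("NFC", compat)
nfd = unicodedata.normalize("NFD", compat)

print(mapping, nfc == unified, nfd == unified)
```

Because the mapping is a singleton, normalization never restores `U+F992`; once either form is applied, the original codepoint is gone.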

<img width="290" alt="image" src="https://github.com/user-attachments/assets/b8b3fd6d-8471-4590-ba87-e6595d654e9d">

In the above figure, the first glyph is `U+F992` and the second is `U+6F23`.

On Firefox, since Noto Sans TC cannot render `U+F992`, the character is rendered with the next font in the font stack ([`text-security-circle`](https://www.jsdelivr.com/package/npm/text-security)), which renders any codepoint as a small circle.

On Chrome, however, when the engine notices that Noto Sans TC cannot render `U+F992`, it checks whether it can render the canonically equivalent codepoint `U+6F23`, and thus `U+F992` is rendered as a CJK Ideograph rather than a small circle.
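The difference between the two observed behaviors can be modeled as follows. This is only an illustrative sketch, not actual engine code: `FONT_STACK`, `select_strict`, and `select_with_equivalent` are hypothetical names, and font coverage is reduced to a simple predicate based on the description above.

```python
import unicodedata

# Hypothetical coverage model: (font name, predicate saying whether the
# font covers a given code point). Coverage is assumed from the issue text.
FONT_STACK = [
    ("Noto Sans TC", lambda ch: ch == "\u6F23"),  # covers U+6F23, not U+F992
    ("text-security-circle", lambda ch: True),    # covers every code point
]

def select_strict(ch, stack):
    """Firefox-like, as observed: test only the original code point."""
    for name, covers in stack:
        if covers(ch):
            return name, ch
    return None

def select_with_equivalent(ch, stack):
    """Chrome-like, as observed: before moving to the next font, also
    try the canonically equivalent form in the current font."""
    for name, covers in stack:
        if covers(ch):
            return name, ch
        equivalent = unicodedata.normalize("NFC", ch)
        if equivalent != ch and covers(equivalent):
            return name, equivalent
    return None

strict_result = select_strict("\uF992", FONT_STACK)
equivalent_result = select_with_equivalent("\uF992", FONT_STACK)
```

Under this model, `strict_result` names `text-security-circle` with the original codepoint, while `equivalent_result` names Noto Sans TC with `U+6F23` substituted, which matches the two renderings in the figure.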

---

1. Is this browser-dependent behavior of the font fallback algorithm already known and considered acceptable?

2-1. If it is, the spec should state explicitly how a font is selected for canonically decomposable Unicode characters.

2-2. If it is not, please consider specifying the desired behavior. In my opinion, the Firefox-like behavior is preferable because it matches the variation sequence case:

> For sequences containing variation selectors, which indicate the precise glyph to be used for a given character, user agents always attempt [installed font fallback](https://www.w3.org/TR/css-fonts-4/#installed-font-fallback) to find the appropriate glyph before using the default glyph of the base character.

To be consistent with the above, a canonically decomposable character (e.g., a CJK Compatibility Ideograph) should be matched against all fonts in the `font-family` list before NFC or NFD is applied to it.
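One way to read this proposal is as a two-pass search, sketched below. The function name `select_two_pass` and the coverage predicates are hypothetical, and the second pass (retrying the list with the normalized form) is an assumption about what should happen when no listed font covers the original codepoint.

```python
import unicodedata

def select_two_pass(ch, stack):
    """Try the original code point against every font first; only if all
    fonts fail, retry the whole list with the normalized form."""
    for name, covers in stack:
        if covers(ch):
            return name, ch
    equivalent = unicodedata.normalize("NFC", ch)
    if equivalent != ch:
        for name, covers in stack:
            if covers(equivalent):
                return name, equivalent
    return None

# Hypothetical stacks, with coverage assumed from the issue text.
noto_only = [("Noto Sans TC", lambda ch: ch == "\u6F23")]
with_catch_all = noto_only + [("text-security-circle", lambda ch: True)]

fallback_result = select_two_pass("\uF992", noto_only)
catch_all_result = select_two_pass("\uF992", with_catch_all)
```

With the catch-all font in the list, the original codepoint wins and the result matches the Firefox rendering; with Noto Sans TC alone, the normalized form is used only as a last resort.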

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/10565 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Friday, 12 July 2024 15:14:21 UTC