[csswg-drafts] [css-fonts] Handling of Standardized Variation Sequences

hfhchan has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-fonts] Handling of Standardized Variation Sequences ==
In Step 2a of Section 5.3 Cluster matching of CSS Fonts 3:
> If c1 is a variation selector, system fallback must be used to find
> a font that supports the full sequence of `b + c1`.

I've tested Chrome and Firefox, and neither performs any system font fallback when the given font contains a glyph for `b` but not for `b + c1`.
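
To make the difference concrete, here is a minimal sketch contrasting the two behaviors. It is a simplified model rather than any browser's actual code: `FontModel`, `matchPreferSequence` and `matchFirstBase` are hypothetical names, system fallback is treated as just more entries in the candidate list, and the coverage data only mirrors the test case described in this issue.

```typescript
// Hypothetical model of a font: which code points and variation sequences it covers.
interface FontModel {
  name: string;
  codePoints: Set<number>; // nominal character coverage
  sequences: Set<string>;  // variation sequences, keyed as "base,selector"
}

interface Match {
  font: string;
  glyph: "sequence" | "base";
}

const key = (b: number, c1: number): string => `${b},${c1}`;

// Expected behavior as read from the quoted step: look for a font that
// supports the full sequence b + c1 before settling for b alone.
function matchPreferSequence(fonts: FontModel[], b: number, c1: number): Match | null {
  for (const f of fonts) {
    if (f.sequences.has(key(b, c1))) return { font: f.name, glyph: "sequence" };
  }
  for (const f of fonts) {
    if (f.codePoints.has(b)) return { font: f.name, glyph: "base" };
  }
  return null;
}

// Observed behavior: the first font that has b wins, and the selector is
// silently dropped if that font lacks the sequence.
function matchFirstBase(fonts: FontModel[], b: number, c1: number): Match | null {
  for (const f of fonts) {
    if (f.codePoints.has(b)) {
      return { font: f.name, glyph: f.sequences.has(key(b, c1)) ? "sequence" : "base" };
    }
  }
  return null;
}

// Assumed coverage, mirroring the test case in this issue.
const BASE = 0x9f4b;      // 齋
const SELECTOR = 0xe0101; // variation selector U+E0101
const simSun: FontModel = { name: "SimSun", codePoints: new Set([BASE]), sequences: new Set<string>() };
const hanaMinA: FontModel = {
  name: "HanaMinA",
  codePoints: new Set([BASE]),
  sequences: new Set([key(BASE, SELECTOR)]),
};
const fonts = [simSun, hanaMinA];
```

With this data, `matchPreferSequence(fonts, BASE, SELECTOR)` picks HanaMinA with the variant glyph, while `matchFirstBase(fonts, BASE, SELECTOR)` stays on SimSun and renders the base glyph.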

I tested with `齋󠄁齋` (the first has a variation selector U+E0101 appended after it), and with a CSS declaration of `font-family: SimSun, HanaMinA;`, of which SimSun does not contain a glyph for the variation selector, but HanaMinA does:

Result in Firefox:
![image](https://user-images.githubusercontent.com/8191296/29164096-ffb2e88a-7df0-11e7-8e6a-46fdf56e8585.png)

Result in Chrome:
![image](https://user-images.githubusercontent.com/8191296/29164091-fd4bc7ba-7df0-11e7-827a-baa936f66270.png)

The spec should either be amended to reflect the behavior implemented by browsers, or the browsers' behavior should be changed. Supporting Unicode variation selectors involves a `cmap` format 14 subtable lookup, and it would be understandable if moving complex table lookups into the font-selection phase were prohibitively expensive.

Due to the nature of the Han script, it is often hard to objectively determine what is the same character and what is not. Different people have different expectations. CJK unification in ISO/IEC 10646 was a very controversial decision and remains controversial today. Reliably rendering Unicode variation sequences is often necessary and may have legal ramifications.

The fallback-to-`b` behavior is problematic because it may not be what the author intended, and the user has no way of knowing that a substitution took place. More often, the preferred behavior is to display a "tofu" (`.notdef`) glyph instead.

In addition, China and TCA are likely to be using Unicode Variation Selectors to encode historic variants of CJK Unified Ideographs (assuming the decision by the IRG is approved by WG2 in the coming meeting in September). Variant characters with visually significant differences will be approved for unification with their more common character, provided that the variant is similar in structure, rare in modern use, and attested to be exactly equivalent to the base character in semantics. In these cases, getting `.notdef` is usually preferred over getting the base character's glyph if a given font doesn't have that specific glyph variant.

At the same time, it may be useful in historical text digitization projects to dynamically switch between showing characters with the glyphs found in the books and with the glyphs used in the modern day. This could be accomplished by stripping all the variation selectors out via regex on innerHTML, or, preferably, via a CSS property or feature / flag.
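
As a rough illustration of the regex approach, the sketch below removes variation selectors from a string. The function name `stripVariationSelectors` and its `ideographicOnly` parameter are made up for this example; the ranges are VS1 to VS16 (U+FE00 to U+FE0F) and VS17 to VS256 (U+E0100 to U+E01EF).

```typescript
// VS1-VS16 (U+FE00-U+FE0F) plus VS17-VS256 (U+E0100-U+E01EF).
const ALL_VARIATION_SELECTORS = /[\uFE00-\uFE0F\u{E0100}-\u{E01EF}]/gu;
// Only VS17-VS256, the range used by ideographic variation sequences.
const SUPPLEMENTARY_VARIATION_SELECTORS = /[\u{E0100}-\u{E01EF}]/gu;

// Hypothetical helper: returns the text with variation selectors removed,
// so that every base character renders with its default glyph.
function stripVariationSelectors(text: string, ideographicOnly: boolean = false): string {
  const pattern = ideographicOnly
    ? SUPPLEMENTARY_VARIATION_SELECTORS
    : ALL_VARIATION_SELECTORS;
  return text.replace(pattern, "");
}
```

Passing `ideographicOnly = true` leaves VS1 to VS16 alone, which matters because VS15 and VS16 also select text versus emoji presentation. Applying the function to the contents of text nodes, rather than to an innerHTML string, would avoid reparsing the markup.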

To cater for such behavior, I suggest that a new CSS property and/or OpenType feature / flag is introduced.

These behaviors could be implemented via a new CSS property such as `font-variation-sequences` with the values `auto` (fallback for VS-16 and below if `b + c1` is missing, tofu for VS-17 and above if `b + c1` is missing), `ignore-missing` (always use `b` if `b + c1` is missing), `tofu-missing` (fall back to `.notdef` / CID 0 if `b + c1` is missing), and `ignore-all` (always use `b` in place of `b + c1`, regardless of whether the font supports the sequence).
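
The intended semantics of these values could be sketched as a small decision function. This is only an illustration of the proposal, not an existing API: `resolveSequence` and its types are hypothetical, and it assumes that "fallback" under `auto` means falling back to the glyph for `b`.

```typescript
type VariationSequencesMode = "auto" | "ignore-missing" | "tofu-missing" | "ignore-all";
type Rendered = "sequence" | "base" | "notdef";

// VS17-VS256 live in U+E0100-U+E01EF; VS1-VS16 live in U+FE00-U+FE0F.
function isVs17OrAbove(selector: number): boolean {
  return selector >= 0xe0100 && selector <= 0xe01ef;
}

// Decide what to render for b + c1, given the proposed property value and
// whether the selected font supports the full sequence.
function resolveSequence(
  mode: VariationSequencesMode,
  selector: number,
  fontHasSequence: boolean,
): Rendered {
  // ignore-all: the selector is dropped even when the font supports it.
  if (mode === "ignore-all") return "base";
  // In every other mode a supported sequence is rendered as is.
  if (fontHasSequence) return "sequence";
  if (mode === "ignore-missing") return "base";
  if (mode === "tofu-missing") return "notdef";
  // auto: assumed to fall back to b for VS-16 and below, tofu for VS-17 and above.
  return isVs17OrAbove(selector) ? "notdef" : "base";
}
```

For example, under `auto` a missing sequence using U+E0101 resolves to `"notdef"`, while a missing sequence using U+FE00 resolves to `"base"`.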

Alternatively, this could piggyback on a new OpenType flag, perhaps named "tofu", so that the different behaviors could be activated directly via `font-variation-settings`.


Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/1710 using your GitHub account

Received on Thursday, 10 August 2017 09:56:04 UTC