Re: [w3ctag/design-reviews] Incremental Font Transfer: Patch Subset (Issue #849) from Garret Rieger on 2023-08-16 (public-webapps-github@w3.org from August 2023)

From: Garret Rieger <notifications@github.com>
Date: Wed, 16 Aug 2023 15:18:16 -0700
To: w3ctag/design-reviews <design-reviews@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <w3ctag/design-reviews/issues/849/1681340106@github.com>
So far in this issue, discussion of the rationale for using patch subset has centered primarily on CJK, but it’s important to note that IFT is extremely beneficial for many other font use cases. Here are a few others that I consider particularly important:

* Emoji and icon fonts. Like CJK fonts, these cover large numbers of codepoints, of which any particular usage will need only a very small subset. Emoji fonts are also difficult to segment into independent subsets because of their extensive use of glyph substitution based on codepoint sequences (e.g., for skin tones).
* Variable fonts. Multi-axis variable fonts in particular can be prohibitively large, yet typical usages need only a small number of points in the font's full design space. IFT via patch subset can incrementally transfer variable font axis data (in addition to glyph data), so only what’s actually needed is downloaded while the font can still be extended later. This is even more important for fonts that also have large codepoint coverage (e.g., CJK, emoji, icon) because of the multiplicative effect of the variation data.
* Multi-script font families. Most font families cover many scripts. As a result, they are typically too large to deliver in their original form and must be split into separate subsets, one per script, selected via unicode-range. However, this approach runs into trouble when codepoints are shared between scripts (common for combining codepoints): the browser can use the wrong subset for a codepoint that exists in more than one subset, which leads to incorrect text rendering. This is a very common problem that we deal with constantly on the Google Fonts service (e.g., https://github.com/googlefonts/glyphsets/issues/98#issuecomment-1597845513, https://github.com/google/fonts/issues/6542, https://github.com/google/fonts/issues/3579, https://github.com/google/fonts/issues/6245, https://github.com/google/fonts/issues/2392). Unfortunately, without something like IFT there is no way to solve these issues without significantly increasing the number of font bytes we deliver to users.
* Looking ahead. The font format is being [extended to allow inclusion of more than 64k glyphs](https://github.com/harfbuzz/boring-expansion-spec/blob/main/beyond-64k.md). This is needed for the effective use of pan-Unicode font families such as Noto, and IFT will be required to deliver them efficiently. Pan-Unicode fonts are important because they enable rendering support for all scripts and languages in Unicode.
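To make the multi-script failure mode concrete, here is a minimal Python sketch of naive per-codepoint subset selection. The subset names and ranges are hypothetical, and it assumes a simplified rule where the first subset whose range covers a codepoint wins; real browsers follow the CSS Fonts matching algorithm and may apply additional heuristics. It only illustrates how a base letter and a combining mark that appears in two subsets can end up rendered from different font files, where the base font's mark positioning can no longer apply.

```python
from dataclasses import dataclass


@dataclass
class Subset:
    """A hypothetical font subset described by unicode-range style ranges."""
    name: str
    ranges: list[tuple[int, int]]  # inclusive codepoint ranges

    def covers(self, cp: int) -> bool:
        return any(lo <= cp <= hi for lo, hi in self.ranges)


def pick_subset(subsets: list[Subset], cp: int):
    # Simplifying assumption: the first subset (in list order) that covers
    # the codepoint is used. This is NOT the full CSS Fonts matching rule.
    for s in subsets:
        if s.covers(cp):
            return s.name
    return None


def assign(subsets: list[Subset], text: str):
    """Return (codepoint, chosen subset) for each character of `text`."""
    return [(f"U+{ord(ch):04X}", pick_subset(subsets, ord(ch))) for ch in text]


# Hypothetical subsets: both include the combining diacritics block U+0300-036F.
latin = Subset("latin", [(0x0000, 0x00FF), (0x0300, 0x036F)])
cyrillic = Subset("cyrillic", [(0x0400, 0x04FF), (0x0300, 0x036F)])

# Cyrillic "е" followed by COMBINING ACUTE ACCENT (a stress mark).
result = assign([latin, cyrillic], "\u0435\u0301")
print(result)  # base letter -> cyrillic subset, mark -> latin subset
```

Because the mark is resolved against the `latin` subset while its base comes from `cyrillic`, the two glyphs land in separate font runs, which is the kind of split that produces the rendering bugs linked above.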

Given these issues, the assertion that the current state of font loading is acceptable does not hold. If you look at the Web Almanac’s section on [language availability in webfonts](https://almanac.httparchive.org/en/2022/fonts#writing-system-and-languages), you’ll see that scripts other than Latin, Cyrillic, and Greek are significantly underrepresented. To quote:

“Sadly, other writing systems are much less prevalent. For example, Han (Chinese) is the [2nd most used writing system in the world](https://www.worldatlas.com/articles/the-world-s-most-popular-writing-scripts.html) (after Latin), but only supported by 0.2% of web fonts. Arabic is the third most used writing system, but again, only supported by 0.4% of web fonts. The reason that some of these [writing systems are not used as web fonts](https://www.w3.org/TR/PFE-evaluation/#fail-large) is that they are very large due to the sheer number of glyphs they have to support, and the difficulty in subsetting them correctly.”

While range request and the newer IFTB proposal will work well for CJK, emoji, and icon fonts, they aren’t as viable for the other cases I mentioned (multi-script families, variable fonts, and pan-Unicode fonts). For example, they won’t work well for Arabic font families (specifically called out in the quote above) because of the extremely complex nature of those fonts.

Patch subset is the only existing proposal that enables efficient loading for essentially all of the problematic font loading cases.

Note also that, long term, we plan to standardize both patch subset and range request/IFTB. The rationale:
* As Myles noted, while patch subset is extremely efficient, adoption by less sophisticated font hosters may be more challenging. In these cases, range request/IFTB (once available) will be an improvement over the current state and is better than not adopting any form of IFT.
* Where font hosters are willing to make the extra effort to adopt patch subset, it will significantly improve font loading for their users and will solve use cases that range request/IFTB can’t. Note: a high-quality open source implementation of patch subset is already [available](https://github.com/w3c/patch-subset-incxfer), and we plan to make plugins available for popular open source HTTP servers.
* Given that a [significant amount of font usage on the web is through large font hosters such as Google Fonts](https://almanac.httparchive.org/en/2022/fonts#fig-4), adoption of patch subset by those services will significantly improve font loading performance and the font rendering experience for a huge number of users.

To answer Mark’s questions:

> * The explainer says 'Changes to the Open Font Format or OpenType specifications are out of scope.' Why? In particular, has anyone investigated whether doing so could address the issues with rendering subsets?

We have recently started investigating a potential replacement for the range request proposal called “binned incremental font transfer” (IFTB), which involves changes to the font format. While it’s an improvement over range request, it still cannot match the performance of patch subset and will struggle in cases other than CJK, emoji, and icon fonts. Because of the complex nature of fonts, a smart server is essentially necessary to efficiently transfer all classes of fonts. We can change the font format if needed, but the problem isn’t the format; it’s the nature of the fonts themselves.

> * The proposal defines what amounts to a new HTTP extension that's specific to Web fonts. Has it undergone sufficient review by the relevant communities, and is it likely to be deployed?

We have invited experts from the fonts community who participate in the Web Fonts Working Group, in addition to representatives from font hosting providers (Google Fonts and Adobe).

> * Could existing protocol mechanisms have been used without the need for a new HTTP extension?

We’ve recently updated the patch subset specification to use the more general-purpose [compression dictionary transport](https://github.com/WICG/compression-dictionary-transport) proposal to provide the patching functionality. Beyond that, the only other extension proposed by patch subset is a new header, “font-patch-request”, which is necessarily specific to web fonts. In [#119](https://github.com/w3c/IFT/issues/119), I’m investigating whether the patch request message could be placed in a range request header instead.
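Conceptually, dictionary-based patching means the font data the client already has serves as the dictionary for compressing the extended font, so mostly the new data costs bytes on the wire. The sketch below illustrates that idea with Python’s `zlib` preset dictionaries as a stand-in for the shared-dictionary Brotli/Zstandard encodings in the compression dictionary transport proposal. The byte strings are hypothetical placeholders for font files (a real font patch is not a simple append), and the sketch does not model the `font-patch-request` header or any HTTP exchange.

```python
import random
import zlib


def make_patch(old: bytes, new: bytes) -> bytes:
    """Server side: compress `new` using `old` as a preset dictionary.

    zlib's zdict stands in here for a shared-dictionary encoding.
    """
    comp = zlib.compressobj(level=9, zdict=old)
    return comp.compress(new) + comp.flush()


def apply_patch(old: bytes, patch: bytes) -> bytes:
    """Client side: rebuild `new` using the bytes it already holds as the dictionary."""
    decomp = zlib.decompressobj(zdict=old)
    return decomp.decompress(patch) + decomp.flush()


rng = random.Random(0)
old_subset = rng.randbytes(20_000)  # hypothetical: subset the client already has
extra = rng.randbytes(2_000)        # hypothetical: newly needed glyph data
new_subset = old_subset + extra     # simplification: real patches are not appends

patch = make_patch(old_subset, new_subset)
plain = zlib.compress(new_subset, 9)  # same data without a dictionary

assert apply_patch(old_subset, patch) == new_subset
print("new font:", len(new_subset), "plain:", len(plain), "patch:", len(patch))
```

The random bytes are deliberately incompressible on their own, so the size gap between `plain` and `patch` comes almost entirely from reusing the dictionary.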

> * Is this extension likely to see reasonable adoption across the Web?

Google Fonts, the [largest font hosting provider on the web](https://almanac.httparchive.org/en/2022/fonts#fig-4), sees significant use across the web and is planning to adopt incremental font transfer. I can’t speak for other font providers’ plans, but I suspect they run into the same issues described above that IFT can help solve, particularly services hosting (or planning to host) CJK fonts. Having this standardized and available in browsers should provide good motivation for adoption by font hosters.

As an example, Google Fonts was the first large-scale adopter of variable fonts and, as a result, [significantly increased variable font usage on the web](https://almanac.httparchive.org/en/2022/fonts#fig-27).

> * If new functionality is genuinely necessary, has it been designed in such a way as to allow generic use, so that other use cases can benefit -- thereby increasing deployment incentives?

Hopefully my comments above provide sufficient motivation for why this technology is necessary to unlock web font usage for currently underrepresented writing systems.

The problem we’re solving is specific to web fonts, so the solution is specific to that space. HTTP range requests solve the more general problem of partially loading resources, but aren’t sufficient for web fonts. As noted above, we use an existing general-purpose patching mechanism and specialize only where needed: in the message that describes the partial font subset.


> Yes, I saw that some changes were made, and appreciate the effort. However, from a HTTP perspective this design is still not ready for standardization -- while it meets the needs of its proponents, it's use of HTTP doesn't take into account all aspects of the protocol, and I don't believe it will see good adoption, particularly by CDNs and other parties which would need to make substantial changes to their infrastructure to accommodate it.

We definitely appreciate your feedback; so far it has resulted in changes that improved the specification. I’d like to keep iterating to address any remaining concerns you have.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/w3ctag/design-reviews/issues/849#issuecomment-1681340106
Received on Wednesday, 16 August 2023 22:18:24 UTC