Re: User scenarios which would benefit from streamable fonts

Some more:

The percentage of the file size that is taken by outlines (CFF, glyf, loca)

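For anyone who wants to reproduce ratios like these, here is a minimal Python sketch. It is not the script used for the graphs in this thread. It assumes that a ratio is the summed length of the listed tables, as recorded in the sfnt table directory, divided by the total file size. The names SHAPING_TAGS, OUTLINE_TAGS, table_lengths and ratio are made up for illustration, and it only reads a single-font sfnt file (no .ttc collections, no WOFF).

```python
import struct

# Table groups as named in this thread. The CFF tag is padded to four bytes.
SHAPING_TAGS = {"GPOS", "GSUB", "morx"}
OUTLINE_TAGS = {"CFF ", "glyf", "loca"}


def table_lengths(data: bytes) -> dict:
    """Map each table tag to its length, read from the sfnt table directory."""
    # Offset table: uint32 sfntVersion, uint16 numTables, then three uint16
    # search fields (12 bytes in total).
    num_tables = struct.unpack_from(">H", data, 4)[0]
    lengths = {}
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        tag, _checksum, _offset, length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i
        )
        lengths[tag.decode("latin-1")] = length
    return lengths


def ratio(data: bytes, tags: set) -> float:
    """Fraction of the file size taken up by the tables named in `tags`."""
    lengths = table_lengths(data)
    return sum(n for tag, n in lengths.items() if tag in tags) / len(data)
```

Usage would look like ratio(open(path, "rb").read(), SHAPING_TAGS), where path is whatever font file you want to inspect. Table padding and the directory itself count toward the file size but not toward any table, so the ratios for all tables sum to slightly less than 1.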

> On Feb 6, 2019, at 3:46 PM, Myles C. Maxfield <mmaxfield@apple.com> wrote:
> 
> While I was looking at the preinstalled fonts in macOS, I gathered some interesting data:
> 
> You can see a pretty clear division between CJK fonts and non-CJK:
> <Screen Shot 2019-02-06 at 3.38.01 PM.png>
> 
> The “Shaping ratio” is what percent of the file size is taken up by shaping tables (GPOS, GSUB, morx)
> <Screen Shot 2019-02-06 at 3.39.18 PM.png>
> 
> Each dot is a font that’s preinstalled on macOS:
> <Screen Shot 2019-02-06 at 3.37.47 PM.png>
> 
> <Screen Shot 2019-02-06 at 3.37.54 PM.png>
> 
> This last graph I think is the most interesting. It shows that there aren’t many fonts with both many characters and lots of shaping rules. Given the bimodality, there may actually be two solutions here: One for CJK fonts that have tons of characters, and one for alphabetic scripts that have more shaping.
> 
> Food for thought,
> Myles
> 
>> On Feb 6, 2019, at 3:18 PM, Myles C. Maxfield <mmaxfield@apple.com> wrote:
>> 
>> This is wonderful, thank you!
>> 
>> Does anyone else have any other responses, or should we get started using this right away?
>> 
>> More replies inline.
>> 
>>> On Feb 6, 2019, at 1:41 PM, Garret Rieger <grieger@google.com> wrote:
>>> 
>>> Unfortunately I'm OOO tomorrow and Rod may not be able to attend tomorrow's meeting either, so we put together some answers to the questions that Myles posed:
>>> 
>>> 1) Yes [to all] :). In general we’d like to be able to get font data on demand for content that is added later. This is also a great example of where the server might in time learn to over-deliver (if they ask for abc they tend to want d, so let's just always send that too).
>>> 
>>> 2) If we think of this as a spectrum, from least to most impact:
>>> - Least impact: pages using simple Latin, no diacritics. Think maps of North America, etc.
>>> - Pages using more interesting Latin, diacritics, etc. Think Eastern Europe, Vietnamese, etc.
>>> - Scripts with complex shaping: Arabic, Indic, etc. Currently you basically have to deliver the whole script in one blob.
>> 
>> Just from looking at the fonts that come preinstalled with macOS, it seems the Arabic and Indic fonts spend around 20% of their file size inside GPOS, GSUB, or morx. Noto Nastaliq Urdu is an outlier: it spends 69% of its file size on GPOS & GSUB.
>> 
>> Given the nature of how the shaping rules are designed, I’m not sure how much of a help any streamable solution would be for a font like Noto Nastaliq Urdu. However, 20% on the other Arabic and Indic fonts seems small enough that improvements elsewhere could still have a significant effect overall.
>> 
>>> - CJK fonts. Painful to deliver today, easy with enrichment. Our measurements suggest most pages use mostly common chars plus a few rarer ones. http://unicodeconference.org/presentations-42/S5T3-Sheeter.pdf has a bit more detail.
>> 
>> This seems to indicate that unicode-range does a pretty good job for these fonts. The slides you linked to show significant improvement using segmentation with existing browser infrastructure. I’m interested to see how much better than unicode-range we can do here.
>> 
>>> - Most impact: a pan-Unicode font like Noto. Currently extremely difficult to serve as a single font due to size, and attempting to serve it via unicode-range causes all kinds of problems when the same codepoint is used in multiple scripts (e.g. Indic).
>> 
>> Out of curiosity, why was Noto designed to handle so much in a single file? You just listed a few downsides to the design; what are the upsides?
>> 
>>> 
>>> Looking ahead, Variable Fonts with lots of axes might benefit from intelligent stripping of unused axes or parts of axes. We do not yet have much data on this but we already see that size is materially larger when you have many masters or axes.
>>> 
>>> 3) See #2 :). 
>>> 
>>> For popularity rankings, see https://fonts.google.com/?sort=popularity.
>> 
>> Cool!
>> 
>>> However, things like CJK are recent additions and thus haven't had time to "catch up" to the long-standing leaders.
>>> 
>>> We know from shipping unicode-range (https://developers.googleblog.com/2015/02/smaller-fonts-with-woff-20-and-unicode.html) that avoiding sending chunks of the font saves a *lot* of bytes across almost all fonts with support for more than rudimentary Latin. We can likely figure out ways to extract more data around this.
>>> 
>>> 4) We'd like to work with browsers to come up with a way to measure impact on latency, particularly reducing time spent blocked on the font (all other assets ready). Maybe a cooperative experiment between browser & Google Fonts? We'd like to prove the new solution beats unicode-range on latency alone, even without a penalty for breaking shaping. With a new solution, breaks in shaping shouldn't occur at all.
>> 
>> Yes, definitely. Raw byte counts probably aren’t the best measure here, as the number of round-trips to the server can dramatically affect latency. Measurements gathered from the browser would be a much better indicator of how we’re doing.
>> 
>>> 
>>> We would very much like to have a set of sample browse sequences, preferably real (but anonymized) to use to compare solutions. Initially we'd imagine a simple comparison that just computes transfer sizes offline, and then ones that look promising could advance to more sophisticated testing.
>> 
>> Historically, the WebKit team has gathered this kind of data ourselves by visiting a collection of pages which are supposed to be representative of the Web. Tracking users to gather this kind of data has privacy implications.
>> 
>>> 
>>> This project is only a success for us if it makes the web faster and we're looking forward to working with y'all to figure out a really strong series of benchmarks and eventually live tests to prove this is true.
>> 
>> 🎉🎉🎉
>> 
>>> 
>>> On Sat, Feb 2, 2019 at 11:22 AM Ken Lunde <lunde@adobe.com> wrote:
>>> Myles,
>>> 
>>> I was referring to genuine Pan-CJK fonts that make extensive use of the 'locl' GSUB feature to access region-specific forms of ideographs and punctuation. Typical East Asian fonts that are meant to serve a single region require far less feature interaction. In fact, many such fonts have none, besides the 'vert' GSUB feature for vertical forms.
>>> 
>>> Regards...
>>> 
>>> -- Ken
>>> 
>>> > On Feb 1, 2019, at 2:43 PM, Myles C. Maxfield <mmaxfield@apple.com> wrote:
>>> > 
>>> > 
>>> > 
>>> >> On Jan 31, 2019, at 7:54 PM, Ken Lunde <lunde@adobe.com> wrote:
>>> >> 
>>> >> Myles,
>>> >> 
>>> >> I should point out that the assumption of no feature interaction in typical CJK fonts becomes an instant non-starter for Pan-CJK fonts that make extensive use of the 'locl' GSUB feature to access non-default glyphs. The Source Han and Noto CJK fonts serve as excellent testing fodder for this. I should also mention that Adobe Fonts' (formerly Typekit) dynamic augmentation preserves the 'locl' GSUB feature functionality, which means that it is possible.
>>> > 
>>> > Oh, when I said “fonts with many independent glyphs, like a Chinese font” I meant “independent” w/r/t context-sensitive shaping, like an Arabic or Indic font. Features definitely interact in CJK fonts.
>>> > 
>>> > Unless I’m misunderstanding what you mean?
>>> > 
>>> >> 
>>> >> Regards...
>>> >> 
>>> >> -- Ken
>>> >> 
>>> >>> On Jan 31, 2019, at 3:22 PM, Myles C. Maxfield <mmaxfield@apple.com> wrote:
>>> >>> 
>>> >>> Hello, everyone!
>>> >>> 
>>> >>> In order to determine which strategy we should pursue for a streaming font interface, we should first determine which situations we are trying to improve. Once we have determined the specific scenarios that we are trying to attack, we can then create a benchmark to see how bad we are right now and to judge the various proposals.
>>> >>> 
>>> >>> The document from Google sent a few days ago describes “Minimize latency for client to view webfont styled content.” I’m hoping we, as a group, can go further than this and describe:
>>> >>> 
>>> >>> 1) Are we concerned with just first page load? Or are we concerned with interactions users make with pages? Are we concerned with “infinite scrolling” pages?
>>> >>> 
>>> >>> 2) Which types of webpages have big problems? Is there any way to characterize the types of sites that should see an improvement?
>>> >>> 
>>> >>> 3) Which types of fonts most need improvement in their loading experience? Fonts with many independent glyphs, like a Chinese font? Fonts with complex shaping rules? Fonts with complicated outlines?
>>> >>>   => The Google Fonts corpus could provide some big insights here. Which fonts are the ones that require big downloads but have much of the file unused by the browser? Can such fonts be characterized? In general, which fonts are the most popular?
>>> >>> 
>>> >>> 4) Regarding comparison against the existing unicode-range solution, should we try to make a cost function that includes both breaks in shaping and latency? Or should we consider that a break in shaping should be forbidden? Should we try to incorporate how many text flashes occur during each user interaction?
>>> >>> 
>>> >>> Figuring out the answers to questions like these will help us better weigh each possible solution. I’d love to hear everyone’s thoughts about these sorts of things.
>>> >>> 
>>> >>> Thanks,
>>> >>> Myles
>>> >> 
>>> > 
>>> 
>> 
> 

Received on Thursday, 7 February 2019 00:14:35 UTC