Re: User scenarios which would benefit from streamable fonts from Garret Rieger on 2019-02-28 (public-webfonts-wg@w3.org from February 2019)

From: Garret Rieger <grieger@google.com>
Date: Wed, 27 Feb 2019 17:48:03 -0800
To: "Myles C. Maxfield" <mmaxfield@apple.com>
Cc: "Levantovsky, Vladimir" <Vladimir.Levantovsky@monotype.com>, Ken Lunde <lunde@adobe.com>, "w3c-webfonts-wg (public-webfonts-wg@w3.org)" <public-webfonts-wg@w3.org>
Message-ID: <CAM=OCWZxvhfJEkygLFmRu54Z0Bk4Tcgqwx_-ovG6N5D_sjMqKw@mail.gmail.com>

I was chatting with Dominik from the Chrome team, and he mentioned another
possible use case that was pretty interesting. He thought there may be some
value, on the first load of a page, in having the browser first send an
enrichment request for only the content that is above the fold, followed by
a second request to get everything else. The hope is that this will help
reduce the time to first paint.
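
A minimal sketch of the two-request idea, assuming an enrichment request can be modelled as a set of codepoints and that the browser already knows which text is above the fold (both are assumptions; `split_enrichment_requests` is a hypothetical name, not part of any proposal):

```python
def split_enrichment_requests(above_fold_text: str, full_page_text: str) -> tuple[set[int], set[int]]:
    """Return (first_request, second_request) as sets of codepoints.

    The first request covers only the text visible above the fold, so it can
    be answered sooner; the second asks for whatever else the page needs.
    """
    # Whitespace is skipped here for brevity; a real request might include it.
    first = {ord(ch) for ch in above_fold_text if not ch.isspace()}
    everything = {ord(ch) for ch in full_page_text if not ch.isspace()}
    # Don't re-request codepoints the first response already covers.
    second = everything - first
    return first, second


first, second = split_enrichment_requests("Hello", "Hello, world")
print(sorted(map(chr, first)))   # ['H', 'e', 'l', 'o']
print(sorted(map(chr, second)))  # [',', 'd', 'r', 'w']
```

The second set is a plain difference, so the follow-up request never repeats anything the first response already delivered.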

On Thu, Feb 14, 2019 at 12:59 AM Myles C. Maxfield <mmaxfield@apple.com>
wrote:

> On Feb 8, 2019, at 8:08 AM, Levantovsky, Vladimir <
> Vladimir.Levantovsky@monotype.com> wrote:
>
> Thank you Myles!
> See my comments inline at the end of your message.
>
> *From:* mmaxfield@apple.com <mmaxfield@apple.com>
> *Sent:* Wednesday, February 6, 2019 6:46 PM
> *To:* Garret Rieger <grieger@google.com>
> *Cc:* Ken Lunde <lunde@adobe.com>; w3c-webfonts-wg (
> public-webfonts-wg@w3.org) <public-webfonts-wg@w3.org>
> *Subject:* Re: User scenarios which would benefit from streamable fonts
>
> While I was looking at the preinstalled fonts in macOS, I gathered some
> interesting data:
>
> You can see a pretty clear division between CJK fonts and non-CJK:
>
> <image001.png>
>
>
> The “Shaping ratio” is what percent of the file size is taken up by
> shaping tables (GPOS, GSUB, morx)
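
As a sketch of how a "shaping ratio" like this can be computed, the Python below reads the table directory of an uncompressed TrueType/OpenType file and divides the byte lengths of the GPOS, GSUB, and morx tables by the total file size. This is an illustration, not the script used to make these graphs. It does not handle WOFF/WOFF2 or font collections (.ttc), and some preinstalled macOS fonts are collections.

```python
import struct

# Tables counted as "shaping" data in the message above.
SHAPING_TAGS = {b"GPOS", b"GSUB", b"morx"}


def shaping_ratio(font_bytes: bytes) -> float:
    """Fraction of an uncompressed sfnt (.ttf/.otf) file taken up by shaping tables.

    The file starts with a 12-byte header (sfntVersion: uint32, numTables: uint16,
    then three more uint16 search fields), followed by numTables 16-byte
    records of (tag, checksum, offset, length), all big-endian.
    """
    _sfnt_version, num_tables = struct.unpack_from(">IH", font_bytes, 0)
    shaping_bytes = 0
    for i in range(num_tables):
        tag, _checksum, _offset, length = struct.unpack_from(
            ">4sIII", font_bytes, 12 + 16 * i
        )
        if tag in SHAPING_TAGS:
            shaping_bytes += length
    return shaping_bytes / len(font_bytes)
```

Call it as `shaping_ratio(data)`, where `data` is the raw bytes of a local `.ttf` or `.otf` file.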
>
> <image002.png>
>
>
> Each dot is a font that’s preinstalled on macOS:
>
> <image003.png>
>
> <image004.png>
>
>
> This last graph I think is the most interesting. It shows that there
> aren’t many fonts with both many characters and lots of shaping rules.
> Given the bimodality, there may actually be two solutions here: One for CJK
> fonts that have tons of characters, and one for alphabetic scripts that
> have more shaping.
>
> <VL>
> If you normalize the “Character Count” axis of the last two graphs and
> look at the combined results as a 3D graph, you’d notice that the vast
> majority of fonts that have a significant portion of their data taken up by
> shaping rules occupy the “small file size” slice of that 3D space. Fonts
> that support multiple scripts sparsely populate the mid-section of the
> cube, and mid- to high-character-count fonts form a “larger font file size”
> cloud aligned with the count / shaping plane.
>
>
> Here’s a similar graph, but with the X axis being raw file size and the Y
> axis being the percentage of the file spent on each kind of data. Each font
> is represented by a blue dot and a green dot: blue shows the percentage of
> the font taken up by outlines, and green shows the percentage taken up by
> shaping. I think it shows that you are right.
>
>
> This chart seems to show that we should mostly be pursuing outlines.
>
>
> Given the relatively small file sizes of fonts with a high shaping-rule
> ratio, a dedicated solution for fonts with complex shaping rules may not
> yield as much ROI as we’d want / hope for. Focusing instead on a solution
> for mid- to high-character-count fonts that is also shaping-friendly would
> probably bring the most bang for the buck.
> <VL/>
>
> Food for thought,
> <VL> Definitely! <VL/>
>
> Myles
>
>
> On Feb 6, 2019, at 3:18 PM, Myles C. Maxfield <mmaxfield@apple.com> wrote:
>
> This is wonderful, thank you!
>
> Does anyone else have any other responses, or should we get started using
> this right away?
>
> More replies inline.
>
>
> On Feb 6, 2019, at 1:41 PM, Garret Rieger <grieger@google.com> wrote:
>
> Unfortunately I'm OOO tomorrow, and Rod may not be able to attend tomorrow's
> meeting either, so we put together some answers to the questions that Myles
> posed:
>
> 1) Yes [to all] :). In general we’d like to be able to fetch font data on
> demand for content that is added later. This is also a great example of
> where the server might, in time, learn to over-deliver (if clients ask for
> abc they tend to want d, so let's just always send that too).
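
One hypothetical way a server could "learn to over-deliver" is to track which characters past requests asked for together, and to add any character that almost always accompanied similar requests. The class below is a sketch under that assumption (`OverDeliveryPredictor` and its 0.8 threshold are made up for illustration); a real server would also need to weigh the extra bytes against the chance they go unused.

```python
from collections import Counter


class OverDeliveryPredictor:
    """Hypothetical server-side helper that learns which extra characters to send."""

    def __init__(self, threshold: float = 0.8):
        # Send an extra character if it appeared in at least this fraction
        # of past requests that overlapped the current one.
        self.threshold = threshold
        self.history: list[frozenset[str]] = []

    def record(self, chars: str) -> None:
        """Remember the set of characters one past request asked for."""
        self.history.append(frozenset(chars))

    def expand(self, chars: str) -> set[str]:
        """Return the requested characters plus likely follow-ups."""
        requested = set(chars)
        similar = [past for past in self.history if past & requested]
        if not similar:
            return requested
        counts = Counter(ch for past in similar for ch in past)
        extra = {ch for ch, n in counts.items() if n / len(similar) >= self.threshold}
        return requested | extra


p = OverDeliveryPredictor()
for past in ["abcd", "abcd", "abd", "xyz"]:
    p.record(past)
print(sorted(p.expand("abc")))  # ['a', 'b', 'c', 'd']
```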
>
> 2) If we think of this as a spectrum, from least to most impact:
> - Least impact: pages using simple Latin, no diacritics. Think maps of
> North America, etc.
> - Pages using more interesting Latin, diacritics, etc. Think Eastern
> Europe, Vietnamese, etc.
> - Scripts with complex shaping: Arabic, Indic, etc. Currently you
> basically have to deliver the whole script in one blob.
>
>
> Just from looking at the fonts that come preinstalled with macOS, it seems
> the Arabic and Indic fonts spend around 20% of their file size inside
> GPOS, GSUB, or morx. Noto Nastaliq Urdu is an outlier, spending 69% of
> its file size on GPOS & GSUB.
>
> Given the nature of how the shaping rules are designed, I’m not sure how
> much of a help any streamable solution would be for a font like Noto
> Nastaliq Urdu. However, 20% on the other Arabic and Indic fonts seems small
> enough that improvements elsewhere could still have a significant effect
> overall.
>
>
> - CJK fonts. Painful to deliver today, easy with enrichment. Our
> measurements suggest most pages use mostly common chars plus a few rarer
> ones. http://unicodeconference.org/presentations-42/S5T3-Sheeter.pdf
> has a bit more detail.
>
>
> This seems to indicate that unicode-range does a pretty good job for these
> fonts. The slides you linked to show significant improvement using
> segmentation with existing browser infrastructure. I’m interested to see
> how much better than unicode-range we can do here.
>
>
> - Most impact: a pan-Unicode font like Noto. Currently extremely difficult
> to serve as a single font due to its size, and attempting to serve it via
> unicode-range causes all kinds of problems when the same codepoints are used
> in multiple scripts (e.g. Indic).
>
>
> Out of curiosity, why was Noto designed to handle so much in a single
> file? You just listed a few downsides to the design; what are the upsides?
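
To illustrate the unicode-range problem with shared codepoints, here is a sketch with a hypothetical three-file segmentation. U+0964 DEVANAGARI DANDA is also used in Bengali text, but a split by codepoint range places it only in the Devanagari segment, so a Bengali sentence ends up as two font runs from two different files. The segment names and ranges are assumptions for illustration, and the simple first-match lookup stands in for the browser's real font-matching rules.

```python
from typing import Optional

# Hypothetical segmentation of one large font into unicode-range files.
# Each file lists inclusive (start, end) codepoint ranges.
SEGMENTS = {
    "latin.woff2": [(0x0000, 0x00FF)],
    "devanagari.woff2": [(0x0900, 0x097F)],  # includes U+0964 DEVANAGARI DANDA
    "bengali.woff2": [(0x0980, 0x09FF)],
}


def segment_for(ch: str) -> Optional[str]:
    """Return the first segment whose ranges cover ch, or None."""
    cp = ord(ch)
    for name, ranges in SEGMENTS.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return None


def font_runs(text: str) -> list[tuple[Optional[str], str]]:
    """Split text into consecutive runs that each come from one segment file.

    In this model each run is shaped separately, so lookups cannot act
    across a run boundary.
    """
    runs: list[tuple[Optional[str], str]] = []
    for ch in text:
        seg = segment_for(ch)
        if runs and runs[-1][0] == seg:
            runs[-1] = (seg, runs[-1][1] + ch)
        else:
            runs.append((seg, ch))
    return runs


# Bengali word "bangla" followed by U+0964 DANDA, which Bengali also uses.
runs = font_runs("\u09ac\u09be\u0982\u09b2\u09be\u0964")
print([seg for seg, _ in runs])  # ['bengali.woff2', 'devanagari.woff2']
```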
>
>
>
> Looking ahead, Variable Fonts with lots of axes might benefit from
> intelligent stripping of unused axes or parts of axes. We do not yet have
> much data on this but we already see that size is materially larger when
> you have many masters or axes.
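
A sketch of the "intelligent stripping" idea for variable fonts: given the axis values a page actually uses, compute the smallest range needed on each axis and pin untouched axes to their defaults. The axis table and function name below are hypothetical, and actually producing the reduced font (partial instancing) is a separate step not shown here.

```python
# Hypothetical 'fvar'-style axis table: tag -> (min, default, max).
AXES = {
    "wght": (100.0, 400.0, 900.0),
    "wdth": (75.0, 100.0, 125.0),
    "slnt": (-10.0, 0.0, 0.0),
}


def needed_axis_ranges(used_styles: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Smallest (lo, hi) range per axis that covers every style the page uses.

    A style that doesn't set an axis uses its default, so an axis the page
    never touches collapses to (default, default) and could be pinned.
    """
    ranges: dict[str, tuple[float, float]] = {}
    for tag, (lo, default, hi) in AXES.items():
        # Clamp each requested value to the axis's supported range.
        values = [min(max(style.get(tag, default), lo), hi) for style in used_styles] or [default]
        ranges[tag] = (min(values), max(values))
    return ranges


page_styles = [{"wght": 400.0}, {"wght": 700.0}, {"wght": 700.0, "wdth": 87.5}]
print(needed_axis_ranges(page_styles))
# {'wght': (400.0, 700.0), 'wdth': (87.5, 100.0), 'slnt': (0.0, 0.0)}
```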
>
> 3) See #2 :).
>
> For popularity rankings, see https://fonts.google.com/?sort=popularity.
>
>
> Cool!
>
>
> However, things like CJK are recent additions and thus haven't had time to
> "catch up" to the long-standing leaders.
>
>
> We know from shipping unicode-range
> (https://developers.googleblog.com/2015/02/smaller-fonts-with-woff-20-and-unicode.html)
> that avoiding sending chunks of the font saves a *lot* of bytes across
> almost all fonts with support for more than rudimentary Latin. We can
> likely figure out ways to extract more data around this.
>
> 4) We'd like to work with browsers to come up with a way to measure impact
> on latency, particularly reducing time spent blocked on the font (with all
> other assets ready). Maybe a cooperative experiment between a browser and
> Google Fonts? We'd like to prove the new solution beats unicode-range on
> latency alone, even without a penalty for breaking shaping. For a new
> solution, breaking shaping shouldn't occur at all.
>
>
> Yes, definitely. Raw byte counts probably aren’t the best measure here, as
> the number of round-trips to the server can dramatically affect latency.
> Measurements gathered from the browser would be a much better indicator of
> how we’re doing.
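
A crude latency model makes the point about round trips concrete: each sequential request costs one RTT, plus transfer time for the bytes. The numbers below are hypothetical and chosen only to show that a strategy sending fewer bytes can still be slower if it needs more round trips; real measurements would have to come from the browser, as noted above.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    round_trips: int  # sequential requests needed before the text can render
    bytes_sent: int   # total font bytes transferred


def estimated_latency_ms(s: Strategy, rtt_ms: float, bytes_per_ms: float) -> float:
    """Crude model: one RTT per sequential round trip plus raw transfer time."""
    return s.round_trips * rtt_ms + s.bytes_sent / bytes_per_ms


# Hypothetical numbers: 150 ms RTT and 1,000 bytes/ms (about 8 Mbit/s).
whole = Strategy("whole font", round_trips=1, bytes_sent=400_000)
chatty = Strategy("many small requests", round_trips=4, bytes_sent=60_000)
for s in (whole, chatty):
    print(s.name, estimated_latency_ms(s, rtt_ms=150, bytes_per_ms=1000))
# whole font 550.0
# many small requests 660.0
```

On a low-RTT link the ordering flips, which is why the same strategy can win or lose depending on network conditions.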
>
>
>
> We would very much like to have a set of sample browse sequences,
> preferably real (but anonymized) to use to compare solutions. Initially
> we'd imagine a simple comparison that just computes transfer sizes offline,
> and then ones that look promising could advance to more sophisticated
> testing.
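
A minimal sketch of the offline transfer-size comparison, assuming a hypothetical uniform cost per glyph and a fixed per-response overhead. It compares downloading the whole font once, unicode-range segments fetched on first use, and incremental enrichment that requests only not-yet-received characters. A real comparison would need per-glyph sizes from the actual font and the glyph closure implied by shaping rules, both of which this ignores.

```python
# Hypothetical cost model: every glyph costs the same number of bytes and every
# response carries a fixed overhead. Pages are given as strings of the
# characters they use (no whitespace).
BYTES_PER_GLYPH = 100
OVERHEAD_BYTES = 500


def whole_font_bytes(font_chars: set[str]) -> int:
    """The full font is downloaded once, on the first page."""
    return OVERHEAD_BYTES + BYTES_PER_GLYPH * len(font_chars)


def unicode_range_bytes(segments: list[set[str]], pages: list[str]) -> int:
    """Each segment is downloaded, once, the first time any page needs it."""
    loaded: set[int] = set()
    total = 0
    for page in pages:
        for i, seg in enumerate(segments):
            if i not in loaded and seg & set(page):
                loaded.add(i)
                total += OVERHEAD_BYTES + BYTES_PER_GLYPH * len(seg)
    return total


def enrichment_bytes(pages: list[str]) -> int:
    """Each page requests only the characters it hasn't received yet."""
    have: set[str] = set()
    total = 0
    for page in pages:
        missing = set(page) - have
        if missing:
            have |= missing
            total += OVERHEAD_BYTES + BYTES_PER_GLYPH * len(missing)
    return total


alphabet = set("abcdefghijklmnopqrstuvwxyz")
segments = [set("abcdefghijklm"), set("nopqrstuvwxyz")]
browse = ["bad", "cafe", "zoo"]  # a hypothetical three-page browse sequence
print(whole_font_bytes(alphabet))              # 3100
print(unicode_range_bytes(segments, browse))   # 3600
print(enrichment_bytes(browse))                # 2300
```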
>
>
> Historically, the WebKit team has gathered this kind of data ourselves by
> visiting a collection of pages which are supposed to be representative of
> the Web. Tracking users to gather this kind of data has privacy
> implications.
>
>
>
> This project is only a success for us if it makes the web faster and we're
> looking forward to working with y'all to figure out a really strong series
> of benchmarks and eventually live tests to prove this is true.
>
>
> 🎉🎉🎉
>
>
>
> On Sat, Feb 2, 2019 at 11:22 AM Ken Lunde <lunde@adobe.com> wrote:
>
> Myles,
>
> I was referring to genuine Pan-CJK fonts that make extensive use of the
> 'locl' GSUB feature to access region-specific forms of ideographs and
> punctuation. Typical East Asian fonts that are meant to serve a single
> region require far less feature interaction. In fact, many such fonts have
> none, besides the 'vert' GSUB feature for vertical forms.
>
> Regards...
>
> -- Ken
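
Ken's point about 'locl' can be sketched as a glyph-closure step: an enrichment request for some characters must also pull in the region-specific glyphs that 'locl' substitutes for the page's language. The cmap and locl tables below are hypothetical stand-ins (the glyph names are invented), not data from Source Han or Noto CJK.

```python
# Hypothetical tables for a Pan-CJK font. CMAP maps characters to default
# glyph names; LOCL maps (language, default glyph) to the alternate glyph
# that the 'locl' feature would substitute for that language.
CMAP = {"\u9aa8": "uni9AA8", "\u76f4": "uni76F4", "\u3002": "uni3002"}
LOCL = {
    ("zh-Hans", "uni9AA8"): "uni9AA8.zhHans",
    ("ja", "uni76F4"): "uni76F4.ja",
}


def glyph_closure(text: str, lang: str) -> set[str]:
    """Glyphs an enrichment response must include to render text in lang.

    Keeps the default glyphs as well as their 'locl' alternates, which is
    conservative but stays correct if the same characters appear elsewhere
    on the page in another language.
    """
    glyphs = {CMAP[ch] for ch in text if ch in CMAP}
    glyphs |= {LOCL[(lang, g)] for g in glyphs if (lang, g) in LOCL}
    return glyphs


print(sorted(glyph_closure("\u9aa8\u3002", "zh-Hans")))
# ['uni3002', 'uni9AA8', 'uni9AA8.zhHans']
```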
>
> > On Feb 1, 2019, at 2:43 PM, Myles C. Maxfield <mmaxfield@apple.com> wrote:
> >
> >
> >
> >> On Jan 31, 2019, at 7:54 PM, Ken Lunde <lunde@adobe.com> wrote:
> >>
> >> Myles,
> >>
> >> I should point out that the assumption of no feature interaction in
> >> typical CJK fonts becomes an instant non-starter for Pan-CJK fonts that
> >> make extensive use of the 'locl' GSUB feature to access non-default glyphs.
> >> The Source Han and Noto CJK fonts serve as excellent testing fodder for
> >> this. I should also mention that Adobe Fonts' (formerly Typekit) dynamic
> >> augmentation preserves the 'locl' GSUB feature functionality, which means
> >> that it is possible.
> >
> > Oh, when I said “fonts with many independent glyphs, like a Chinese
> > font” I meant “independent” w/r/t context-sensitive shaping, like an Arabic
> > or Indic font. Features definitely interact in CJK fonts.
> >
> > Unless I’m misunderstanding what you mean?
> >
> >>
> >> Regards...
> >>
> >> -- Ken
> >>
> >> On Jan 31, 2019, at 3:22 PM, Myles C. Maxfield <mmaxfield@apple.com> wrote:
> >>>
> >>> Hello, everyone!
> >>>
> >>> In order to determine which strategy we should pursue for a streaming
> >>> font interface, we should first determine which situations we are trying to
> >>> improve. Once we have determined the specific scenarios that we are trying
> >>> to attack, we can then create a benchmark to see how bad we are right now
> >>> and to judge the various proposals.
> >>>
> >>> The document from Google sent a few days ago describes “Minimize
> >>> latency for client to view webfont styled content.” I’m hoping we, as a
> >>> group, can go further than this and describe:
> >>>
> >>> 1) Are we concerned with just first page load? Or are we concerned
> >>> with interactions users make with pages? Are we concerned with “infinite
> >>> scrolling” pages?
> >>>
> >>> 2) Which types of webpages have big problems? Is there any way to
> >>> characterize the types of sites that should see an improvement?
> >>>
> >>> 3) Which types of fonts most need improvement in their loading
> >>> experience? Fonts with many independent glyphs, like a Chinese font? Fonts
> >>> with complex shaping rules? Fonts with complicated outlines?
> >>>   => The Google Fonts corpus could provide some big insights here.
> >>> Which fonts are the ones that require big downloads but have much of the
> >>> file unused by the browser? Can such fonts be characterized? In general,
> >>> which fonts are the most popular?
> >>>
> >>> 4) Regarding comparison against the existing unicode-range solution,
> >>> should we try to make a cost function that includes both breaks in shaping
> >>> and latency? Or should we consider that a break in shaping should be
> >>> forbidden? Should we try to incorporate how many text flashes occur during
> >>> each user interaction?
> >>>
> >>> Figuring out the answers to questions like these will help us better
> >>> weigh each possible solution. I’d love to hear everyone’s thoughts about
> >>> these sorts of things.
> >>>
> >>> Thanks,
> >>> Myles
> >>
> >
>
>
Attachments

image/png attachment: Screen_Shot_2019-02-14_at_12.54.56_AM.png
image/png attachment: 02-Screen_Shot_2019-02-14_at_12.54.56_AM.png
Received on Thursday, 28 February 2019 01:48:41 UTC