Re: Evaluation report from Garret Rieger on 2020-09-18 (public-webfonts-wg@w3.org from September 2020)

From: Garret Rieger <grieger@google.com>
Date: Thu, 17 Sep 2020 18:29:53 -0700
To: Chris Lilley <chris@w3.org>
Cc: "w3c-webfonts-wg (public-webfonts-wg@w3.org)" <public-webfonts-wg@w3.org>
Message-ID: <CAM=OCWa-mNwsw_eEo-GzuPXw3ta5yevHqvxo60mOMbOARmi6RQ@mail.gmail.com>
Thanks, this is looking good so far. Just a couple of thoughts I had:

It might be worth mentioning pan-Unicode fonts (such as Noto) as a use case
which is not currently well supported by existing font transfer methods. To
date, for web usage we have had to deliver Noto as a bunch of separate
families and leave it up to the developer to explicitly pick which ones they
might need. For many types of applications it is difficult to know ahead of
time what languages may show up in the content. For example:

   - A forum where users may be posting in many different languages.
   - A mapping application which will need to render text in a wide array
   of scripts depending on what part of the world you're viewing.

PFE can enable the easy and efficient use of a pan-Unicode font, something
that's not possible today.

Section 2.7: this isn't filled out yet, but here are some examples that we
run into on Google Fonts that could be used to demonstrate the issues with
trying to subset fonts to improve performance:

   - With Indic scripts there are some characters shared between the
   scripts. If you have a font which supports two or more of these scripts,
   want to present it as a single family, and use unicode-range to deliver
   each script in its own subset, you run into trouble. The shared characters
   need to be duplicated in each subset. However, the way unicode-range works,
   a shared character will be rendered from only one of the subsets, based
   on the priority of the ranges. This can result in the shared character
   being rendered from a different subset than the surrounding characters. As
   a result, shaping doesn't work correctly and you end up with poor
   rendering. We've had to work around this problem by releasing Indic
   scripts as separate families, which is non-optimal for end users (for
   example, see: https://fonts.google.com/?query=Baloo)
   - Another example of this problem is Latin punctuation in Latin and
   Cyrillic fonts. Say we have a single family with Latin and Cyrillic
   characters. We want to have Cyrillic and Latin in their own subsets so we
   don't waste bytes downloading Cyrillic on Latin-only pages and vice versa.
   However, if you have Cyrillic text that uses the period "." from the Latin
   subset, then kerning rules between the Cyrillic characters and the "." no
   longer work. Not quite as disastrous as the Indic example, but it still
   results in imperfect rendering.
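To make the failure mode concrete, here is a small Python sketch of how characters get assigned to unicode-range subsets and where that splits shaping runs. It assumes, roughly following the CSS Fonts rule for overlapping ranges, that the last-declared matching @font-face is checked first. The subset tables (CYRILLIC_LATIN, INDIC) and their ranges are hypothetical, and real browsers also consult each font's cmap and system fallback, which this ignores.

```python
# Hypothetical subsets of one family, listed in @font-face declaration order.
# Each entry is (subset name, list of inclusive codepoint ranges).
CYRILLIC_LATIN = [
    ("cyrillic", [(0x002E, 0x002E), (0x0400, 0x04FF)]),  # carries its own "."
    ("latin", [(0x0020, 0x007F)]),                       # declared last
]

INDIC = [
    ("devanagari", [(0x0900, 0x097F)]),                 # includes danda U+0964
    ("bengali", [(0x0964, 0x0965), (0x0980, 0x09FF)]),  # duplicates the dandas
]


def pick_subset(codepoint, subsets):
    """Return the subset that renders `codepoint`.

    Assumption: among overlapping ranges, the last-declared subset wins.
    Cmap checks and system fallback are deliberately ignored.
    """
    for name, ranges in reversed(subsets):
        if any(lo <= codepoint <= hi for lo, hi in ranges):
            return name
    return None  # no subset covers it; a fallback font would be used


def shaping_runs(text, subsets):
    """Split text into runs that are rendered by the same subset.

    Shaping and kerning only operate within a run, so a run boundary
    between neighbouring characters is where rendering can break.
    """
    runs = []
    for ch in text:
        name = pick_subset(ord(ch), subsets)
        if runs and runs[-1][0] == name:
            runs[-1] = (name, runs[-1][1] + ch)
        else:
            runs.append((name, ch))
    return runs


# Cyrillic word + "." and a Devanagari syllable + danda.
for sample, subsets in [("\u0414\u0430.", CYRILLIC_LATIN),
                        ("\u0939\u093F\u0964", INDIC)]:
    print([name for name, _ in shaping_runs(sample, subsets)])
# ['cyrillic', 'latin']
# ['devanagari', 'bengali']
```

In both cases the shared character lands in a different run from its neighbours, so any kerning or shaping rule that spans that boundary is never applied.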

Section 3.4

Not sure what level of detail you want to go into on the specifics of the
byterange approach, but there are a couple of points that might be worth
mentioning:

   - For byterange to work, fonts must be preprocessed to flatten composite
   glyphs into their resulting outlines, and the CFF table must be
   desubroutinized. Also, the glyf/CFF table needs to be moved to the end of
   the font if it's not already there.
   - Another source of efficiency loss is being unable to leverage
   compression across requests. In a single WOFF2 font file, redundant data
   between glyphs compresses out. If under byterange those glyphs are
   transferred in separate requests, the redundant data is retransmitted.
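The cross-request compression loss can be illustrated with a rough Python sketch. It uses zlib from the standard library as a stand-in for the Brotli compression WOFF2 actually uses, and synthetic "glyph" records that share a common prefix rather than real glyf data, so it only shows the direction of the effect, not realistic sizes. Compressing each glyph on its own models the worst case of one glyph per request.

```python
import random
import zlib


def make_glyphs(count, shared, unique_len, seed=0):
    """Build synthetic glyph records: a shared prefix plus random unique bytes."""
    rng = random.Random(seed)
    return [shared + bytes(rng.randrange(256) for _ in range(unique_len))
            for _ in range(count)]


def size_one_stream(glyphs):
    """Compressed size when all glyphs share one compressed stream."""
    return len(zlib.compress(b"".join(glyphs), 9))


def size_per_request(glyphs):
    """Compressed size when every glyph is compressed independently."""
    return sum(len(zlib.compress(g, 9)) for g in glyphs)


shared = bytes(range(64))  # stand-in for data repeated across glyphs
glyphs = make_glyphs(50, shared, unique_len=32)
print("one stream: ", size_one_stream(glyphs))
print("per request:", size_per_request(glyphs))
```

In the single stream the repeated prefix is encoded as back-references after its first occurrence; compressed separately, every record pays for it again.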

Section 3.9

   - We have a defined wire protocol for Subset and Patch
   (https://docs.google.com/document/d/1DJ6VkUEZS2kvYZemIoX4fjCgqFXAtlRjpMlkOvblSts/edit),
   and that protocol is used during the simulations so that we correctly
   account for protocol overhead, which can be substantial in some cases,
   for example CJK augmentation requests that need to specify a large number
   of codepoints. However, as noted in the doc, this protocol is only meant to
   be a stand-in for size estimation. We will want to rewrite the protocol for
   standardization.
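To see why request overhead grows for CJK, here is a Python sketch of one plausible compact encoding for a codepoint set: a count followed by sorted codepoints stored as deltas in unsigned LEB128 varints. This encoding is a hypothetical choice for illustration, not the format defined in the linked document, and the sample set (every 7th codepoint from U+4E00) is synthetic.

```python
def encode_varint(value):
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_codepoints(codepoints):
    """Hypothetical wire encoding: count, then sorted deltas as varints."""
    unique = sorted(set(codepoints))
    out = bytearray(encode_varint(len(unique)))
    prev = 0
    for cp in unique:
        out += encode_varint(cp - prev)
        prev = cp
    return bytes(out)


def decode_codepoints(data):
    """Inverse of encode_codepoints, used to check the round trip."""
    pos = 0

    def read_varint():
        nonlocal pos
        value = shift = 0
        while True:
            byte = data[pos]
            pos += 1
            value |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                return value

    count = read_varint()
    codepoints, prev = [], 0
    for _ in range(count):
        prev += read_varint()
        codepoints.append(prev)
    return codepoints


# 2000 CJK ideographs spread across the U+4E00 block (every 7th codepoint).
cjk = list(range(0x4E00, 0x4E00 + 7 * 2000, 7))
encoded = encode_codepoints(cjk)
print(len(encoded), "bytes as varint deltas")      # 2004
print(4 * len(cjk), "bytes as 32-bit codepoints")  # 8000
```

Real content has irregular gaps, so actual sizes would differ, but in this encoding every codepoint costs at least one byte, so a request naming thousands of codepoints is kilobytes long before any font data comes back.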


On Mon, Sep 14, 2020 at 8:48 AM Chris Lilley <chris@w3.org> wrote:

> Hi folks,
>
> I just committed a first draft of the Evaluation Report.
>
> I mainly concentrated on easy-to-understand introductory material,
> explaining the problem to be solved and why it is important, bearing in
> mind that the primary audience is not necessarily familiar with fonts
> in general.
>
> Also, as a group we do not have a final analysis or any conclusions
> decided, so those sections are simply blank.
>
> But it gives an outline of how the report could look, so we can discuss
> the overall structure and approach at least.
>
> https://w3c.github.io/PFE-analysis/report/evaluation-report.html
>
> Reading over it just before the call, I think it actually needs a whole
> new section that explains what OpenType is, sfnt table structure, and
> how rendering a single glyph can depend on data scattered over several
> tables. But I wanted to discuss that on the call before starting to add
> it. I'm thinking again of an introductory and probably diagram-heavy
> exposition. And this will explain in turn why the byterange approach has
> to concentrate on a single table rather than lots of little tables.
>
> --
> Chris Lilley
> @svgeesus
> Technical Director @ W3C
> W3C Strategy Team, Core Web Design
> W3C Architecture & Technology Team, Core Web & Media
Received on Friday, 18 September 2020 01:30:24 UTC