Re: Evaluation report from Roderick Sheeter on 2020-09-29 (public-webfonts-wg@w3.org from September 2020)

From: Roderick Sheeter <rsheeter@google.com>
Date: Tue, 29 Sep 2020 15:59:00 -0700
To: Garret Rieger <grieger@google.com>
Cc: Chris Lilley <chris@w3.org>, "w3c-webfonts-wg (public-webfonts-wg@w3.org)" <public-webfonts-wg@w3.org>
Message-ID: <CABscrrEopXA2-_gO1k0-GiuyLDjxj0JhNNKAQM6ua9togYJz6g@mail.gmail.com>
Maybe this is too speculative for an eval report, but if we manage to get
32-bit glyph IDs into a future rev of the font format then PFE can deliver a
true pan-Unicode font. Noto in one font!
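
As a rough check on why glyph ID width matters here, the sketch below counts the assigned characters in the Unicode database bundled with Python and compares that count with the number of values a 16-bit glyph ID can take. Treating one character as at least one glyph is a simplifying assumption (real fonts often need more glyphs than characters), and the function names are made up for illustration.

```python
import sys
import unicodedata

# Number of distinct values a 16-bit glyph ID can take.
GLYPH_IDS_16_BIT = 2 ** 16


def count_assigned_characters():
    """Count code points that are assigned, excluding surrogates and private use."""
    skipped = {"Cn", "Cs", "Co"}  # unassigned, surrogate, private use
    return sum(
        1
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) not in skipped
    )


def fits_in_16_bit_glyph_ids(glyph_count):
    """Return True if every glyph could receive its own 16-bit glyph ID."""
    return glyph_count <= GLYPH_IDS_16_BIT
```

The exact count depends on the Unicode version shipped with the interpreter, so no specific number is quoted here.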

On Thu, Sep 17, 2020 at 6:30 PM Garret Rieger <grieger@google.com> wrote:

> Thanks this is looking good so far. Just a couple of thoughts I had:
>
> It might be worth mentioning pan unicode fonts (such as Noto) as a use
> case which is not currently well supported by existing font transfer
> methods. To date for web usage we have to deliver Noto as a bunch of
> separate families and leave it up to the developer to explicitly pick which
> ones they might need. For many types of applications it is difficult to know
> ahead of time what languages may show up in the content, so choosing
> families up front is impractical. For example:
>
>    - A forum where users may be posting in many different languages.
>    - A mapping application which will need to render text in a wide array
>    of scripts depending on what part of the world you're viewing.
>
> PFE can enable the easy and efficient use of a pan-Unicode font, something
> that's not possible today.
>
> Section 2.7: this isn't filled out yet, but here are some examples that we
> run into on Google Fonts that could be used to demonstrate issues in trying
> to subset fonts to improve performance:
>
>    - With Indic scripts there are some characters shared between the
>    scripts. If you have a font which supports two or more of these scripts
>    and want to present it as a single family, using unicode-range to deliver
>    each script in its own subset, you run into trouble. The shared characters
>    need to be duplicated in each subset. However, the way unicode-range works
>    is that a shared character will be rendered from only one of the subsets,
>    based on the priority of the ranges. This can result in the shared
>    character being rendered from a different subset than the surrounding
>    characters. As a result shaping doesn't work correctly and you end up with
>    poor rendering. We've had to work around this problem by releasing Indic
>    scripts as separate families, which is non-optimal for end users (for
>    example see: https://fonts.google.com/?query=Baloo).
>    - Another example of this problem is with Latin punctuation in Latin
>    and Cyrillic fonts. Say we have a single family with Latin and Cyrillic
>    characters. We want to have Cyrillic and Latin in their own subsets so we
>    don't waste bytes downloading Cyrillic on Latin-only pages and vice versa.
>    However, if you have Cyrillic text that uses the period "." from the Latin
>    subset, then kerning rules between the Cyrillic characters and the "." no
>    longer work. Not quite as disastrous as the Indic example, but it still
>    results in imperfect rendering.
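
The shared-character problem in the two examples above can be sketched as a small model of unicode-range matching. This is only an illustration: the subset names and ranges are hypothetical (not the actual Google Fonts subset definitions), and the model assumes the CSS rule that, among overlapping ranges, the last declared @font-face rule is checked first.

```python
from itertools import groupby

# Hypothetical subsets, listed in @font-face declaration order.
# The danda (U+0964) and double danda (U+0965) are duplicated into the
# Gujarati subset because several Indic scripts share them.
SUBSETS = [
    ("devanagari", [(0x0900, 0x097F)]),
    ("gujarati", [(0x0A80, 0x0AFF), (0x0964, 0x0965)]),
]


def pick_subset(ch, subsets=SUBSETS):
    """Return the subset a character renders from: the last declared match wins."""
    cp = ord(ch)
    for name, ranges in reversed(subsets):
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return None


def font_runs(text, subsets=SUBSETS):
    """Split text into consecutive runs that render from the same subset."""
    return [
        (name, "".join(chars))
        for name, chars in groupby(text, key=lambda c: pick_subset(c, subsets))
    ]
```

With these assumed ranges, a Devanagari word followed by a danda splits into two runs, with the danda coming from the Gujarati subset. Shaping and kerning cannot apply across that boundary, which is the failure described in both bullets.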
>
> Section 3.4
>
> Not sure what level of detail you want to go into on the specifics of the
> byterange approach but there are a couple of points that might be worth
> mentioning:
>
>    - For byte range to work, fonts must be preprocessed to flatten
>    composite glyphs into the resulting outlines, and the CFF table must be
>    desubroutinized. Also, the glyf/CFF table needs to be moved to the end of
>    the font if it's not already there.
>    - Another source of efficiency loss is being unable to leverage
>    compression across requests. In a single WOFF2 font file, redundant data
>    between glyphs compresses out. If, under byterange, those glyphs are
>    transferred in separate requests, the redundant data is retransmitted.
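
The cross-request compression loss can be illustrated with a toy measurement. This sketch uses zlib as a stand-in (WOFF2 itself uses Brotli), and the "glyphs" are synthetic byte strings that share a common chunk; the sizes and function names are assumptions chosen only to make the effect visible.

```python
import random
import zlib


def make_glyphs(count=8, shared_len=512, seed=0):
    """Build synthetic glyph payloads that all contain the same shared chunk."""
    rng = random.Random(seed)
    shared = bytes(rng.getrandbits(8) for _ in range(shared_len))
    # Each payload is the shared chunk plus a short glyph-specific suffix.
    return [shared + i.to_bytes(2, "big") * 8 for i in range(count)]


def compressed_sizes(glyphs):
    """Return (bytes when compressed as one stream, bytes when compressed one by one)."""
    together = len(zlib.compress(b"".join(glyphs)))
    separately = sum(len(zlib.compress(g)) for g in glyphs)
    return together, separately
```

Compressing the payloads as one stream lets the compressor refer back to the first copy of the shared chunk; compressing each payload on its own, as separate range requests effectively do, pays for that chunk every time.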
>
> Section 3.9
>
>    - We have a defined wire protocol for Subset and Patch (
>    https://docs.google.com/document/d/1DJ6VkUEZS2kvYZemIoX4fjCgqFXAtlRjpMlkOvblSts/edit)
>    and that protocol is used during the simulations so that we're correctly
>    accounting for protocol overhead, which can be substantial in some cases,
>    for example CJK augmentation requests that need to specify a large number
>    of codepoints. However, as noted in the doc, this protocol is only meant to
>    be a stand-in for size estimation. We will want to rewrite the protocol for
>    standardization.
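
To give a feel for why requests listing many codepoints carry noticeable overhead, here is a sketch that estimates the size of a codepoint set under two simple encodings. Neither encoding is taken from the linked protocol document; the fixed 3-bytes-per-codepoint layout and the sorted delta plus varint layout are assumptions used purely for size estimation.

```python
def encode_varint(n):
    """Encode a non-negative integer as a little-endian base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def fixed_width_size(codepoints):
    """Bytes needed if every distinct codepoint is sent as 3 raw bytes."""
    return 3 * len(set(codepoints))


def delta_varint_size(codepoints):
    """Bytes needed if sorted codepoints are sent as varint-encoded deltas."""
    prev = 0
    total = 0
    for cp in sorted(set(codepoints)):
        total += len(encode_varint(cp - prev))
        prev = cp
    return total
```

For 1,000 hypothetical CJK codepoints spaced two apart starting at U+4E00, the fixed-width estimate is 3,000 bytes and the delta estimate is 1,002 bytes. Both grow linearly with the number of codepoints requested, which is why the overhead needs to be counted in the simulations.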
>
>
> On Mon, Sep 14, 2020 at 8:48 AM Chris Lilley <chris@w3.org> wrote:
>
>> Hi folks,
>>
>> I just committed a first draft of the Evaluation Report.
>>
>> I mainly concentrated on easy-to-understand introductory material,
>> explaining the problem to be solved and why it is important, bearing in
>> mind that the primary audience is not necessarily familiar with fonts
>> in general.
>>
>> Also, as a group we do not have a final analysis or any conclusions
>> decided, so those sections are simply blank.
>>
>> But it gives an outline of how the report could look, so we can discuss
>> the overall structure and approach at least.
>>
>> https://w3c.github.io/PFE-analysis/report/evaluation-report.html
>>
>> Reading over it just before the call, I think it actually needs a whole
>> new section that explains what OpenType is, sfnt table structure, and
>> how rendering a single glyph can depend on data scattered over several
>> tables. But I wanted to discuss that on the call before starting to add
>> it. I'm thinking again of an introductory and probably diagram-heavy
>> exposition. And this will explain in turn why the byterange approach has
>> to concentrate on a single table rather than lots of little tables.
>>
>> --
>> Chris Lilley
>> @svgeesus
>> Technical Director @ W3C
>> W3C Strategy Team, Core Web Design
>> W3C Architecture & Technology Team, Core Web & Media
>>
>>
>>
Received on Tuesday, 29 September 2020 22:59:25 UTC