Re: Evaluation report from Chris Lilley on 2020-09-30 (public-webfonts-wg@w3.org from September 2020)

From: Chris Lilley <chris@w3.org>
Date: Wed, 30 Sep 2020 20:51:59 +0300
To: public-webfonts-wg@w3.org
Message-ID: <3b0b1ea3-46f3-2180-7d9d-b0b82cf3c1d6@w3.org>
On 2020-09-30 01:59, Roderick Sheeter wrote:
> Maybe this is too speculative for an eval report, but if we manage to
> get 32-bit glyph IDs into a future rev of the font format then PFE can
> deliver a true pan-Unicode font. Noto in one font!

Agreed, it doesn't belong there since it isn't a thing yet, but it would
be very neat when it does happen.



>
> On Thu, Sep 17, 2020 at 6:30 PM Garret Rieger <grieger@google.com 
> <mailto:grieger@google.com>> wrote:
>
>     Thanks this is looking good so far. Just a couple of thoughts I had:
>
>     It might be worth mentioning pan-Unicode fonts (such as Noto) as a
>     use case which is not currently well supported by existing font
>     transfer methods. To date, for web usage we have to deliver Noto as
>     a bunch of separate families and leave it up to the developer to
>     explicitly pick which ones they might need. For many types of
>     applications it is hard to know ahead of time which languages may
>     show up in the content, so making that choice up front is
>     difficult. For example:
>
>       * A forum where users may be posting in many different languages.
>       * A mapping application which will need to render text in a wide
>         array of scripts depending on what part of the world you're
>         viewing.
>
>     PFE can enable the easy and efficient use of a pan-Unicode font,
>     something that's not possible today.
>
>     Section 2.7: this isn't filled out yet, but here are some examples
>     that we run into on Google Fonts that could be used to demonstrate
>     issues in trying to subset fonts to improve performance:
>
>       * With Indic scripts there are some shared characters between
>         the scripts. If you have a font which supports two or more of
>         these scripts and want to present it as a single family
>         and use unicode-range to deliver each script in its own
>         subset, you run into trouble. The shared characters need to be
>         duplicated in each subset. However, the way unicode-range
>         works is that a shared character will be rendered from only one
>         of the subsets, based on the priority of the ranges. This can
>         result in the shared character being rendered from a different
>         subset than the surrounding characters. As a result shaping
>         doesn't work correctly and you end up with poor rendering.
>         We've had to work around this problem by releasing Indic
>         scripts as separate families, which is non-optimal for end
>         users (for example see: https://fonts.google.com/?query=Baloo)
>       * Another example of this problem is with Latin punctuation in
>         Latin and Cyrillic fonts. Say we have a single family with
>         Latin and Cyrillic characters. We want to have Cyrillic and
>         Latin in their own subsets so we don't waste bytes downloading
>         Cyrillic on Latin-only pages and vice versa. However, if you
>         have Cyrillic text that uses the period "." from the Latin
>         subset, then kerning rules between the Cyrillic characters and
>         the "." no longer work. Not quite as disastrous as the Indic
>         example, but it still results in imperfect rendering.
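
To make the two cases above concrete, here is a minimal sketch in Python
of per-character subset selection. It is not how any browser implements
unicode-range: the subset names, the coverage sets, the "first match in
list order wins" priority rule, and the helpers pick_subset and
split_into_runs are all assumptions made up for illustration.

```python
from itertools import groupby

# Hypothetical subsets of one family, listed in priority order (assumed).
# The danda characters U+0964 and U+0965 are shared between the scripts,
# so the Bengali subset duplicates them.
SUBSETS = [
    ("devanagari", set(range(0x0900, 0x0980))),
    ("bengali", set(range(0x0980, 0x0A00)) | {0x0964, 0x0965}),
]


def pick_subset(ch, subsets=SUBSETS):
    """Return the name of the first subset whose coverage includes ch."""
    for name, coverage in subsets:
        if ord(ch) in coverage:
            return name
    return None  # no subset covers this character


def split_into_runs(text, subsets=SUBSETS):
    """Group consecutive characters that resolve to the same subset."""
    return [
        (name, "".join(chars))
        for name, chars in groupby(text, key=lambda ch: pick_subset(ch, subsets))
    ]


if __name__ == "__main__":
    # Bengali letters followed by a danda (U+0964).
    sample = "\u09AC\u09BE\u0982\u09B2\u09BE\u0964"
    for name, run in split_into_runs(sample):
        print(name, [hex(ord(c)) for c in run])
```

In this model the danda resolves to the higher-priority Devanagari
subset even though the Bengali subset also carries it, so the text is
split into two runs and shaping cannot operate across the boundary. The
Latin period inside Cyrillic text behaves the same way.
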
>
>     Section 3.4
>
>     Not sure what level of detail you want to go into on the specifics
>     of the byterange approach but there's a couple of points that
>     might be worth mentioning:
>
>       * For byte range to work, fonts must be preprocessed to flatten
>         composite glyphs into the resulting outlines, and the CFF table
>         must be desubroutinized. Also, the glyf/CFF table needs to be
>         moved to the end of the font if it's not already there.
>       * Another source of efficiency loss is from being unable to
>         leverage compression across requests. In a single WOFF2 font
>         file, redundant data between glyphs compresses out. If under
>         byterange those glyphs are transferred in separate requests,
>         the redundant data is retransmitted.
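
The compression point can be illustrated with a rough sketch, under
stated assumptions. WOFF2 uses Brotli; zlib from the Python standard
library is swapped in here only to keep the example self-contained. The
"glyphs" are synthetic byte strings that share a common block, not real
glyf or CFF data, and all function names are hypothetical.

```python
import random
import zlib


def make_glyphs(count=50, seed=0):
    """Build synthetic glyph records: a shared block plus unique bytes."""
    rng = random.Random(seed)
    # Stand-in for data that is redundant between glyphs.
    shared = bytes(rng.randrange(256) for _ in range(200))
    glyphs = []
    for _ in range(count):
        unique = bytes(rng.randrange(256) for _ in range(50))
        glyphs.append(shared + unique)
    return glyphs


def compressed_size_together(glyphs):
    """All glyphs in one stream, as in a single compressed font file."""
    return len(zlib.compress(b"".join(glyphs), 9))


def compressed_size_separately(glyphs):
    """Each glyph compressed on its own, as if sent in separate requests."""
    return sum(len(zlib.compress(g, 9)) for g in glyphs)


if __name__ == "__main__":
    glyphs = make_glyphs()
    print("one stream:      ", compressed_size_together(glyphs), "bytes")
    print("separate streams:", compressed_size_separately(glyphs), "bytes")
```

In the single stream the shared block is stored once and then referenced;
compressed separately, every glyph carries it again. The real gap depends
on the font and on how requests are batched, so treat the numbers as
illustrative only.
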
>
>     Section 3.9
>
>       * We have a defined wire protocol for Subset and Patch
>         (https://docs.google.com/document/d/1DJ6VkUEZS2kvYZemIoX4fjCgqFXAtlRjpMlkOvblSts/edit)
>         and that protocol is used during the simulations so that we're
>         correctly accounting for protocol overhead, which can be
>         substantial in some cases, for example CJK augmentation
>         requests that need to specify a large number of codepoints.
>         However, as noted in the doc, this protocol is only meant to be
>         a stand-in for size estimation. We will want to rewrite the
>         protocol for standardization.
>
>
>     On Mon, Sep 14, 2020 at 8:48 AM Chris Lilley <chris@w3.org
>     <mailto:chris@w3.org>> wrote:
>
>         Hi folks,
>
>         I just committed a first draft of the Evaluation Report.
>
>         I mainly concentrated on easy-to-understand introductory
>         material, explaining the problem to be solved and why it is
>         important, bearing in mind that the primary audience is not
>         necessarily familiar with fonts in general.
>
>         Also, as a group we do not have a final analysis or any
>         conclusions decided, so those sections are simply blank.
>
>         But it gives an outline of how the report could look, so we
>         can discuss the overall structure and approach at least.
>
>         https://w3c.github.io/PFE-analysis/report/evaluation-report.html
>
>         Reading over it just before the call, I think it actually
>         needs a whole new section that explains what OpenType is, sfnt
>         table structure, and how rendering a single glyph can depend
>         on data scattered over several tables. But I wanted to discuss
>         that on the call before starting to add it. I'm thinking again
>         of an introductory and probably diagram-heavy exposition. And
>         this will explain in turn why the byterange approach has to
>         concentrate on a single table rather than lots of little
>         tables.
>
>         -- 
>         Chris Lilley
>         @svgeesus
>         Technical Director @ W3C
>         W3C Strategy Team, Core Web Design
>         W3C Architecture & Technology Team, Core Web & Media
>
>
-- 
Chris Lilley
@svgeesus
Technical Director @ W3C
W3C Strategy Team, Core Web Design
W3C Architecture & Technology Team, Core Web & Media
Received on Wednesday, 30 September 2020 17:52:04 UTC