Re: PFE challenges to consider from Myles C. Maxfield on 2019-07-23 (public-webfonts-wg@w3.org from July 2019)

From: Myles C. Maxfield <mmaxfield@apple.com>
Date: Tue, 23 Jul 2019 14:23:49 -0700
To: "Levantovsky, Vladimir" <Vladimir.Levantovsky@monotype.com>
Cc: "w3c-webfonts-wg (public-webfonts-wg@w3.org)" <public-webfonts-wg@w3.org>
Message-id: <1790A471-591A-436B-ABC5-9382E0201A7B@apple.com>

> On Jul 23, 2019, at 1:58 PM, Levantovsky, Vladimir <Vladimir.Levantovsky@monotype.com> wrote:
> 
> Folks,
>  
> I’ve been chatting with one of my colleagues (who is an expert in complex scripts) about our progressive font enrichment project, primarily to figure out what fonts we’d need to use as part of our test set for the analysis framework. As I explained to him the two different approaches we are currently considering, and the overall goals of this project, he made a casual remark during the discussion, saying “for best results and highest level of efficiency – make sure you are subsetting the font to output glyphs set, and not just based on input data”.

Can you explain this a bit more thoroughly? What does he mean by “input” and “output”? 

>  
> This seemingly innocent remark has immediately raised multiple issues we haven’t considered yet (or, at least, haven’t verbalized):
> - output glyphs can be modified by CSS (think e.g. stylistic sets, smallcaps, glyph alternates, etc.) – a font subset created to support a particular page has to account for this;

These are implemented by font features.

> - output glyphs can be modified by a particular rendering mode (e.g. ruby markup in Japanese);

Ruby is implemented either by size/width (which means different selected fonts, or variable fonts) or by font features.

> - output glyphs are subject to shaping / layout rules, we may not always know what they are (even if we know all input character combinations) until the shaping is done, which means the first increment of a particular font has to be loaded to at least support shaping.

For dynamic content, this is certainly true. In general, we haven’t solved the “dynamic content” problem yet at all.

Consider a set of characters which the browser knows are present on a page. Pretend the browser knows all the shaping rules in the font. Suppose the browser could compute the set of every “reachable” glyph that any possible sequence of these characters could reference. I wonder, for normal fonts with normal shaping rules, what the relationship is between the size of the set of characters and the size of the set of possibly reachable glyphs. If the relationship is roughly linear or sublinear, this likely isn’t a big deal, but if a small set of input characters can potentially reference every glyph in the font, that would be unfortunate.
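
To make the "reachable" set concrete, here is a minimal sketch in Python. It assumes a toy font model, not any real font API: `cmap` maps characters to nominal glyph names, and each rule says "if all of these glyphs are available, these other glyphs can appear". The names `reachable_glyphs`, `TOY_CMAP`, and `TOY_RULES` are made up for illustration. It deliberately ignores ordering, context, and which features are enabled, so it over-approximates what a real shaper would produce.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy font model (not a real font API):
#   Cmap maps a character to its nominal glyph name.
#   A Rule is (input glyphs, output glyphs): the outputs may appear
#   whenever every input glyph can appear.
Cmap = Dict[str, str]
Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]


def reachable_glyphs(chars: Iterable[str], cmap: Cmap, rules: List[Rule]) -> Set[str]:
    """Return every glyph that some sequence of `chars` could reference."""
    # Start from the nominal glyphs of the characters on the page.
    reachable = {cmap[c] for c in chars if c in cmap}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for inputs, outputs in rules:
            # A rule can only fire if all of its input glyphs are reachable.
            if all(g in reachable for g in inputs):
                new = set(outputs) - reachable
                if new:
                    reachable |= new
                    changed = True
    return reachable


# Made-up example data.
TOY_CMAP: Cmap = {"f": "f", "i": "i", "l": "l", "a": "a"}
TOY_RULES: List[Rule] = [
    (("f", "i"), ("f_i",)),      # ligature
    (("f", "l"), ("f_l",)),      # ligature
    (("f", "f_i"), ("f_f_i",)),  # ligature built on another ligature
    (("a",), ("a.sc",)),         # single substitution, e.g. small caps
]

print(sorted(reachable_glyphs("fi", TOY_CMAP, TOY_RULES)))
```

Running a computation like this over real fonts, for character sets of increasing size, would be one way to measure the relationship asked about above.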

>  
> I am sure there is more to consider; this is just the tip of the iceberg. As is, these considerations seem to create certain additional challenges for incremental transfer, and also give a bit more weight to the alternative approach Myles has suggested, in which a browser can ask for a basic subset to start with and incrementally update it based on real needs determined by shaping and CSS.

In both approaches, the browser knows a) which characters are present on the page, b) which characters are affected by which styles, and c) which specific sequences of characters it needs to render in each font. Therefore, in both approaches, the browser can decide whether or not to consider styling or shaping information in its requests to the server. So, I don’t think this addition helps us make a distinction between the two approaches.

It probably means that, whichever solution is picked, the client should be asking the server for particular glyphs, rather than particular characters, because the server doesn’t (shouldn’t) know the styles on the page.
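
As a rough illustration of what "asking for glyphs" could look like on the client side, here is a sketch in Python. Everything in it is hypothetical: the function name `glyphs_to_request`, the font names, and the idea of keying requests by font family are assumptions for the example, not part of any proposal. It assumes the client already has, per font, the glyph IDs its content needs (from shaping, or from a conservative reachability estimate) and the glyph IDs it has already loaded.

```python
from typing import Dict, Iterable, List, Set


def glyphs_to_request(needed: Dict[str, Iterable[int]],
                      loaded: Dict[str, Set[int]]) -> Dict[str, List[int]]:
    """For each font, list the glyph IDs the page needs but the client lacks."""
    request: Dict[str, List[int]] = {}
    for font, glyph_ids in needed.items():
        # Deduplicate, then drop anything already present locally.
        missing = set(glyph_ids) - loaded.get(font, set())
        if missing:
            request[font] = sorted(missing)
    return request


# Made-up example: glyph IDs needed per font vs. glyph IDs already loaded.
needed = {"Body": [3, 17, 17, 42], "Heading": [5]}
loaded = {"Body": {3}, "Heading": {5}}
print(glyphs_to_request(needed, loaded))
```

The request carries only glyph IDs, so the server never has to see the page's characters or styles.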

You are right, however, that any implementation should have affordances for grappling with this problem, either by considering all of the above, or by intentionally not considering some of it.

>  
> Thoughts?
>  
> Thank you,
> Vlad

Received on Tuesday, 23 July 2019 21:24:21 UTC