- From: Roderick Sheeter <rsheeter@google.com>
- Date: Wed, 24 Jul 2019 11:14:05 -0700
- To: "Levantovsky, Vladimir" <Vladimir.Levantovsky@monotype.com>
- Cc: Garret Rieger <grieger@google.com>, "mmaxfield@apple.com" <mmaxfield@apple.com>, "w3c-webfonts-wg (public-webfonts-wg@w3.org)" <public-webfonts-wg@w3.org>
- Message-ID: <CABscrrEAvFb2fbP8a=HbFesXfitc6Q8n=MhWQVpbZH4UCBezBg@mail.gmail.com>
We are working on a dataset of page sequences ready to simulate the different approaches to PFE. What's popping out here is that we also need to talk about which *fonts* to use.

So, who has fonts that seem likely to surface issues and is willing to let the WG use them for PFE testing purposes?

On Wed, Jul 24, 2019 at 11:08 AM Levantovsky, Vladimir <Vladimir.Levantovsky@monotype.com> wrote:

> From pragmatic point of view (and I mean it in best possible sense of it because I like pragmatic approach in general) it may well be a good solution, but we also need to be conscious about potential significant redundancy that it entails. For example, some fonts have multiple stylistic sets supported (if I remember correctly, font “Gabriella” has eight of them) and, realistically, only one would be used on page so the built-in redundancy of indiscriminate “glyph closure” could be very significant.
>
> *From:* Garret Rieger <grieger@google.com>
> *Sent:* Wednesday, July 24, 2019 1:43 PM
> *To:* Levantovsky, Vladimir <Vladimir.Levantovsky@monotype.com>
> *Cc:* mmaxfield@apple.com; w3c-webfonts-wg (public-webfonts-wg@w3.org) <public-webfonts-wg@w3.org>
> *Subject:* Re: PFE challenges to consider
>
> Currently font subsetters handle this problem by computing a "glyph closure". This finds all possible glyphs that are reachable from a set of starting glyphs (derived from the input code points) by the application of any sequence of layout features in the font. All glyphs in the closure are retained in the produced subset. Since the subset and patch transfer method uses a font subsetter underneath the data sent back to the client will include all glyphs that may be needed to render a particular set of codepoints regardless of what layout features are activated client side.
>
> On Wed, Jul 24, 2019 at 10:40 AM Levantovsky, Vladimir <Vladimir.Levantovsky@monotype.com> wrote:
>
> Hi Myles, all,
>
> Please see inline.
>
> *From:* mmaxfield@apple.com <mmaxfield@apple.com>
> *Sent:* Tuesday, July 23, 2019 5:24 PM
> *To:* Levantovsky, Vladimir <Vladimir.Levantovsky@monotype.com>
> *Cc:* w3c-webfonts-wg (public-webfonts-wg@w3.org) <public-webfonts-wg@w3.org>
> *Subject:* Re: PFE challenges to consider
>
> On Jul 23, 2019, at 1:58 PM, Levantovsky, Vladimir <Vladimir.Levantovsky@monotype.com> wrote:
>
> Folks,
>
> I’ve been chatting with one of my colleagues (who is the expert in complex scripts) about our progressive font enrichment project, primarily to figure out what fonts we’d need to use as part of our test set for analysis framework. As I explained to him two different approaches we currently consider, and the overall goals of this project, he made a casual remark during the discussion saying “for best results and highest level of efficiency – make sure you are subsetting the font to output glyphs set, and not just based on input data”.
>
> Can you explain this a bit more thoroughly? What does he mean by “input” and “output”?
>
> <VL> The input is a textual content that is a part of the page content – character strings, character combinations/sequences, CSS font features applied, etc. The output is a set of glyph IDs that is going to be rendered to display the textual content, which is determined after the shaping and layout takes place, and all font features applied. </VL>
>
> This seemingly innocent remark has immediately raised multiple issues we didn’t consider yet (or, at least didn’t verbalize):
>
> - output glyphs can be modified by CSS (think e.g. stylistic sets, smallcaps, glyph alternates, etc.) – a font subset created to support a particular page has to account for this;
>
> These are implemented by font features.
>
> <VL> Yes, but the discretionary features to be applied are often specified by CSS.
> So, for simple case example, if an initial Latin font subset is created to include all required lowercase and uppercase glyphs that correspond to the list of codepoints provided by a browser, and the CSS calls for small caps feature to be applied – we end up with an initial subset that includes lowercase glyphs we do not need, and is missing small caps glyphs we do need. </VL>
>
> - output glyphs can be modified by a particular rendering mode (e.g. ruby markup in Japanese);
>
> Ruby is implemented either by size/width (which means different selected fonts, or variable fonts) or by font features.
>
> <VL> From what I’ve been told (Ken can correct me if I am wrong) – some ruby markup may call for different base glyphs. </VL>
>
> - output glyphs are subject to shaping / layout rules, we may not always know what they are (even if we know all input character combinations) until the shaping is done, which means the first increment of a particular font has to be loaded to at least support shaping.
>
> For dynamic content, this is certainly true. In general, we haven’t solved the “dynamic content” problem yet at all.
>
> <VL> I don’t think this is true only for dynamic content. </VL>
>
> Consider a set of characters which the browser knows are present on a page. Pretend the browser knows all the shaping rules in the font.
>
> <VL> How can a browser possibly know all the shaping rules in the font without having that particular font. Shaping rules in major parts are defined by the content of GSUB/GPOS/GDEF tables, and even if you have two different fonts covering the same exact script – glyph IDs are likely to be different for the same glyphs, the content of the layout tables will be different, and the output set of glyph IDs that need to be encoded in a font subset to display the same text content will be different from one font to another.
> I don’t see how we can make an assumption that the browser knows all shaping rules in advance. </VL>
>
> Consider if the browser could compute the set of every “reachable” glyph that any possible sequence of these characters could reference.
>
> <VL> It cannot, in my opinion. Computing reachable glyphs is the process where shaping / layout rules and other font features are applied – you have to have at least a font subset already delivered that gives you that data. </VL>
>
> I wonder, for normal fonts with normal shaping rules, what the relationship between the size of the set of characters and the size of the set of possibly reachable glyphs. If the correlation is roughly linear or sublinear, this likely isn’t a big deal, but if a small set of input characters can potentially reference every glyph in the font, that would be unfortunate.
>
> <VL> I don’t think the concept of a “normal font” is even applicable in this case, for the reasons I previously mentioned. </VL>
>
> I am sure there is more to consider, this is just the tip of an iceberg. As is, these considerations seem to create certain additional challenges for incremental transfer, and also give bit more weight to an alternative approach Myles has suggested, when a browser can ask for the basic subset to start with and incrementally update it based on real needs determined by shaping and CSS.
>
> In both approaches, the browser knows a) which characters are present on the page b) which characters are affected by which styles, and c) which specific sequences of characters it needs to be rendering in each font. Therefore, in both approaches, the browser can decide whether or not to consider styling or shaping information in its requests to the server. So, I don’t think this addition helps us make a distinction between the two approaches.
>
> <VL> Cases a) and b) are true, c) is a controversial subject – the browser will know what character combinations are present in the input but it doesn’t know whether those combinations will be rendered by a sequence of individual glyphs (and which particular glyphs as is the case for e.g. Arabic), or if there is a single glyph that needs to be rendered to display a ligature, or a syllable – this knowledge can only be obtained after shaping is done, and from that point on the browser will not be dealing with character sequences, it will be dealing with glyph IDs.
>
> In one approach, the browser would have to send back to font server everything it knows about a particular input, including information about discretionary features, text spans for which those discretionary features are selected, etc., and basically ask the font server to either do the shaping and determine the optimal font subset to be created, or to create a “superset” that would cover all possible output combinations. And even if this can be done, I am not sure if it’s reasonable – too many things can go wrong.
>
> In another approach, the browser needs an initial subset that provides all necessary metric/layout/shaping data, does the shaping, determines an output set of glyph IDs that are needed to display the input, and asks for an incremental subset that contains outline data for those glyph IDs. What it gets back is an optimal subset containing everything that is needed and nothing is missing. </VL>
>
> It probably means that, whichever solution is picked, the client should be asking the server for particular glyphs, rather than particular characters, because the server doesn’t (shouldn’t) know the styles on the page.
>
> <VL> Exactly my point, and this is why I mentioned that considering the magnitude of possible input variations (languages, font features, …) the approach you proposed [where font data is organized in a particular way and the browser can start by reading what is required for shaping and then amend it with the glyph data] should be given additional consideration because of it.</VL>
>
> You are right, however, that any implementation should be able to have affordances for grappling with this problem, either by considering all the above, or intentionally not considering some of them.
>
> Thoughts?
>
> Thank you,
>
> Vlad
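
To make the "glyph closure" from the quoted thread concrete, here is a minimal sketch in Python. It is a toy model, not how any particular subsetter is implemented: the glyph names, the `RULES` table, and the `glyph_closure` function are all hypothetical, and a real font would keep these substitutions in its GSUB table. The sketch repeatedly adds any glyph whose inputs are already reachable until nothing changes, and it can optionally be restricted to the features a page actually activates.

```python
# Toy substitution rules keyed by layout feature tag. Each rule maps a tuple
# of input glyph names to the glyph it produces. Every name here is
# hypothetical; a real font stores this data in its GSUB table.
RULES = {
    "liga": {("f", "i"): "f_i"},
    "smcp": {("a",): "a.sc", ("f",): "f.sc", ("i",): "i.sc"},
    "ss01": {("a",): "a.ss01"},
    "ss02": {("a",): "a.ss02", ("a.ss01",): "a.ss01.alt"},
}


def glyph_closure(start, rules, features=None):
    """Return every glyph reachable from `start` through the given features.

    With features=None every feature in `rules` is considered, which
    corresponds to the indiscriminate closure discussed in the thread.
    The result over-approximates: a multi-glyph rule counts as applicable
    as soon as all of its inputs are reachable, regardless of text order.
    """
    active = list(rules) if features is None else list(features)
    reachable = set(start)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for tag in active:
            for inputs, output in rules.get(tag, {}).items():
                if output not in reachable and all(g in reachable for g in inputs):
                    reachable.add(output)
                    changed = True
    return reachable


start = {"a", "f", "i"}  # glyphs mapped from the page's code points
full = glyph_closure(start, RULES)
smcp_only = glyph_closure(start, RULES, features=["smcp"])
print(len(full), sorted(full))
print(len(smcp_only), sorted(smcp_only))
```

In this toy font the unrestricted closure keeps ten glyphs for three input characters, while restricting it to `smcp` keeps six. That gap is the redundancy Vlad raises with stylistic sets. The restricted call needs the client to tell the server which features are active, which is the trade-off the thread is debating.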
Received on Wednesday, 24 July 2019 18:14:41 UTC