Re: Glyph Closure Scaling from Myles C. Maxfield on 2019-08-05 (public-webfonts-wg@w3.org from August 2019)

From: Myles C. Maxfield <mmaxfield@apple.com>
Date: Mon, 05 Aug 2019 12:02:23 -0700
To: Ken Lunde <lunde@adobe.com>
Cc: "w3c-webfonts-wg (public-webfonts-wg@w3.org)" <public-webfonts-wg@w3.org>
Message-id: <3284B4B1-9D84-44BE-B50B-E5382AFF7658@apple.com>
The client needs to know, before it has the first byte of the file, whether it should be requesting code points or shaping information.
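
To make the tradeoff in the quoted thread below concrete, here is a minimal sketch of a cost model for the two options (A: send code points, receive the glyph closure; B: fetch shaping information first, then request only the needed glyphs). It rests on assumptions: the function names and the example glyph counts are made up for illustration, and the round-trip cost of 8 glyphs is borrowed from the range-request estimate quoted below and assumed to apply to the shaping-information request as well.

```python
# Hypothetical cost model; names and example numbers are illustrative only.

# Assumption: the extra round trip in model B costs about as much as the
# single range request measured in the thread (roughly 8 glyphs).
ROUND_TRIP_COST_GLYPHS = 8


def cost_model_a(closure_size: int) -> int:
    """Model A: the server replies with the full glyph closure."""
    return closure_size


def cost_model_b(needed_glyphs: int, extra_round_trips: int = 1) -> int:
    """Model B: only the needed glyphs, plus the early shaping request."""
    return needed_glyphs + extra_round_trips * ROUND_TRIP_COST_GLYPHS


def cheaper_model(closure_size: int, needed_glyphs: int) -> str:
    """Return "A" or "B", whichever costs fewer glyph-equivalents."""
    a = cost_model_a(closure_size)
    b = cost_model_b(needed_glyphs)
    return "A" if a <= b else "B"


# Made-up fonts: (closure size, glyphs actually needed) for the same text.
simple_font = (52, 50)        # closure barely exceeds what is needed
contextual_font = (400, 50)   # closure pulls in many contextual variants

print(cheaper_model(*simple_font))      # A
print(cheaper_model(*contextual_font))  # B
```

Both inputs to `cheaper_model` are properties of the font, which is why the client cannot evaluate this comparison before it has received any of the font's data.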

> On Aug 5, 2019, at 11:53 AM, Ken Lunde <lunde@adobe.com> wrote:
> 
> Myles,
> 
> Why not employ both models? If the fonts that benefit from the second model can be differentiated by the mere presence of specific GSUB features, which is likely the case, you can almost have the best of both worlds.
> 
> Regards...
> 
> -- Ken
> 
>> On Aug 5, 2019, at 11:11 AM, Myles C. Maxfield <mmaxfield@apple.com> wrote:
>> 
>> Here’s the graph of integral against file size:
>> 
>> <Screen Shot 2019-08-05 at 10.42.46 AM.png>
>> 
>> The X axis is file size. The Y axis is integral.
>> 
>> (Aside: It seems like any time you graph anything interesting against file size, the results are bimodal…)
>> 
>> It looks like all of the fonts with high deviations are pretty small font files, compared to the sizes of all fonts. We should determine whether we should even be investigating these fonts. If the fonts are small, the current state of the art works better, and we may not need to do anything. We still haven’t solved the “flashy text” problem, so if the benefits are negligible, requesting the whole font is beneficial because there will be fewer flashes.
>> 
>> --------------------------------------
>> 
>> So, we’re trying to decide whether our solution should operate on code points or glyph IDs. The two models are:
>> 
>> A) The client sends a set of code points to the server, and the server replies with the glyph closure of those code points. This has the drawback that the server may have to send tons of unnecessary glyphs to the client.
>> B) The client sends an early request to the server to get shaping information, and then sends additional requests to the server for glyphs. This has the drawback that every font requires an extra round trip to the server before the client can even begin to start requesting glyphs. The benefit is that no unnecessary glyphs would be sent to the client.
>> 
>> The range-request option already has to send an early request to the server (to get the table of contents), so Option B sounds clearly better there. However, the smart-server option can be augmented to work with either of these models. As John mentioned earlier in this thread regarding Tibetan, different fonts used to render the same code points can have dramatically different characteristics. Therefore, the client can’t dynamically pick option A or option B at runtime before it has any data at all; instead, this standardization group would probably have to pick one of the two options and bless it. (Mechanically, this decision could be exposed to CSS authors by adding an additional descriptor in @font-face, but this detail is probably too low-level and too difficult to explain, and is likely to result in cargo cults.)
>> 
>> So the tradeoff is: should we penalize Arabic/handwritten fonts fairly dramatically, or should we penalize all fonts a little bit regardless of language? From my analysis a few months ago, I observed that a single range request “cost” 8 glyphs. From the graph I sent out last night, those big triangles where the coverage size shoots up are way taller than 8 glyphs.
>> 
>> —Myles
>> 
>>>> On Aug 5, 2019, at 9:55 AM, John Hudson <john@tiro.ca> wrote:
>>> 
>>> On 05082019 9:40 am, Garret Rieger wrote:
>>>> Thanks Myles, this is very informative. For the Arabic fonts it makes intuitive sense to me that they'd deviate due to them having lots of inter-character layout rules. However, seeing a few LGC fonts end up deviating surprises me. I think it would be interesting to dig into the internals of a couple of those LGC fonts to try and understand why the closure on those pulls in so many extra glyphs.
>>> 
>>> Segoe Script is a handwriting style font, so I am not surprised by the deviation: the font contains a lot of glyph variants that are deployed contextually to pseudo-randomise letter sequences and mimic the variety of form in handwriting.
>>> 
>>> I'm not sure why Bahnschrift would deviate so much. It is a variable font, but I thought the GSUB was pretty simple.
>>> 
>>> BTW, the reason the Microsoft Himalaya Tibetan font deviates so much is that it uses lots of contextual alternates to control the depth of conjunct stacks. I would expect other Tibetan fonts not to deviate this much.
>>> 
>>> The deviation data confirms what I assumed would be the case: Arabic fonts will tend to require significantly more glyph downloads because of the joining forms requirements, and fonts that provide contextual variation of letter shapes to mimic handwriting or implement traditional Arabic styles will also deviate significantly. Fonts like Aldhabi (BTW, diwani style, not nastaliq) and Urdu Typesetting combine both these aspects: joining forms and contextual variation, as well as decomposition of Arabic letters into rasm and separate dot glyphs. [I'm interested to see that Arabic Typesetting has a similar deviation to Aldhabi and Urdu Typesetting: it uses a large ligature set instead of relying only on contextual variants.]
>>> 
>>> JH
>>> 
>>> 
>>> -- 
>>> 
>>> John Hudson
>>> Tiro Typeworks Ltd    www.tiro.com
>>> Salish Sea, BC        tiro@tiro.com
>>> 
>>> NOTE: In the interests of productivity, I am currently
>>> dealing with email on only two days per week, usually
>>> Monday and Thursday unless this schedule is disrupted
>>> by travel. If you need to contact me urgently, please
>>> use some other method of communication. Thank you.
>>> 
>>> 
>> 
>
Received on Monday, 5 August 2019 19:02:50 UTC