Re: Codepoint Set Compression Round 2

Jonathan and others,

Don't forget about sequences that resolve to glyphs, which can be harvested from 'ccmp' and other ligature-like features, along with the Format 14 'cmap' subtable. Perhaps the most complex sequences are those expressed via the 'ljmo', 'vjmo', and 'tjmo' features, which are used for combining jamo: there are 11,875 two-character sequences (LV) and 1,626,875 three-character ones (LVT). The Source Han / Noto CJK fonts can serve as example implementations of everything I described above.
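
As a sanity check on those two counts: they appear to match the product of the leading-consonant (L), vowel (V), and trailing-consonant (T) jamo encoded across the Hangul Jamo, Hangul Jamo Extended-A, and Hangul Jamo Extended-B blocks. A minimal Python sketch, assuming the block ranges below (the names are illustrative and not taken from any font tool):

```python
# Inclusive code point ranges, assumed from the Unicode Hangul Jamo blocks.
# The L and V ranges include the filler characters U+115F and U+1160.
LEADING = [(0x1100, 0x115F), (0xA960, 0xA97C)]   # L: Hangul Jamo + Extended-A
VOWEL = [(0x1160, 0x11A7), (0xD7B0, 0xD7C6)]     # V: Hangul Jamo + Extended-B
TRAILING = [(0x11A8, 0x11FF), (0xD7CB, 0xD7FB)]  # T: Hangul Jamo + Extended-B


def count(ranges):
    """Total number of code points covered by a list of inclusive ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)


l_count = count(LEADING)
v_count = count(VOWEL)
t_count = count(TRAILING)

lv_sequences = l_count * v_count          # two-character sequences (LV)
lvt_sequences = lv_sequences * t_count    # three-character sequences (LVT)

print(l_count, v_count, t_count)
print(lv_sequences, lvt_sequences)
```

The point for codepoint set compression is that the set of sequences is far too large to enumerate explicitly, while the set of individual jamo that generate it is small.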

-- Ken

> On Aug 6, 2019, at 2:44 AM, Jonathan Kew <jfkthame@gmail.com> wrote:
> 
> On 05/08/2019 23:35, Garret Rieger wrote:
> 
>> This will of course add some extra size to the first response, but having the client know in advance the specific code points in the source font has value beyond compressing the sets. For example if the browser knows which codepoints are in the font it doesn't need to waste requests/bytes sending augmentation requests for codepoints that aren't actually in the font.
> 
> It's pretty important for the browser to know early on which codepoints are in the font. It needs this in order to be able to appropriately fall back to the next font in the font-family list (perhaps kicking off a new font load for a different resource) for characters that aren't going to be supported by this one even after it is fully loaded.
> 
> Equally, if a given character *is* going to be supported by the current font (even though it hasn't been fetched yet), we don't want the browser to start loading further fonts in the stack.
> 
> JK
> 

Received on Tuesday, 6 August 2019 12:29:05 UTC