Re: Codepoint Set Compression Round 2 from Roderick Sheeter on 2019-08-14 (public-webfonts-wg@w3.org from August 2019)

From: Roderick Sheeter <rsheeter@google.com>
Date: Wed, 14 Aug 2019 10:07:18 -0700
To: Garret Rieger <grieger@google.com>
Cc: Ken Lunde <lunde@adobe.com>, Jonathan Kew <jfkthame@gmail.com>, "public-webfonts-wg@w3.org" <public-webfonts-wg@w3.org>
Message-ID: <CABscrrEw6dmtqk46dy8ggydO=FxAbv77nOJB4+kPQBWGc90_qg@mail.gmail.com>

Regarding knowing early which codepoints are in the font, that's not
necessarily possible today:

/* Must download to know what's here */
@font-face {
  font-family: 'Duck';
  src: url(duck.woff2);
}

CSS already lets us tell the browser which characters are present via
unicode-range. So, speculating wildly, what if streaming were activated via
src and unicode-range could still be used:

@font-face {
  font-family: 'Duck';
  /* Progressively enrich if you support it, otherwise load the WOFF2 */
  src: stream(<url to stream from>),
       url(duck.woff2);
  /* Only try for these codepoints from Duck */
  unicode-range: <the codepoints available overall>;
}
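
The unicode-range value in that second rule could be generated mechanically
from the font's codepoint set. A minimal sketch in Python, assuming the
codepoints have already been extracted from the font; the function name
format_unicode_range is made up for illustration:

```python
def format_unicode_range(codepoints):
    """Collapse a collection of codepoints into a CSS unicode-range value.

    Hypothetical helper: consecutive codepoints are merged into
    U+XXXX-YYYY ranges, isolated ones are emitted as U+XXXX.
    """
    runs = []
    for cp in sorted(set(codepoints)):
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp          # extend the current run
        else:
            runs.append([cp, cp])     # start a new run
    parts = []
    for start, end in runs:
        if start == end:
            parts.append(f"U+{start:X}")
        else:
            parts.append(f"U+{start:X}-{end:X}")
    return ", ".join(parts)


print(format_unicode_range({0x41, 0x42, 0x43, 0x61}))  # U+41-43, U+61
```

For a large CJK font the resulting list can get long, which is part of why
compressing the codepoint set matters.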

On Tue, Aug 6, 2019 at 9:47 AM Garret Rieger <grieger@google.com> wrote:

> For both fontTools and harfbuzz we currently collect all glyphs that can
> be reached via any sequence of feature applications via the G* tables. In
> harfbuzz we don't yet do glyph closure for cmap 14, but I've added that to
> our task list. fontTools currently handles computing the closure for cmap
> 14:
> https://github.com/fonttools/fonttools/blob/master/Lib/fontTools/subset/__init__.py#L2139
> .
>
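
A rough illustration of the closure idea described above, as a fixed-point
loop in Python. This is a simplified sketch, not how fontTools or harfbuzz
implement it: the rule format (a tuple of input glyphs mapped to a tuple of
output glyphs) is an assumption, and context, ordering, and feature selection
are ignored, so it can only over-approximate.

```python
def glyph_closure(start_glyphs, rules):
    """Return every glyph reachable from start_glyphs via substitution rules.

    rules is a hypothetical flattened form of the substitution tables:
    a list of (input_glyphs, output_glyphs) tuples. A rule can fire once
    all of its input glyphs are in the set.
    """
    closure = set(start_glyphs)
    changed = True
    while changed:                      # repeat until nothing new is added
        changed = False
        for inputs, outputs in rules:
            if set(inputs) <= closure and not set(outputs) <= closure:
                closure.update(outputs)
                changed = True
    return closure


rules = [
    (("f", "i"), ("f_i",)),         # ligature
    (("f_i",), ("f_i.alt",)),       # alternate reachable only via the ligature
    (("x",), ("x.alt",)),           # not reachable from {"f", "i"}
]
print(sorted(glyph_closure({"f", "i"}, rules)))
```

The second rule shows why a single pass is not enough: f_i.alt only becomes
reachable after f_i has been added.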
> On Tue, Aug 6, 2019 at 5:29 AM Ken Lunde <lunde@adobe.com> wrote:
>
>> Jonathan and others,
>>
>> Don't forget about sequences that resolve to glyphs, which can be
>> cultivated from 'ccmp' and other ligature-like features, along with the
>> Format 14 'cmap' subtable. Perhaps the most complex sequences are those
>> expressed via the 'ljmo', 'vjmo', and 'tjmo' features that are used for
>> combining jamo for which there are 11,875 two-character sequences (LV) and
>> 1,626,875 three-character ones (LVT). The Source Han / Noto CJK fonts can
>> serve as example implementations of everything I described above.
>>
>> -- Ken
>>
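
The sequence counts quoted above can be reproduced with a little arithmetic.
The jamo ranges below are an assumption about what is being counted (all
leading, vowel, and trailing jamo in the Hangul Jamo block and the two
Extended blocks, including the fillers U+115F and U+1160); they do yield the
same totals.

```python
def block_size(first, last):
    """Number of codepoints in the inclusive range first..last."""
    return last - first + 1


# Assumed ranges: Hangul Jamo plus Hangul Jamo Extended-A / Extended-B.
leading = block_size(0x1100, 0x115F) + block_size(0xA960, 0xA97C)   # L
vowel = block_size(0x1160, 0x11A7) + block_size(0xD7B0, 0xD7C6)     # V
trailing = block_size(0x11A8, 0x11FF) + block_size(0xD7CB, 0xD7FB)  # T

lv_sequences = leading * vowel            # two-character sequences
lvt_sequences = lv_sequences * trailing   # three-character sequences

print(leading, vowel, trailing)           # 125 95 137
print(lv_sequences, lvt_sequences)        # 11875 1626875
```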
>> > On Aug 6, 2019, at 2:44 AM, Jonathan Kew <jfkthame@gmail.com> wrote:
>> >
>> > On 05/08/2019 23:35, Garret Rieger wrote:
>> >
>> >> This will of course add some extra size to the first response, but
>> >> having the client know in advance the specific code points in the source
>> >> font has value beyond compressing the sets. For example if the browser
>> >> knows which codepoints are in the font it doesn't need to waste
>> >> requests/bytes sending augmentation requests for codepoints that aren't
>> >> actually in the font.
>> >
>> > It's pretty important for the browser to know early on which codepoints
>> > are in the font. It needs this in order to be able to appropriately fall
>> > back to the next font in the font-family list (perhaps kicking off a new
>> > font load for a different resource) for characters that aren't going to be
>> > supported by this one even after it is fully loaded.
>> >
>> > Equally, if a given character *is* going to be supported by the current
>> > font (even though it hasn't been fetched yet), we don't want the browser to
>> > start loading further fonts in the stack.
>> >
>> > JK
>> >
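
The fallback behaviour described above can be sketched as a lookup over
declared coverage rather than loaded data. The FontEntry type and pick_font
function are hypothetical names for illustration, not any browser's actual
API:

```python
from dataclasses import dataclass


@dataclass
class FontEntry:
    name: str
    coverage: frozenset   # codepoints the font is declared to support
    loaded: bool = False  # whether the font data has been fetched yet


def pick_font(codepoint, stack):
    """Return the first font in the stack declaring the codepoint, else None.

    The choice depends only on declared coverage, so a font that has not
    been fetched yet still wins over later fonts in the stack.
    """
    for font in stack:
        if codepoint in font.coverage:
            return font
    return None


stack = [
    FontEntry("Duck", frozenset(range(0x41, 0x5B))),                  # A-Z, not loaded
    FontEntry("Fallback", frozenset(range(0x20, 0x7F)), loaded=True),
]
print(pick_font(ord("A"), stack).name)   # Duck
print(pick_font(ord("a"), stack).name)   # Fallback
print(pick_font(0x4E00, stack))          # None
```

With coverage known up front, "A" waits for Duck instead of triggering a
load further down the stack, while "a" can fall back immediately.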

Received on Wednesday, 14 August 2019 17:07:54 UTC