Re: Incremental transfer subset + binary patch POC from Roderick Sheeter on 2018-08-27 (public-webfonts-wg@w3.org from August 2018)

From: Roderick Sheeter <rsheeter@google.com>
Date: Mon, 27 Aug 2018 12:52:52 -0700
To: "Levantovsky, Vladimir" <Vladimir.Levantovsky@monotype.com>
Cc: WebFonts WG <public-webfonts-wg@w3.org>
Message-ID: <CABscrrF-HT=Wkw=31+73ph1h80jC22K81omXjddYuY7p24X3cw@mail.gmail.com>
The additional visualization is live at
https://fonts.gstatic.com/experimental/incxfer_demo, with a new series "B)
woff2 of each segment, ∑segments XX KB:"

If I may digress, one point the demo does not make at all currently is that
a patch/augmentation approach is uniquely able to optimize cases where
layout is heavily used. Once hb-subset learns how to handle layout I think
we can add some pretty compelling examples for things like Arabic, Indic,
and even many Latin-based use cases.

On Mon, Aug 27, 2018 at 9:47 AM Roderick Sheeter <rsheeter@google.com>
wrote:

> Gotcha. I can add the series of perfect subsets you'd use absent knowledge
> of the page walk.
>
> On Mon, Aug 27, 2018 at 9:31 AM Levantovsky, Vladimir <
> Vladimir.Levantovsky@monotype.com> wrote:
>
>> I think we are both in agreement on what “optimal” means in this demo,
>> and this is, in essence, how Monotype’s dynamic subsetting works for every
>> content update.
>>
>> My point was that with three consecutive content updates (as is the case
>> in the example I described in my previous email) we would have to produce
>> three different “optimal” dynamic subsets. While they cumulatively end up
>> transferring less data than the current GF solution, they still transfer
>> more than the patches do, so the numbers [that showcase the benefits of
>> incremental updates] would speak for themselves. (Right now, when an
>> “optimal” subset size is shown as being less than the size of the
>> cumulative patches, it may not be obvious that incremental updates are
>> still the better solution overall.)
>>
>> To show this, the demo would need to be updated to retain the “optimal”
>> subset size for each content change, and show them as cumulative transfer,
>> similar to what you now show for the GF solution, where e.g. Latin and
>> Cyrillic character sets are sent as two data blocks.
>>
>>
>>
>> Makes sense?
>>
>> Vlad
>>
>>
>>
>>
>>
>> *From:* Roderick Sheeter [mailto:rsheeter@google.com]
>> *Sent:* Friday, August 24, 2018 5:46 PM
>> *To:* Levantovsky, Vladimir
>> *Cc:* WebFonts WG
>> *Subject:* Re: Incremental transfer subset + binary patch POC
>>
>>
>>
>> WRT optimal, the way it is now is meant to show the absolute best we
>> could do: if we somehow knew a priori what content the user would view
>> (e.g. their page walk and contents thereof) we could cut a single "perfect"
>> subset that covers all that content. The demo is meant to show that a patch
>> series is a better approximation of that optimal than other current options.
>>
>>
>>
>> To give another example, the finer we slice Korean or Japanese (
>> https://developers.googleblog.com/2018/04/google-fonts-launches-korean-support.html)
>> the closer we approximate what incremental transfer would do. This doesn't
>> work for anything that makes heavy use of layout: most notably, if we cut
>> Arabic or Indic into a bunch of pieces, it's utterly broken. Incremental
>> Transfer would "just work" for these cases.
>>
>>
>>
>> Cheers, Rod S.
>>
>>
>>
>> On Thu, Aug 23, 2018 at 1:31 PM Levantovsky, Vladimir <
>> Vladimir.Levantovsky@monotype.com> wrote:
>>
>> Thank you Rod, this is really cool!
>>
>> I can’t claim I fully understand [yet] the concept behind the Brotli
>> Patch Mode, but the POC demo makes the benefits of incremental transfer
>> quite obvious.
>>
>>
>>
>> One possible additional selling point might also be to calculate and show
>> the cumulative size of all “optimal” woff2 subsets combined, after each
>> incremental content update. E.g., we start with the demo text and add
>> “HELLO WORLD!” to it, followed by “Проверка” (“Testing” in Russian) – GF
>> today would end up sending two subsets (Latin 21.2 KB + Cyrillic 13.8 KB =
>> 35.0 KB), incremental updates would produce three patches yielding (5.9 KB +
>> 1.3 KB + 0.8 KB =) 8.0 KB of font data, while three “optimal” dynamic subsets
>> would result in transferring (5.9 KB + 6.7 KB + 7.3 KB =) 19.9 KB of data.
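
As a quick check of the arithmetic above, here is a small sketch that recomputes the running total after each transfer for the three strategies. The sizes are the ones quoted in the message; the variable and function names are only illustrative.

```python
# Sizes in KB, copied from the figures quoted in the message above.
google_fonts_today = [21.2, 13.8]       # Latin subset, then Cyrillic subset
incremental_patches = [5.9, 1.3, 0.8]   # one patch per content update
optimal_subsets = [5.9, 6.7, 7.3]       # a fresh "optimal" subset per update


def cumulative(sizes):
    """Return the running total after each transfer, rounded to 0.1 KB."""
    totals = []
    running = 0.0
    for size in sizes:
        running += size
        totals.append(round(running, 1))
    return totals


for label, sizes in [
    ("GF today", google_fonts_today),
    ("patches", incremental_patches),
    ("optimal subsets", optimal_subsets),
]:
    print(f"{label}: cumulative KB after each transfer = {cumulative(sizes)}")
```

The last entry of each list is the total quoted in the message (35.0, 8.0 and 19.9 KB respectively).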
>>
>>
>>
>> Great work!
>>
>> Thank you,
>>
>> Vlad
>>
>>
>>
>>
>>
>> *From:* Roderick Sheeter [mailto:rsheeter@google.com]
>> *Sent:* Monday, August 20, 2018 4:49 PM
>> *To:* WebFonts WG
>> *Subject:* Incremental transfer subset + binary patch POC
>>
>>
>>
>> Good afternoon,
>>
>>
>>
>> I have some good news for incremental transfer: the Google Compression
>> team that brought us Brotli has toys that may help, specifically Brotli
>> Shared Dictionary, and in the future, Brotli Patch Mode. These tools allow
>> us to compute smaller patches than VCDIFF.
>>
>>
>>
>> Specifically, Brotli allows us to use the current state as a dictionary.
>> This allows the compressed target to refer to the current state. To give a
>> simplified example, instead of storing unchanged bytes, just store that you
>> need N bytes from the dictionary at offset M. Patch mode will have
>> enhancements to help compress things like identical offset shifts, as you
>> might get when compiling code with added/removed functions or similar. This
>> may also be a win for fonts.
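
To make the simplified example concrete, here is a toy sketch of a patch expressed as "copy N bytes from the dictionary at offset M" plus literal bytes for whatever is new. It uses Python's `difflib` to find the shared runs, which is an assumption made purely for illustration: it is not how Brotli finds matches or encodes its output, and `make_patch` / `apply_patch` are hypothetical names.

```python
from difflib import SequenceMatcher


def make_patch(current: bytes, desired: bytes):
    """Describe `desired` as copies out of `current` plus literal bytes."""
    ops = []
    matcher = SequenceMatcher(None, current, desired, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: store only (offset M, length N) into `current`.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # New bytes that the dictionary cannot supply.
            ops.append(("literal", desired[j1:j2]))
        # "delete": bytes only in `current` are simply never referenced.
    return ops


def apply_patch(current: bytes, ops) -> bytes:
    """Rebuild the desired bytes from the dictionary and the patch ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += current[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


# Toy "font" states; real input would be the subset font binaries.
current = b"glyph-A glyph-B glyph-C"
desired = b"glyph-A glyph-B glyph-X glyph-C"

patch = make_patch(current, desired)
literal_bytes = sum(len(op[1]) for op in patch if op[0] == "literal")
print(patch)
print(f"{literal_bytes} literal bytes for a {len(desired)}-byte target")
```

Only the literal bytes need to travel in full; everything else is a short reference into state the client already holds.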
>>
>>
>>
>> So, if the client tells us what codepoints it has and what it needs, then
>> we can:
>>
>>
>>
>> 1) Compute current state.
>>
>> Hb-subset is fast enough to plausibly do this "live", or precomputation
>> could be used. We should permit the server to respond by patching to any
>> set of codepoints it likes, not exactly what was requested.
>>
>> 2) Compute desired state.
>>
>> 3) Compute patch to get from current=>desired using a public standardized
>> patch algorithm.
>>
>> 4) Send patch to client.
>>
>> The client can then obtain the augmented font by applying the patch.
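
The four steps above can be sketched roughly as follows. Several things here are assumptions for the sake of a runnable example: zlib's preset dictionary (`zdict`) stands in for Brotli Shared Dictionary because it ships with Python, `FONT` is made-up glyph data, `subset` is a toy stand-in and not hb-subset, and `handle_request` is a hypothetical name. Deflate's window is at most 32 KB, so this stand-in only behaves like the real thing for small states.

```python
import hashlib
import zlib

# Made-up font: codepoint -> 64 bytes of pseudo-random "glyph data".
FONT = {cp: hashlib.sha512(cp.to_bytes(4, "big")).digest()
        for cp in range(0x20, 0x250)}


def subset(codepoints) -> bytes:
    """Toy stand-in for hb-subset: glyph data in codepoint order."""
    return b"".join(FONT[cp] for cp in sorted(codepoints))


def make_patch(current: bytes, desired: bytes) -> bytes:
    """Compress `desired` using `current` as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=current)
    return comp.compress(desired) + comp.flush()


def apply_patch(current: bytes, patch: bytes) -> bytes:
    """Client side: decompress the patch against the state it already has."""
    decomp = zlib.decompressobj(zdict=current)
    return decomp.decompress(patch) + decomp.flush()


def handle_request(has: set, needs: set) -> bytes:
    """Server side: steps 1 to 3; the return value is what step 4 sends."""
    current = subset(has)                # 1) compute current state
    desired = subset(has | needs)        # 2) compute desired state
    return make_patch(current, desired)  # 3) compute the patch


has = {ord(c) for c in "Hello"}
needs = {ord(c) for c in " World"}

client_font = subset(has)                # what the client already holds
patch = handle_request(has, needs)
client_font = apply_patch(client_font, patch)

standalone = zlib.compress(subset(has | needs), 9)
print(f"patch: {len(patch)} bytes, standalone subset: {len(standalone)} bytes")
```

The server is free to pick a larger `needs` than the client asked for; the client only has to apply whatever patch comes back.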
>>
>>
>>
>> I built a proof of concept to let us begin to play around with this,
>> available at <https://fonts.gstatic.com/experimental/incxfer_demo>.
>> The demo allows you to add arbitrary text to an initial block and then
>> transfers a font patch using either VCDIFF or Brotli Shared Dictionary. To
>> give context, it also displays the size of a WOFF2 of the exact subset
>> needed (optimal known delivery strategy) and what Google Fonts would
>> transfer today. All subsetting is done with hb-subset, which doesn't
>> support layout yet (coming soon!).
>>
>>
>>
>> It is my hope that this demonstrates that:
>>
>>
>>
>> 1) We can specify incremental transfer in a way that minimizes client
>> implementation difficulty.
>>
>>     a. An HTTP interaction, no new protocol.
>>
>>     b. A generic patch algorithm works fine.
>>
>> 2) The client can avoid needing changes when the font spec changes; it
>> just needs an implementation of the patch algorithm.
>>
>> If the client keeps Brotli up to date then it'll have one.
>>
>> 3) We don't need the client to add any new libraries, just new versions of
>> existing ones.
>>
>> The client does still need code changes to wire everything together. If
>> patching things doesn't fit the client's model this may be a good chunk of
>> work.
>>
>>
>>
>> One of the main things we could lose out on in the demo is WOFF2 glyf
>> transformation. However, one could subset then apply the woff2 glyf
>> transformation (for both current and desired). The client would receive the
>> patch, apply it, and then undo the transform.
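
The order of operations described above might look like the sketch below. Everything in it is a stand-in chosen so the example runs: `transform` / `untransform` are a hypothetical reversible byte-wise delta coding, not the actual WOFF2 glyf transformation, and zlib with a preset dictionary again substitutes for Brotli. The point is only that the patch is computed between the transformed states, and the client undoes the transform last.

```python
import zlib


def transform(font: bytes) -> bytes:
    """Hypothetical reversible stand-in for the glyf transform (delta coding)."""
    out = bytearray()
    prev = 0
    for b in font:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def untransform(data: bytes) -> bytes:
    """Inverse of `transform`: running sum modulo 256."""
    out = bytearray()
    prev = 0
    for d in data:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


def make_patch(current: bytes, desired: bytes) -> bytes:
    comp = zlib.compressobj(level=9, zdict=current)
    return comp.compress(desired) + comp.flush()


def apply_patch(current: bytes, patch: bytes) -> bytes:
    decomp = zlib.decompressobj(zdict=current)
    return decomp.decompress(patch) + decomp.flush()


# Toy subset fonts; real input would come from the subsetter.
current_font = b"glyf[A]:0,0 10,0 10,10;glyf[B]:0,0 5,5 0,10;"
desired_font = current_font + b"glyf[C]:1,1 9,1 9,9;"

# Server: subset, transform both states, then diff the transformed bytes.
patch = make_patch(transform(current_font), transform(desired_font))

# Client: patch against its transformed state, then undo the transform.
client_state = transform(current_font)
client_state = apply_patch(client_state, patch)
usable_font = untransform(client_state)
print(usable_font == desired_font)
```

One consequence of this ordering is that the client needs the transformed form of its current font on hand (kept or recomputed) when the next patch arrives.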
>>
>>
>>
>> Cheers, Rod S.
>>
>>
Received on Monday, 27 August 2018 19:53:27 UTC