RE: Incremental transfer subset + binary patch POC from Levantovsky, Vladimir on 2018-08-27 (public-webfonts-wg@w3.org from August 2018)

From: Levantovsky, Vladimir <Vladimir.Levantovsky@monotype.com>
Date: Mon, 27 Aug 2018 16:30:51 +0000
To: Roderick Sheeter <rsheeter@google.com>
CC: WebFonts WG <public-webfonts-wg@w3.org>
Message-ID: <BYAPR06MB459999A93D4CF12FB5BF52B2FC0B0@BYAPR06MB4599.namprd06.prod.outlook.com>
I think we are both in agreement on what “optimal” means in this demo, and this is, in essence, how Monotype’s dynamic subsetting works for every content update.
My point was that with three consecutive content updates (as in the example I described in my previous email) we would have to produce three different “optimal” dynamic subsets. While they cumulatively still transfer less data than the current GF solution, they transfer more than the incremental patches do, and those numbers [that showcase the benefits of incremental updates] would speak for themselves. (Right now, when the “optimal” subset size is shown as smaller than the cumulative size of the patches, it may not be obvious that incremental updates are still the better solution overall.)
To show this, the demo would need to be updated to retain the “optimal” subset size for each content change, and show them as cumulative transfer, similar to what you now show for GF solution where e.g. Latin and Cyrillic character sets are sent as two data blocks.

Makes sense?
Vlad


From: Roderick Sheeter [mailto:rsheeter@google.com]
Sent: Friday, August 24, 2018 5:46 PM
To: Levantovsky, Vladimir
Cc: WebFonts WG
Subject: Re: Incremental transfer subset + binary patch POC

WRT optimal, the way it is now is meant to show the absolute best we could do: if we somehow knew a priori what content the user would view (e.g. their page walk and contents thereof) we could cut a single "perfect" subset that covers all that content. The demo is meant to show that a patch series is a better approximation of that optimal than other current options.

To give another example, the finer we slice Korean or Japanese (https://developers.googleblog.com/2018/04/google-fonts-launches-korean-support.html) the closer we approximate what incremental transfer would do. This doesn't work for anything that makes heavy use of layout; most notably, if we cut Arabic or Indic into a bunch of pieces it's utterly broken. Incremental Transfer would "just work" for these cases.

Cheers, Rod S.

On Thu, Aug 23, 2018 at 1:31 PM Levantovsky, Vladimir <Vladimir.Levantovsky@monotype.com> wrote:
Thank you Rod, this is really cool!
I can’t claim I fully understand [yet] the concept behind the Brotli Patch Mode, but the POC demo makes the benefits of incremental transfer quite obvious.

One possible additional selling point might also be to calculate and show the cumulative size of all “optimal” woff2 subsets combined, after each incremental content update. E.g., we start with the demo text and add “HELLO WORLD!” to it, followed by “Проверка” (“Testing” in Russian) – GF today would end up sending two subsets (Latin 21.2 KB + Cyrillic 13.8 KB = 35 KB), incremental updates would produce three patches yielding (5.9 KB + 1.3 KB + 0.8 KB =) 8 KB of font data, while three “optimal” dynamic subsets would result in transferring (5.9 KB + 6.7 KB + 7.3 KB =) 19.9 KB of data.
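
The comparison above can be tallied with a short script. This is only a sketch: the sizes are the ones quoted in the paragraph, while the dictionary keys and variable names are made up for illustration.

```python
# Sizes in KB, copied from the example in the paragraph above.
transfers_kb = {
    "GF today (Latin + Cyrillic subsets)": [21.2, 13.8],
    "incremental patches": [5.9, 1.3, 0.8],
    "three 'optimal' dynamic subsets": [5.9, 6.7, 7.3],
}

# Cumulative transfer per strategy, rounded to one decimal place to hide
# floating-point noise from the additions.
totals_kb = {name: round(sum(sizes), 1) for name, sizes in transfers_kb.items()}

for name, total in totals_kb.items():
    print(f"{name}: {total} KB")
```

Showing the three totals side by side is what would make the cumulative cost of repeated "optimal" subsets visible next to the patch series.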

Great work!
Thank you,
Vlad


From: Roderick Sheeter [mailto:rsheeter@google.com]
Sent: Monday, August 20, 2018 4:49 PM
To: WebFonts WG
Subject: Incremental transfer subset + binary patch POC

Good afternoon,

I have some good news for incremental transfer: the Google Compression team that brought us Brotli has toys that may help, specifically Brotli Shared Dictionary, and in the future, Brotli Patch Mode. These tools allow us to compute smaller patches than VCDIFF.

Specifically, Brotli allows us to use the current state as a dictionary. This allows the compressed target to refer to the current state. To give a simplified example, instead of storing unchanged bytes just store that you need N bytes from dictionary at offset M. Patch mode will have enhancements to help compress things like identical offset shifts as you might get when compiling code with added/removed functions or similar. This may also be a win for fonts.
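
To make the "current state as a dictionary" idea concrete, here is a minimal sketch. It is not Brotli Shared Dictionary: it swaps in zlib's preset-dictionary support from the Python standard library, which works on the same principle. The function names `make_patch` and `apply_patch` and the random stand-in bytes are assumptions for illustration only.

```python
import random
import zlib


def make_patch(current: bytes, desired: bytes) -> bytes:
    """Compress `desired`, letting the compressor refer back into `current`."""
    compressor = zlib.compressobj(level=9, zdict=current)
    return compressor.compress(desired) + compressor.flush()


def apply_patch(current: bytes, patch: bytes) -> bytes:
    """Rebuild the desired bytes from `current` plus the patch."""
    decompressor = zlib.decompressobj(zdict=current)
    return decompressor.decompress(patch) + decompressor.flush()


# Stand-in data (not a real font): random bytes are incompressible on their
# own, so any saving comes from references into the dictionary.
rng = random.Random(0)
current_state = bytes(rng.getrandbits(8) for _ in range(4000))
desired_state = current_state + bytes(rng.getrandbits(8) for _ in range(200))

patch = make_patch(current_state, desired_state)        # uses the dictionary
standalone = zlib.compress(desired_state, 9)            # no dictionary
restored = apply_patch(current_state, patch)

print(len(standalone), len(patch), restored == desired_state)
```

Note that deflate can only look back 32 KB, so this stand-in is only useful for small inputs; it illustrates the mechanism, not the compression ratios one would get from Brotli.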

So, if the client tells us what codepoints it has and what it needs, then we can:

1) Compute current state.
Hb-subset is fast enough to plausibly do this "live", or precomputation could be used. We should permit the server to respond by patching to any set of codepoints it likes, not exactly what was requested.
2) Compute desired state.
3) Compute patch to get from current=>desired using a public standardized patch algorithm.
4) Send patch to client.
The client can then obtain the augmented font by applying the patch.
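
The four steps above can be sketched end to end as follows. Everything here is a toy under stated assumptions: `toy_glyph`, `subset`, `handle_request`, and `client_apply` are hypothetical names, the "font" is just concatenated pseudo-random records rather than a real font file, a real server would call a subsetter such as hb-subset, and zlib with a preset dictionary stands in for the standardized patch algorithm.

```python
import hashlib
import zlib


def toy_glyph(cp: int) -> bytes:
    """Hypothetical glyph record: 4-byte codepoint + 32 pseudo-random bytes."""
    key = cp.to_bytes(4, "big")
    return key + hashlib.sha256(key).digest()


def subset(codepoints: set[int]) -> bytes:
    """Toy subsetter: glyph records in codepoint order (stand-in for hb-subset)."""
    return b"".join(toy_glyph(cp) for cp in sorted(codepoints))


def handle_request(has: set[int], needs: set[int]) -> bytes:
    """Server side. Assumes `has` is non-empty, i.e. the client holds a font."""
    current = subset(has)            # 1) compute current state
    desired = subset(has | needs)    # 2) compute desired state
    # 3) compute patch current => desired (zlib dictionary as a stand-in)
    compressor = zlib.compressobj(level=9, zdict=current)
    patch = compressor.compress(desired) + compressor.flush()
    return patch                     # 4) send patch to client


def client_apply(current_font: bytes, patch: bytes) -> bytes:
    """Client side: apply the patch to the font it already holds."""
    decompressor = zlib.decompressobj(zdict=current_font)
    return decompressor.decompress(patch) + decompressor.flush()


has = {ord(ch) for ch in "Hello"}
needs = {ord(ch) for ch in "Проверка"}

client_font = subset(has)
patch = handle_request(has, needs)
augmented_font = client_apply(client_font, patch)
full_download = zlib.compress(subset(has | needs), 9)

print(len(patch), len(full_download), augmented_font == subset(has | needs))
```

The client only needs the patch algorithm and the bytes it already has; it never has to understand the font format to end up with the augmented font.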

I built a proof of concept to let us begin to play around with this, available at https://fonts.gstatic.com/experimental/incxfer_demo. The demo allows you to add arbitrary text to an initial block and then transfers a font patch using either VCDIFF or Brotli Shared Dictionary. To give context, it also displays the size of a WOFF2 of the exact subset needed (optimal known delivery strategy) and what Google Fonts would transfer today. All subsetting is done with hb-subset, which doesn't support layout yet (coming soon!).

It is my hope that this demonstrates that:

1) We can specify incremental transfer in a way that minimizes client implementation difficulty.
    a. An HTTP interaction, no new protocol.
    b. A generic patch algorithm works fine.
2) The client can avoid needing changes when the font spec changes; it just needs an implementation of the patch algorithm.
If the client keeps Brotli up to date then it'll have one.
3) We don't need the client to add any new libraries, just new versions of existing ones.
The client does still need code changes to wire everything together. If patching things doesn't fit the client's model this may be a good chunk of work.

One of the main things we could lose out on in the demo is the WOFF2 glyf transformation. However, one could subset and then apply the WOFF2 glyf transformation (for both current and desired). The client would receive the patch, apply it, and then undo the transform.

Cheers, Rod S.
Received on Monday, 27 August 2018 16:31:21 UTC