Re: Speeding up Brotli Patch Generation w/ Custom Encoding

That's great to hear re: VCDIFF. The info you're looking for should be
readily available internally in harfbuzz (it's the same info encoded in the
table directory of the two fonts). Currently the public API for hb_face_t
supports getting the tag list and the byte size of each table by tag, but
you're left to calculate the offsets on your own. I'll be meeting up with
Behdad tomorrow, so I'll float the idea of adding an additional public API
method to hb-face.h that would give direct access to the table directory
records so you can grab the offsets as well.
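
To make that concrete, here's roughly the dance with the existing public API
(a minimal sketch; the offset math assumes tables are serialized in the
returned tag order with 4-byte padding, which isn't actually guaranteed,
hence the desire for direct access to the real table directory records):

    #include <hb.h>

    #include <cstdio>
    #include <vector>

    // Sketch: list table tags and byte sizes with the current public API,
    // then estimate offsets assuming tables are laid out in the returned
    // order, 4-byte aligned, after the 12 + 16 * num_tables byte header and
    // table directory. That layout is an assumption, not an API guarantee.
    // 'face' is a face created from the original font data (hb_face_create).
    static void print_estimated_table_directory (hb_face_t *face)
    {
      unsigned total = hb_face_get_table_tags (face, 0, nullptr, nullptr);
      std::vector<hb_tag_t> tags (total);
      unsigned count = total;
      hb_face_get_table_tags (face, 0, &count, tags.data ());

      unsigned offset = 12 + 16 * count;  // sfnt header + table directory
      for (unsigned i = 0; i < count; i++)
      {
        hb_blob_t *table = hb_face_reference_table (face, tags[i]);
        unsigned length = hb_blob_get_length (table);
        hb_blob_destroy (table);

        char tag_str[5] = {0};
        hb_tag_to_string (tags[i], tag_str);
        std::printf ("%s offset=%u length=%u\n", tag_str, offset, length);

        offset += (length + 3) & ~3u;  // tables are padded to 4 bytes
      }
    }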

One minor note about how the subsetting API works: subsetting initially
returns an hb_face_t object. Internally it's actually what we call a face
builder, which holds a hashmap from table tag to table bytes. The fully
serialized font + table directory doesn't get created until you turn the
face into a blob. This means that if you're planning on doing table-to-table
diffs it'll be slightly more efficient to diff from hb_face_t to hb_face_t
instead of blob to blob, since a face-to-face diff skips the step of
serializing into the final single font blob.
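
To make the face-to-face path concrete, a minimal sketch (diff_table here is
a stand-in for whatever table differ ends up being used, and the codepoint
set is just an example):

    #include <hb.h>
    #include <hb-subset.h>

    // Hypothetical differ; not part of harfbuzz.
    void diff_table (hb_tag_t tag, hb_blob_t *prev, hb_blob_t *next);

    // Subset to a codepoint range; the returned hb_face_t is internally a
    // face builder (a tag -> table bytes map), not yet a serialized font.
    static hb_face_t *make_subset (hb_face_t *original, hb_codepoint_t last_cp)
    {
      hb_subset_input_t *input = hb_subset_input_create_or_fail ();
      hb_set_add_range (hb_subset_input_unicode_set (input), 0x20, last_cp);
      hb_face_t *subset = hb_subset_or_fail (original, input);
      hb_subset_input_destroy (input);
      return subset;
    }

    static void diff_subsets (hb_face_t *prev, hb_face_t *next, hb_tag_t tag)
    {
      // Face-to-face: pull each table straight out of the builder's map.
      hb_blob_t *a = hb_face_reference_table (prev, tag);
      hb_blob_t *b = hb_face_reference_table (next, tag);
      diff_table (tag, a, b);
      hb_blob_destroy (a);
      hb_blob_destroy (b);

      // Only serialize into the final single font blob when the full font
      // bytes are actually needed:
      //   hb_blob_t *full = hb_face_reference_blob (next);
    }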

On the brotli side of things I successfully built a prototype that can
create an initial response where the layout tables (GSUB/GPOS/GDEF) are
precompressed:
https://github.com/garretrieger/patch-subset-incxfer/blob/custom_brotli/util/precompress-test.cc

I used this to run a test of producing an initial response for Roboto
subsetted to all ASCII codepoints, where:

   1. Layout tables are immutable and precompressed at max quality (11).
   2. Layout tables are immutable and dynamically compressed.
   3. Layout tables are mutable and dynamically compressed.

Each configuration was run across a range of brotli qualities (1-9). The
results of that can be found here:
https://docs.google.com/spreadsheets/d/1Xf2KY4sbR_mSXDRJdO7mKKEBPSEBPmjCYxeqhvLOe7c/edit?usp=sharing.
Note: the times include both subsetting and brotli compression times. This
was run on my desktop.
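
For anyone who wants to reproduce the quality sweep on their own fonts, the
measurement loop is essentially just the following (a rough sketch, not the
actual prototype; the real test also times the subsetting step, which is
omitted here):

    #include <brotli/encode.h>

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Compress the same subsetted font bytes at qualities 1-9 and report
    // compressed size and wall time for each.
    static void quality_sweep (const std::vector<uint8_t> &font_bytes)
    {
      for (int quality = 1; quality <= 9; quality++)
      {
        std::vector<uint8_t> out (BrotliEncoderMaxCompressedSize (font_bytes.size ()));
        size_t out_size = out.size ();

        auto start = std::chrono::steady_clock::now ();
        BrotliEncoderCompress (quality, BROTLI_DEFAULT_WINDOW, BROTLI_MODE_FONT,
                               font_bytes.size (), font_bytes.data (),
                               &out_size, out.data ());
        auto elapsed = std::chrono::steady_clock::now () - start;

        std::printf ("quality %d: %zu bytes, %.2f ms\n", quality, out_size,
                     std::chrono::duration<double, std::milli> (elapsed).count ());
      }
    }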

There are a couple of notable takeaways:

   - Brotli quality: 5-8 seems to be a sweet spot offering very fast
   compression times while still being competitive in compressed size vs
   quality 9.
   - Precompressed layout is slightly faster than mutable layout (1.92 ms
   per request vs 2.29 ms per request @ quality 5) but adds about 10 kb to
   the initial request. I would expect most of that extra size to be made
   back in reduced sizes of subsequent patches, since layout tables no
   longer need to be patched. Future patches will also be cheaper to
   compute. Some further testing of the size of subsequent patches will be
   needed, but this is likely a reasonable tradeoff to make.

I'm planning on running this analysis over some more fonts of different
types (e.g. CJK, Emoji, Arabic) to get some additional data. Beyond that I
also plan to do an analysis of the performance of generating patch
responses for mutable layout vs precompressed immutable layout.

On Mon, Sep 19, 2022 at 4:56 PM Skef Iterum <siterum@adobe.com> wrote:

> I'm also starting to look at some potentially related issues.
>
> The required patch format is based on VCDIFF. I hadn't peeked at that
> format before but one of its main features is that it's agnostic to how the
> "windows" of the "source file" (for us, the previous subset) are mapped to
> the "target file" (the next subset) -- that's left up to the generator.
>
> This is great for subsetting because we already know a good mapping: The
> window from the previous subset used as a source for the next subset should
> be the region generated from the same portion of the original, complete
> font file.
>
> Accordingly, suppose when harfbuzz is generating a subset it additionally
> generates a file with a sequence of these records, in output file order:
>
>    1. Offset in subset file
>    2. Length in subset file
>    3. Offset in original file
>    4. Length in original file
>    5. (Optional:) "Final" flag, meaning that the portion won't change in
>    future subsets (either because of gid stability or because the GIDs have
>    the same mapping as the source file and can't change further).
>
> With this info from the previous and next subset we can efficiently map
> the windows and therefore can diff only "within" those windows, instead of
> analyzing the two files to discover them. (With an accurate "Final" flag we
> can also skip the comparison and just copy those regions that are final in
> both subsets.) The cost will be the loss of additional, incidental file
> similarities but those should be relatively modest and somewhat addressed
> by compression.
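>
> To make the record list above concrete, one of these records might look
> like this (a sketch; the field names are illustrative, not a proposed
> format):
>
>     #include <cstdint>
>
>     // One record per contiguous region of the subset, in output file order.
>     struct SubsetRegionRecord {
>       uint64_t subset_offset;    // 1. offset in subset file
>       uint64_t subset_length;    // 2. length in subset file
>       uint64_t original_offset;  // 3. offset in original file
>       uint64_t original_length;  // 4. length in original file
>       bool     is_final;         // 5. optional: region won't change in
>                                  //    future subsets
>     };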
>
> Happily, this solution would also be compatible with the current required
> format. There are some details, such as combining sequential VCDIFF windows
> after diffing to lower their overhead, and potentially dealing with
> different patterns of original-file (offset, length) pairs, but those seem
> tractable.
>
> Of course, this does require some additional harfbuzz functionality, which
> is what I'm starting to look into.
>
> Skef
> ------------------------------
> *From:* Garret Rieger <grieger@google.com>
> *Sent:* Monday, September 19, 2022 3:06 PM
> *To:* w3c-webfonts-wg (public-webfonts-wg@w3.org) <
> public-webfonts-wg@w3.org>
> *Subject:* Speeding up Brotli Patch Generation w/ Custom Encoding
>
> After our discussions at TPAC I've been thinking about how we could speed
> up brotli patch generation by implementing custom brotli encoding logic
> (thanks Sergey and Dominik for the idea). It turns out that internally
> brotli uses the concept of a meta block
> <https://datatracker.ietf.org/doc/html/rfc7932#section-2>,
> which can optionally contain a self-contained block of compressed data
> <https://datatracker.ietf.org/doc/html/rfc7932#section-11.3>.
> This is great news because it allows a server to easily incorporate
> precompressed data blocks into the generation of the brotli compressed
> stream.
>
> I've started to document my thoughts on how we could actually apply this
> in a server implementation here: Speeding up Incremental Transfer Patch
> Generation
> <https://docs.google.com/document/d/1MdtB_WPC2grAx3vFgLHA1-CqRzQtWj_cJYx40W2VQgI/edit?usp=sharing>.
> The doc is still a work in progress, but the main ideas are there.
>
> At a high level I believe we should be able to:
>
>    - Cache precompressed immutable segments of the original font for use
>    during initial response generation.
>    - Cache precompressed blocks of glyph data to allow for fast
>    compressed patch generation (for the first and subsequent requests) via
>    concatenation.
>
> Where run-time performance is a concern, this should allow for
> significantly reduced computational requirements at the cost of potentially
> larger patches. The best part is that none of this would require any
> specification changes; it can all be done within the existing brotli
> patch mechanism. So this could eliminate the need for the addition of a
> third, new patching format as discussed at the last meeting.
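>
> A very rough sketch of what the precompressed caches could look like
> (illustrative only; how cached metablocks get spliced into the output
> stream is what the doc above covers):
>
>     #include <cstdint>
>     #include <map>
>     #include <string>
>     #include <vector>
>
>     // Per-font cache of precompressed brotli data, built once offline.
>     struct PrecompressedFontCache {
>       // Immutable segments of the original font (e.g. layout tables),
>       // compressed at max quality, used when building the initial response.
>       std::map<std::string, std::vector<uint8_t>> immutable_segments;
>       // Precompressed blocks of glyph data, keyed by glyph id (or range),
>       // assembled into patches by concatenation rather than recompression.
>       std::map<uint32_t, std::vector<uint8_t>> glyph_blocks;
>     };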
>
> As next steps I'm looking into building a simple prototype to
> demonstrate that the idea works and gather some initial data on performance
> and patch sizes.
>

Received on Thursday, 22 September 2022 00:06:07 UTC