Re: Speeding up Brotli Patch Generation w/ Custom Encoding

Ah, got it. That makes sense. In that case, for TT fonts, this should be
doable for glyf right now:

   - I recently updated the subset API to expose the subset plan, which
   among other things contains the old to new glyph id mapping, the unicode
   codepoint to glyph id mapping, and the set of glyph ids that will be in the
   subset. You can get the plan with
   https://github.com/harfbuzz/harfbuzz/blob/main/src/hb-subset.h#L179 and
   then execute it to produce the subset.
   - From the two plans (previous and next subset) you can figure out the
   set of changed glyph ids.
   - Then loca is pretty trivial to decode to produce the byte ranges for
   each glyph in the glyf table.
   - Lastly you just need to figure out the offset to the start of the glyf
   table. I just talked to Behdad about this and he had a good idea: since the
   blobs that the hb face API returns are pointers into the font's actual
   data, you can compute a table's offset as the blob's data pointer minus the
   pointer to the start of the font's data. If you're doing this with a face
   that came from a subset call, you'll need to convert it to a blob first to
   trigger serialization and table directory generation, then convert the blob
   back to a new hb_face_t (all of this is sketched below).
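
Here's a rough sketch of those steps end to end. This is my own
illustration rather than an official example: it assumes a recent HarfBuzz
with the plan API above and a hypothetical font.ttf, and it skips error
handling and cleanup.

  #include <stdint.h>
  #include <stdio.h>
  #include <hb.h>
  #include <hb-subset.h>

  /* Big-endian reads for raw SFNT data. */
  static unsigned rd16 (const uint8_t *p) { return (p[0] << 8) | p[1]; }
  static unsigned rd32 (const uint8_t *p) { return (rd16 (p) << 16) | rd16 (p + 2); }

  /* Behdad's trick: table blobs point into the font's data, so a table's
     offset is its blob's data pointer minus the font blob's data pointer. */
  static unsigned table_offset (hb_face_t *face, hb_blob_t *font, hb_tag_t tag)
  {
    unsigned tl, fl;
    hb_blob_t *t = hb_face_reference_table (face, tag);
    const char *td = hb_blob_get_data (t, &tl);
    const char *fd = hb_blob_get_data (font, &fl);
    unsigned off = (unsigned) (td - fd);
    hb_blob_destroy (t);
    return off;
  }

  int main (void)
  {
    hb_blob_t *blob = hb_blob_create_from_file ("font.ttf");
    hb_face_t *face = hb_face_create (blob, 0);

    hb_subset_input_t *input = hb_subset_input_create_or_fail ();
    hb_set_add_range (hb_subset_input_unicode_set (input), 0x20, 0x7E);

    /* The plan exposes, among other things, the glyph id mappings. */
    hb_subset_plan_t *plan = hb_subset_plan_create_or_fail (face, input);
    const hb_map_t *new_to_old = hb_subset_plan_new_to_old_glyph_mapping (plan);

    /* Execute the plan, convert the face builder it returns into a blob to
       trigger serialization + table directory generation, then wrap that
       blob in a fresh hb_face_t whose table blobs point into the real data. */
    hb_face_t *builder = hb_subset_plan_execute_or_fail (plan);
    hb_blob_t *subset_blob = hb_face_reference_blob (builder);
    hb_face_t *subset = hb_face_create (subset_blob, 0);

    unsigned glyf_off = table_offset (subset, subset_blob, HB_TAG ('g','l','y','f'));

    /* Decode loca: indexToLocFormat is bytes 50-51 of head
       (0 = uint16 offsets scaled by 2, 1 = uint32 offsets). */
    unsigned hl, ll;
    hb_blob_t *head = hb_face_reference_table (subset, HB_TAG ('h','e','a','d'));
    hb_blob_t *loca = hb_face_reference_table (subset, HB_TAG ('l','o','c','a'));
    const uint8_t *hd = (const uint8_t *) hb_blob_get_data (head, &hl);
    const uint8_t *ld = (const uint8_t *) hb_blob_get_data (loca, &ll);
    int long_loca = rd16 (hd + 50) == 1;
    unsigned entries = long_loca ? ll / 4 : ll / 2;

    for (unsigned gid = 0; gid + 1 < entries; gid++)
    {
      unsigned s = long_loca ? rd32 (ld + 4 * gid)       : 2 * rd16 (ld + 2 * gid);
      unsigned e = long_loca ? rd32 (ld + 4 * (gid + 1)) : 2 * rd16 (ld + 2 * (gid + 1));
      printf ("new gid %u (old gid %u): subset file bytes [%u, %u)\n",
              gid, hb_map_get (new_to_old, gid), glyf_off + s, glyf_off + e);
    }
    return 0;
  }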

Beyond glyphs, the subset plan has data on other things such as the
lookups, features, and scripts that are going to be retained by the subset.
That's not
currently publicly exposed but would be pretty straightforward to add to
the API. For a complete list of the info the plan has see:
https://github.com/harfbuzz/harfbuzz/blob/main/src/hb-subset-plan.hh#L92.
Just let me know if you need access to any of those fields from the public
API and I can look into adding methods as needed.

For the offset-based tables (e.g. COLR, GSUB/GPOS) I can imagine we might
build something that walks the subtree associated with a lookup or glyph
and produces bounds on where the associated data sits; a first-level sketch
follows.
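
For GSUB the walk could start from the lookup list. Here's a first-level
sketch (offsets per the OpenType spec, reusing the rd16 helper from the
sketch above; real code would recurse into each subtable by lookupType and
format to bound the full extent):

  /* Print where the subtables of GSUB lookup `lookup_index` start, as
     offsets into the font file.  gsub points at the GSUB table's data and
     gsub_off is its offset within the file. */
  static void lookup_subtable_starts (const uint8_t *gsub, unsigned gsub_off,
                                      unsigned lookup_index)
  {
    unsigned lookup_list = rd16 (gsub + 8);             /* offset from GSUB start */
    const uint8_t *ll = gsub + lookup_list;
    unsigned lookup = rd16 (ll + 2 + 2 * lookup_index); /* offset from LookupList */
    const uint8_t *lk = ll + lookup;
    unsigned subtable_count = rd16 (lk + 4);
    for (unsigned i = 0; i < subtable_count; i++)
    {
      unsigned sub = rd16 (lk + 6 + 2 * i);             /* offset from Lookup */
      printf ("lookup %u subtable %u starts at file offset %u\n",
              lookup_index, i, gsub_off + lookup_list + lookup + sub);
    }
  }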


On Wed, Sep 21, 2022 at 11:29 PM Skef Iterum <siterum@adobe.com> wrote:

> Table-level offsets are probably enough for very small tables and perhaps
> a few "stable" tables but won't help much when it comes to GPOS, CFF, and
> the like. To have significant patch speed-up one will need data on the
> scale of lookups and glyphs, so that you're only ever diffing small (< 1k)
> regions (with a few possible exceptions).
>
> Any prototype that accomplishes this would be intrusive enough that
> present API-level questions are probably moot. No need to bother Behdad at
> this point, other than maybe letting him know that we're looking at it.
>
> Skef
> ------------------------------
> *From:* Garret Rieger <grieger@google.com>
> *Sent:* Wednesday, September 21, 2022 5:05 PM
> *To:* Skef Iterum <siterum@adobe.com>
> *Cc:* w3c-webfonts-wg (public-webfonts-wg@w3.org) <
> public-webfonts-wg@w3.org>
> *Subject:* Re: Speeding up Brotli Patch Generation w/ Custom Encoding
>
> That's great to hear re: VCDIFF. The info you're looking for should be
> readily available internally in harfbuzz (it's the same info encoded in the
> table directory of the two fonts). Currently the public API for hb_face_t
> supports getting the tag list and the byte size of each table by tag, but
> you're left to calculate the offsets on your own. I'll be meeting up with
> Behdad tomorrow so I'll float the idea of adding an additional public API
> method to hb-face.h which would give direct access to the table directory
> records so you can grab the offsets as well.
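>
> In the meantime, the table directory sits at the front of the font blob,
> so the offsets can be read directly. A rough sketch (my illustration; the
> rd16/rd32 big-endian helpers are assumed, and error checks are omitted):
>
>   /* font_blob: the full font, e.g. from hb_face_reference_blob (). */
>   static void print_table_directory (hb_blob_t *font_blob)
>   {
>     unsigned len;
>     const uint8_t *d = (const uint8_t *) hb_blob_get_data (font_blob, &len);
>     unsigned num_tables = rd16 (d + 4);     /* sfnt header: numTables at byte 4 */
>     for (unsigned i = 0; i < num_tables; i++)
>     {
>       const uint8_t *rec = d + 12 + 16 * i; /* 16-byte records after the header */
>       unsigned off  = rd32 (rec + 8);
>       unsigned tlen = rd32 (rec + 12);
>       printf ("%c%c%c%c: offset %u, length %u\n",
>               rec[0], rec[1], rec[2], rec[3], off, tlen);
>     }
>   }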
>
> One minor note about how the subsetting API works: it initially returns an
> hb_face_t object. Internally it's actually what we call a face builder,
> which has a hashmap from table tag to table bytes. The fully serialized
> font + table directory doesn't get created until you turn the face into a
> blob. This means that if you're planning on doing table to table diffs
> it'll be slightly more efficient to diff from hb_face_t to hb_face_t
> instead of blob to blob, since a face to face diff skips the step of
> serializing into the final single font blob (sketched below).
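>
> A sketch of what that face to face diff could look like (diff_bytes is a
> hypothetical stand-in for whatever binary diff gets used; the tag list is
> taken from the original full face for simplicity):
>
>   void diff_bytes (hb_blob_t *a, hb_blob_t *b);  /* hypothetical */
>
>   static void diff_subsets (hb_face_t *original_face,
>                             hb_face_t *prev_face, hb_face_t *next_face)
>   {
>     hb_tag_t tags[64];
>     unsigned count = 64;  /* assumes <= 64 tables */
>     hb_face_get_table_tags (original_face, 0, &count, tags);
>     for (unsigned i = 0; i < count; i++)
>     {
>       /* Referencing tables on the builder faces pulls the per-table bytes
>          without serializing either subset into a single font blob. */
>       hb_blob_t *prev = hb_face_reference_table (prev_face, tags[i]);
>       hb_blob_t *next = hb_face_reference_table (next_face, tags[i]);
>       diff_bytes (prev, next);
>       hb_blob_destroy (prev);
>       hb_blob_destroy (next);
>     }
>   }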
>
> On the brotli side of things I successfully built a prototype that can
> create an initial response where the layout tables (GSUB/GPOS/GDEF) are
> precompressed:
> https://github.com/garretrieger/patch-subset-incxfer/blob/custom_brotli/util/precompress-test.cc
>
> I used this to run a test of producing an initial response for Roboto
> subsetted to all ASCII codepoints where:
>
>    1. Layout tables are immutable and precompressed at max quality (11).
>    2. Layout tables are immutable and dynamically compressed.
>    3. Layout tables are mutable and dynamically compressed.
>
> Each configuration was run across a range of brotli qualities (1-9). The
> results of that can be found here:
> https://docs.google.com/spreadsheets/d/1Xf2KY4sbR_mSXDRJdO7mKKEBPSEBPmjCYxeqhvLOe7c/edit?usp=sharing.
> Note: the times include both subsetting and brotli compression times. This
> was run on my desktop.
>
> There are a couple of notable takeaways:
>
>    - Brotli quality: 5-8 seems to be a sweet spot offering very fast
>    compression times while still being competitive in compressed size vs
>    quality 9.
>    - Precompressed layout is slightly faster than mutable layout (1.92 ms
>    per request vs 2.29 ms per request @ quality 5) but adds about 10 KB to the
>    initial request. I would expect most of that extra size to be made back in
>    reduced sizes of subsequent patches due to not having to patch layout
>    tables. Also future patches will be cheaper to compute. Some further
>    testing of the size of subsequent patches will be needed but this is likely
>    a reasonable tradeoff to make.
>
> I'm planning on running this analysis over some more fonts of different
> types (e.g. CJK, Emoji, Arabic) to get some additional data. Beyond that I
> plan to also do an analysis of the performance of generating patch
> responses for the mutable layout vs precompressed immutable layout.
>
> On Mon, Sep 19, 2022 at 4:56 PM Skef Iterum <siterum@adobe.com> wrote:
>
> I'm also starting to look at some potentially related issues.
>
> The required patch format is based on VCDIFF. I hadn't peeked at that
> format before but one of its main features is that it's agnostic to how the
> "windows" of the "source file" (for us, the previous subset) are mapped to
> the "target file" (the next subset) -- that's left up to the generator.
>
> This is great for subsetting because we already know a good mapping: The
> window from the previous subset used as a source for the next subset should
> be the region generated from the same portion of the original, complete
> font file.
>
> Accordingly, suppose that when harfbuzz is generating a subset it
> additionally generates a file with a sequence of these records, in output
> file order (one possible layout is sketched after the list):
>
>    1. Offset in subset file
>    2. Length in subset file
>    3. Offset in original file
>    4. Length in original file
>    5. (Optional:) "Final" flag, meaning that the portion won't change in
>    future subsets (either because of gid stability or because the GIDs have
>    the same mapping as the source file and can't change further).
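>
> One possible record layout, just to make this concrete (field names are
> illustrative only, not a proposed format):
>
>   #include <stdint.h>
>
>   struct region_map_record
>   {
>     uint32_t subset_offset;    /* 1. offset in subset file */
>     uint32_t subset_length;    /* 2. length in subset file */
>     uint32_t original_offset;  /* 3. offset in original file */
>     uint32_t original_length;  /* 4. length in original file */
>     uint8_t  is_final;         /* 5. region won't change in future subsets */
>   };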
>
> With this info from the previous and next subset we can efficiently map
> the windows and therefore can diff only "within" those windows, instead of
> analyzing the two files to discover them; a sketch of that mapping follows
> below. (With an accurate "Final" flag we can also skip the comparison and
> just copy those regions that are final in both subsets.) The cost will be
> the loss of additional, incidental file similarities, but those should be
> relatively modest and somewhat addressed by compression.
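>
> In code, that mapping could look something like this sketch (assuming the
> region_map_record above; emit_insert/emit_copy/emit_diff are hypothetical
> stand-ins for the actual VCDIFF window generation):
>
>   #include <cstdint>
>   #include <map>
>   #include <utility>
>   #include <vector>
>
>   void emit_insert (const region_map_record &next);
>   void emit_copy (const region_map_record &prev,
>                   const region_map_record &next);
>   void emit_diff (const region_map_record &prev,
>                   const region_map_record &next);
>
>   void map_windows (const std::vector<region_map_record> &prev,
>                     const std::vector<region_map_record> &next)
>   {
>     /* Index the previous subset's regions by their original-file region. */
>     std::map<std::pair<uint32_t, uint32_t>, const region_map_record *> by_orig;
>     for (const auto &r : prev)
>       by_orig[{r.original_offset, r.original_length}] = &r;
>
>     for (const auto &n : next)
>     {
>       auto it = by_orig.find ({n.original_offset, n.original_length});
>       if (it == by_orig.end ())
>         emit_insert (n);             /* no matching source window */
>       else if (n.is_final && it->second->is_final)
>         emit_copy (*it->second, n);  /* final in both: skip the diff */
>       else
>         emit_diff (*it->second, n);  /* diff only within the matched windows */
>     }
>   }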
>
> Happily, this solution would also be compatible with the current required
> format. There are some details, such as combining sequential VCDIFF
> windows after diffing to lower their overhead, and potentially dealing
> with different patterns of original (offset, length) pairs, but those seem
> tractable.
>
> Of course, this does require some additional harfbuzz functionality, which
> is what I'm starting to look into.
>
> Skef
> ------------------------------
> *From:* Garret Rieger <grieger@google.com>
> *Sent:* Monday, September 19, 2022 3:06 PM
> *To:* w3c-webfonts-wg (public-webfonts-wg@w3.org) <
> public-webfonts-wg@w3.org>
> *Subject:* Speeding up Brotli Patch Generation w/ Custom Encoding
>
> After our discussions at TPAC I've been thinking about how we could speed
> up brotli patch generation by implementing custom brotli encoding logic
> (thanks Sergey and Dominik for the idea). It turns out that internally
> brotli uses the concept of a meta-block
> (https://datatracker.ietf.org/doc/html/rfc7932#section-2), which can
> optionally contain a self-contained block of compressed data
> (https://datatracker.ietf.org/doc/html/rfc7932#section-11.3). This is
> great news, as it allows a server to easily incorporate precompressed data
> blocks into the generation of the brotli compressed stream; a minimal
> sketch is below.
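>
> To make that concrete, here's a minimal sketch of producing one
> byte-aligned compressed segment with the stock streaming encoder
> (BROTLI_OPERATION_FLUSH emits everything read so far as complete
> meta-blocks and pads to a byte boundary; splicing the cached segments into
> one valid stream still needs custom logic for the last-meta-block bit,
> window size, and cross-segment back references):
>
>   #include <stddef.h>
>   #include <stdint.h>
>   #include <brotli/encode.h>
>
>   /* Compress one segment into byte-aligned meta-blocks.  Assumes out_cap
>      is large enough; real code would loop until all input is consumed. */
>   size_t compress_segment (const uint8_t *in, size_t in_len,
>                            uint8_t *out, size_t out_cap, uint32_t quality)
>   {
>     BrotliEncoderState *s = BrotliEncoderCreateInstance (NULL, NULL, NULL);
>     BrotliEncoderSetParameter (s, BROTLI_PARAM_QUALITY, quality);
>
>     size_t avail_in = in_len, avail_out = out_cap;
>     const uint8_t *next_in = in;
>     uint8_t *next_out = out;
>     BrotliEncoderCompressStream (s, BROTLI_OPERATION_FLUSH, &avail_in,
>                                  &next_in, &avail_out, &next_out, NULL);
>
>     BrotliEncoderDestroyInstance (s);
>     return out_cap - avail_out;  /* length of the byte-aligned segment */
>   }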
>
> I've started to document my thoughts on how we could actually apply this
> in a server implementation here: Speeding up Incremental Transfer Patch
> Generation
> <https://docs.google.com/document/d/1MdtB_WPC2grAx3vFgLHA1-CqRzQtWj_cJYx40W2VQgI/edit?usp=sharing>.
> The doc is still a work in progress, but the main ideas are there.
>
> At a high level I believe we should be able to:
>
>    - Cache precompressed immutable segments of the original font for use
>    during initial response generation.
>    - Cache precompressed blocks of glyph data to allow for fast
>    compressed patch generation (for the first and subsequent requests) via
>    concatenation.
>
> Where run-time performance is a concern this should allow for
> significantly reduced computational requirements, at the cost of
> potentially larger patches. The best part is that none of this would
> require any specification changes; it can all be done within the existing
> brotli patch mechanism. So this could eliminate the need for the addition
> of a third, new patching format as discussed at the last meeting.
>
> As next steps I'm looking into building a simple prototype to demonstrate
> that the idea works and to gather some initial data on performance and
> patch sizes.
>
>

Received on Thursday, 22 September 2022 17:49:43 UTC