Re: Speeding up Brotli Patch Generation w/ Custom Encoding

Table-level offsets are probably enough for very small tables and perhaps a few "stable" tables but won't help much when it comes to GPOS, CFF, and the like. To have significant patch speed-up one will need data on the scale of lookups and glyphs, so that you're only ever diffing small (< 1k) regions (with a few possible exceptions).

Any prototype that accomplishes this would be intrusive enough that present API-level questions are probably moot. No need to bother Behdad at this point, other than maybe letting him know that we're looking at it.

Skef
________________________________
From: Garret Rieger <grieger@google.com>
Sent: Wednesday, September 21, 2022 5:05 PM
To: Skef Iterum <siterum@adobe.com>
Cc: w3c-webfonts-wg (public-webfonts-wg@w3.org) <public-webfonts-wg@w3.org>
Subject: Re: Speeding up Brotli Patch Generation w/ Custom Encoding


That's great to hear re: VCDIFF. The info you're looking for should be readily available internally in harfbuzz (it's the same info encoded in the table directory of the two fonts). Currently the public API for hb_face_t supports getting the tag list and the byte size of each table by tag, but you're left to calculate the offsets on your own. I'll be meeting with Behdad tomorrow, so I'll float the idea of adding a public API method to hb-face.h that gives direct access to the table directory records so you can grab the offsets as well.
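Until such an API exists, the offsets can be read straight out of the sfnt table directory of a serialized font. A minimal sketch (plain C++, no harfbuzz dependency; the struct and function names are illustrative, not a proposed API):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Read big-endian integers from a byte buffer.
static uint32_t be32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}
static uint16_t be16(const uint8_t* p) {
    return uint16_t((p[0] << 8) | p[1]);
}

struct TableRecord {
    std::string tag;
    uint32_t offset;  // byte offset of the table from the start of the font
    uint32_t length;  // byte length of the table
};

// Parse the sfnt table directory: a 12-byte header (numTables at byte 4)
// followed by 16-byte records of (tag, checksum, offset, length).
std::vector<TableRecord> parse_table_directory(const std::vector<uint8_t>& font) {
    std::vector<TableRecord> out;
    uint16_t num_tables = be16(font.data() + 4);
    for (uint16_t i = 0; i < num_tables; ++i) {
        const uint8_t* rec = font.data() + 12 + 16 * i;
        out.push_back({std::string(rec, rec + 4),  // tag
                       be32(rec + 8),              // offset
                       be32(rec + 12)});           // length
    }
    return out;
}
```

This only covers a single-font sfnt header; collections (ttcf) would need an extra level of indirection.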

One minor bit about how the subsetting API works: the subsetting API initially returns an hb_face_t object. Internally it's actually what we call a face builder, which has a hashmap from table tag to table bytes. The fully serialized font + table directory doesn't get created until you turn the face into a blob. This means that if you're planning on doing table-to-table diffs it'll be slightly more efficient to diff from hb_face_t to hb_face_t instead of blob to blob, since a face-to-face diff skips the step of serializing into the final single font blob.

On the brotli side of things I successfully built a prototype that can create an initial response where the layout tables (GSUB/GPOS/GDEF) are precompressed: https://github.com/garretrieger/patch-subset-incxfer/blob/custom_brotli/util/precompress-test.cc

I used this to run a test of producing an initial response for Roboto subsetted to all ASCII codepoints where:

  1.  Layout tables are immutable and precompressed at max quality (11).
  2.  Layout tables are immutable and dynamically compressed.
  3.  Layout tables are mutable and dynamically compressed.

Across a range of brotli qualities (1-9). The results of that can be found here: https://docs.google.com/spreadsheets/d/1Xf2KY4sbR_mSXDRJdO7mKKEBPSEBPmjCYxeqhvLOe7c/edit?usp=sharing. Note: the times include both subsetting and brotli compression times. This was run on my desktop.

There are a couple of notable takeaways:

  *   Brotli quality: 5-8 seems to be a sweet spot offering very fast compression times while still being competitive in compressed size vs quality 9.
  *   Precompressed layout is slightly faster than mutable layout (1.92 ms per request vs 2.29 ms per request @ quality 5) but adds about 10 KB to the initial request. I would expect most of that extra size to be made back in reduced sizes of subsequent patches, since layout tables no longer need to be patched. Future patches will also be cheaper to compute. Some further testing of the size of subsequent patches will be needed, but this is likely a reasonable tradeoff to make.

I'm planning on running this analysis over some more fonts of different types (e.g. CJK, Emoji, Arabic) to get some additional data. Beyond that I plan to do an analysis of the performance of generating patch responses for mutable layout vs precompressed immutable layout.

On Mon, Sep 19, 2022 at 4:56 PM Skef Iterum <siterum@adobe.com> wrote:
I'm also starting to look at some potentially related issues.

The required patch format is based on VCDIFF. I hadn't peeked at that format before but one of its main features is that it's agnostic to how the "windows" of the "source file" (for us, the previous subset) are mapped to the "target file" (the next subset) -- that's left up to the generator.

This is great for subsetting because we already know a good mapping: The window from the previous subset used as a source for the next subset should be the region generated from the same portion of the original, complete font file.

Accordingly, suppose when harfbuzz is generating a subset it additionally generates a file with a sequence of these records, in output file order:

  1.  Offset in subset file
  2.  Length in subset file
  3.  Offset in original file
  4.  Length in original file
  5.  (Optional:) "Final" flag, meaning that the portion won't change in future subsets (either because of gid stability or because the GIDs have the same mapping as the source file and can't change further).

With this info from the previous and next subset we can efficiently map the windows and therefore can diff only "within" those windows, instead of analyzing the two files to discover them. (With an accurate "Final" flag we can also skip the comparison and just copy those regions that are final in both subsets.) The cost will be the loss of additional, incidental file similarities but those should be relatively modest and somewhat addressed by compression.
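The window mapping described above can be sketched as follows (illustrative C++ only; Region mirrors record fields 1-5 from the list, matching is done naively on original-file offset, and real code would also need to handle split or overlapping regions):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// One record from the hypothetical harfbuzz side file described above.
struct Region {
    uint32_t subset_offset, subset_length;  // location in this subset
    uint32_t orig_offset, orig_length;      // source region in the original font
    bool is_final;                          // region won't change in later subsets
};

struct WindowPair {
    Region prev, next;
    bool copy_only;  // both sides final: skip diffing, just copy
};

// Pair up regions of the previous and next subset that were generated from
// the same span of the original font, so diffing only happens within windows.
std::vector<WindowPair> map_windows(const std::vector<Region>& prev,
                                    const std::vector<Region>& next) {
    std::map<uint32_t, Region> by_orig;
    for (const auto& r : prev) by_orig[r.orig_offset] = r;
    std::vector<WindowPair> out;
    for (const auto& n : next) {
        auto it = by_orig.find(n.orig_offset);
        if (it == by_orig.end()) continue;  // region new in this subset: no source window
        out.push_back({it->second, n, it->second.is_final && n.is_final});
    }
    return out;
}
```

Each resulting WindowPair becomes one VCDIFF source-window/target-range diff; regions with copy_only set can be emitted as plain copies without comparison.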

Happily, this solution would also be compatible with the current required format. There are some details, such as combining sequential VCDIFF windows after diffing to lower their overhead, and potentially dealing with different patterns of original offset, length pairs, but those seem tractable.

Of course, this does require some additional harfbuzz functionality, which is what I'm starting to look into.

Skef
________________________________
From: Garret Rieger <grieger@google.com>
Sent: Monday, September 19, 2022 3:06 PM
To: w3c-webfonts-wg (public-webfonts-wg@w3.org) <public-webfonts-wg@w3.org>
Subject: Speeding up Brotli Patch Generation w/ Custom Encoding


After our discussions at TPAC I've been thinking about how we could speed up brotli patch generation by implementing custom brotli encoding logic (thanks Sergey and Dominik for the idea). It turns out that internally brotli uses the concept of a meta-block<https://datatracker.ietf.org/doc/html/rfc7932#section-2>, which can optionally contain a self-contained block of compressed data<https://datatracker.ietf.org/doc/html/rfc7932#section-11.3>. This is great news, as it allows a server to easily incorporate precompressed data blocks into the generation of the brotli compressed stream.

I've started to document my thoughts on how we could actually apply this in a server implementation here: Speeding up Incremental Transfer Patch Generation<https://docs.google.com/document/d/1MdtB_WPC2grAx3vFgLHA1-CqRzQtWj_cJYx40W2VQgI/edit?usp=sharing>. The doc is still a work in progress, but the main ideas are there.

At a high level I believe we should be able to:

  *   Cache precompressed immutable segments of the original font for use during initial response generation.
  *   Cache precompressed blocks of glyph data to allow for fast compressed patch generation (for the first and subsequent requests) via concatenation.
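As a rough sketch of the concatenation idea (illustrative C++ only: the real brotli framing details, such as window-size agreement and the final-meta-block bit, are elided, and the compress callback stands in for an actual brotli encoder):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Cache of precompressed, self-contained meta-blocks keyed by table tag.
// A real implementation must also use consistent encoder window settings
// and clear the ISLAST bit on every block except the stream's final one.
struct PrecompressedCache {
    std::map<std::string, Bytes> blocks;
};

// Assemble a response: for each table, splice in a cached precompressed
// meta-block when one exists, otherwise compress the raw bytes on the fly.
Bytes assemble_response(const std::vector<std::pair<std::string, Bytes>>& tables,
                        const PrecompressedCache& cache,
                        const std::function<Bytes(const Bytes&)>& compress) {
    Bytes out;
    for (const auto& [tag, raw] : tables) {
        auto it = cache.blocks.find(tag);
        Bytes piece = (it != cache.blocks.end()) ? it->second : compress(raw);
        out.insert(out.end(), piece.begin(), piece.end());
    }
    return out;
}
```

The point of the sketch is only the shape of the fast path: immutable segments cost a lookup and a memcpy per request, and only mutable data pays encoder time.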

Where run-time performance is a concern this should allow for significantly reduced computational requirements at the cost of potentially larger patches. The best part is that none of this would require any specification changes; it can all be done within the existing brotli patch mechanism. So this could eliminate the need for a third, new patching format as discussed at the last meeting.

As next steps I'm looking into building a simple prototype to demonstrate that the idea works and to gather some initial data on performance and patch sizes.

Received on Thursday, 22 September 2022 05:29:32 UTC