Re: Dictionary Compression for HTTP (at Facebook)

Fully agree! Shared dictionaries are an amazing opportunity to make the
internet faster and cheaper. SDCH never fully exploited that opportunity,
and it is great that we are all giving this another go.

I presume Zstd dictionaries are simple:

   - fill the *lz77 buffer* with bytes (see the sketch after the lists
   below)

... and I know that Shared Brotli dictionaries are relatively complex:

   - fill the *lz77 buffer* with bytes, or,
   - add special meaning to *unique (distance, length) pairs* (~2 % more
   density than filling the lz77 buffer with bytes), or,
   - perform a *binary diff* on patch data (makes bsdiff obsolete by
   compressing 5–10 % more than bsdiff+brotli, and can be 95+ % more dense
   than a traditional lz77 dictionary for patching),
   - when the distance overflows for unique (distance, length) pairs, a
   *customized word transform* is applied (~2 % more density), and
   - *context modeling*: the ordering used when interpreting (distance,
   length) pairs may depend on the last two bytes (gains unknown; I
   anticipate ~1 %).
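
To make the contrast concrete, here is a minimal sketch of the simple
model using zstd's stable ZSTD_CCtx_refPrefix() API, which treats a raw
dictionary as exactly that: bytes pre-filling the lz77 window. This is my
illustration of the mechanism, not a description of anyone's deployment:

   /* Sketch: compress `src` against a raw-content dictionary that simply
    * pre-fills the lz77 window. `dst` should hold
    * ZSTD_compressBound(srcSize) bytes. Returns the compressed size, or 0
    * on error. */
   #include <zstd.h>

   static size_t compress_with_raw_dict(void *dst, size_t dstCap,
                                        const void *src, size_t srcSize,
                                        const void *dict, size_t dictSize)
   {
       ZSTD_CCtx *cctx = ZSTD_createCCtx();
       size_t written = 0;
       if (cctx == NULL) return 0;
       ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
       /* refPrefix is "fill the lz77 buffer with bytes"; the reference is
        * consumed by the next compression and must be set again each
        * time. */
       ZSTD_CCtx_refPrefix(cctx, dict, dictSize);
       written = ZSTD_compress2(cctx, dst, dstCap, src, srcSize);
       if (ZSTD_isError(written)) written = 0;
       ZSTD_freeCCtx(cctx);
       return written;
   }

The decompressor has to reference the same bytes with
ZSTD_DCtx_refPrefix() before decoding, which is precisely what makes the
dictionary "shared".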

For data like Google search result pages we see a reduction of ~50 % in
transferred bytes when going from plain "br" Brotli to Shared Brotli, and
naturally very significant latency wins. Having binary diffing within the
shared-dictionary infrastructure allows patches for web packages, Android
apps, fonts, and other complex structured data to be compressed
efficiently simply by using the previous version of that data as the
dictionary.
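
The same patching idea can be sketched with zstd's prefix API (a
stand-in for the shared-brotli diff mode, used here only because its API
is public and stable; window-log handling for very large inputs is
elided):

   /* Sketch: delta-encode newVer against oldVer by using the old version
    * as a raw prefix dictionary; apply_patch reverses it. Both sides must
    * see the exact same old bytes. */
   #include <zstd.h>

   static size_t make_patch(void *dst, size_t dstCap,
                            const void *newVer, size_t newSize,
                            const void *oldVer, size_t oldSize)
   {
       ZSTD_CCtx *cctx = ZSTD_createCCtx();
       size_t r = 0;
       if (cctx == NULL) return 0;
       ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
       /* Long-distance matching helps find matches deep inside a large
        * old version. */
       ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);
       ZSTD_CCtx_refPrefix(cctx, oldVer, oldSize); /* old version = dict */
       r = ZSTD_compress2(cctx, dst, dstCap, newVer, newSize);
       ZSTD_freeCCtx(cctx);
       return ZSTD_isError(r) ? 0 : r;
   }

   static size_t apply_patch(void *dst, size_t dstCap,
                             const void *patch, size_t patchSize,
                             const void *oldVer, size_t oldSize)
   {
       ZSTD_DCtx *dctx = ZSTD_createDCtx();
       size_t r = 0;
       if (dctx == NULL) return 0;
       ZSTD_DCtx_refPrefix(dctx, oldVer, oldSize); /* same bytes both ends */
       r = ZSTD_decompressDCtx(dctx, dst, dstCap, patch, patchSize);
       ZSTD_freeDCtx(dctx);
       return ZSTD_isError(r) ? 0 : r;
   }

Recent zstd releases expose this same idea on the command line as
--patch-from.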



On Wed, Aug 22, 2018 at 2:30 AM, Felix Handte <felixh@fb.com> wrote:

> Hello all,
>
> Quick introduction: I'm an engineer on the Data Compression team at
> Facebook. While we partner with other teams here to apply compression
> internally at Facebook, we primarily maintain the open source
> Zstandard[1][2] and LZ4[3] libraries.
>
> We've seen enormous success leveraging dictionary-based compression with
> Zstd internally, and I'm starting to look at how we can apply the same
> toolkit/approach to compressing our public web traffic. As we're thinking
> about how to do this, both as a significant origin of HTTP traffic and as
> maintainers of open source compression tools, we want very much to pursue a
> course of action that is constructive for the broader community.
>
> There are, by my count, three competing proposals for how this sort of
> thing might work (SDCH[4], Compression Dictionaries for HTTP/2[5], and
> Shared Brotli Dictionaries[6]+[7]). With no public consensus around how to
> do this well, it's tempting for us to simply build on the tooling we've
> built internally, and apply it to our traffic between our webservers and
> our mobile apps (where we control both ends of the connection and can do
> anything we want). However, it would be pretty tragic for Facebook to gin
> up its own spec and implementation in this space, roll it out, and end up
> with something mutually incompatible with anyone else's efforts, further
> fragmenting the community and driving consensus further off.
>
> So I wanted to first resurface this topic with you all. In short, is there
> anyone still interested in pursuing a standard covering these topics? If
> so, I would like to work with you and help build something in this space
> that can actually see adoption.
>
> Thanks,
> Felix
>
> [1] https://github.com/facebook/zstd
> [2] https://tools.ietf.org/html/draft-kucherawy-dispatch-zstd-03
> [3] https://github.com/lz4/lz4
> [4] https://tools.ietf.org/html/draft-lee-sdch-spec-00
> [5] https://tools.ietf.org/html/draft-vkrasnov-h2-compression-dictionaries-03
> [6] https://tools.ietf.org/html/draft-vandevenne-shared-brotli-format-01
> [7] https://github.com/google/brotli/wiki/Fetch-Specification
>
>

Received on Wednesday, 22 August 2018 08:25:42 UTC