Re: Dictionary Compression for HTTP (at Facebook)

Hello Felix and Jyrki,

Shared dictionary compression, in various forms, has been discussed in the Working Group for quite some time. What's blocking progress is the lack of an agreed-upon description of its security properties, the issues they raise, and acceptable mitigations for them.

Some people have expressed interest in attempting to document that in an Internet-Draft, but to date we haven't seen any progress publicly. If you'd like, I can try to put you in touch with them to see if they need help, etc.

Regards,


P.S. Felix, your e-mail didn't make it to me, or into the archives. Are you subscribed to the list?



> On 22 Aug 2018, at 6:23 pm, Jyrki Alakuijala <jyrki@google.com> wrote:
> 
> Fully agree! Sharing dictionaries is an amazing opportunity to make the internet faster and cheaper. SDCH never fully exploited that opportunity, and it is great that we are all giving it another go.
> 
> I presume Zstd dictionaries are simple:
>  • fill the LZ77 buffer with bytes (see the usage sketch after this list)
> ... and I know that Shared Brotli dictionaries are relatively complex:
>  • fill the LZ77 buffer with bytes, or,
>  • add special meaning for unique (distance, length) pairs (2 % more density than filling the LZ77 buffer with bytes), or,
>  • perform a binary diff on patch data (makes bsdiff obsolete by compressing 5–10 % more than bsdiff+brotli; can be 95+ % more dense than a traditional LZ77 dictionary for patching).
>  • when the distance overflows for unique (distance, length) pairs, a customized word transform is applied (gives 2 % more density)
>  • context modeling: the dictionary's ordering and interpretation of (distance, length) pairs may depend on the last two bytes (gains unknown; I anticipate ~1 %)
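> 
> To make the simple "fill the LZ77 buffer with bytes" model concrete, here is a minimal
> usage sketch from the application's point of view, using the existing zstd C API with a
> raw content dictionary (the helper name is made up; the zstd calls are from zstd.h; this
> only illustrates the calling pattern, not how a dictionary would be negotiated over HTTP):
> 
>     #include <zstd.h>   /* ZSTD_compress_usingDict() and friends */
> 
>     /* Compress src against a dictionary that both endpoints already hold.
>      * With zstd, any plain byte buffer can act as a "raw content" dictionary:
>      * its bytes simply pre-fill the match window.
>      * dstCapacity should be at least ZSTD_compressBound(srcSize). */
>     size_t compress_with_shared_dict(void *dst, size_t dstCapacity,
>                                      const void *src, size_t srcSize,
>                                      const void *dict, size_t dictSize)
>     {
>         ZSTD_CCtx *cctx = ZSTD_createCCtx();
>         size_t written = ZSTD_compress_usingDict(cctx, dst, dstCapacity,
>                                                  src, srcSize,
>                                                  dict, dictSize,
>                                                  19 /* compression level */);
>         ZSTD_freeCCtx(cctx);
>         return written;   /* check with ZSTD_isError() */
>     }
> 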
> For data like the Google search result pages, we can see a ~50 % reduction in data when we go from "br" (Brotli) to Shared Brotli, and naturally very significant latency wins. Having binary diffing within the shared dictionary infrastructure allows patches for web packaging, Android apps, fonts, or other complex structured data to be compressed efficiently, simply by using the previous version of that data as the dictionary.
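> 
> The "previous version as dictionary" case is just the mirror image on the receiving side;
> sketched here with the zstd API purely to illustrate the data flow (shared brotli's binary
> diff mode is its own format, and the helper name below is made up). The sender would produce
> the update by calling the compression helper sketched above with the old version of the
> resource as the dictionary:
> 
>     /* The client kept old_version (the bytes it fetched last time) and now
>      * receives update, i.e. the new version compressed against old_version. */
>     size_t apply_versioned_update(void *newBuf, size_t newCapacity,
>                                   const void *update, size_t updateSize,
>                                   const void *old_version, size_t oldSize)
>     {
>         ZSTD_DCtx *dctx = ZSTD_createDCtx();
>         size_t got = ZSTD_decompress_usingDict(dctx, newBuf, newCapacity,
>                                                update, updateSize,
>                                                old_version, oldSize);
>         ZSTD_freeDCtx(dctx);
>         return got;   /* decompressed size, or an error code (ZSTD_isError) */
>     }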
> 
> 
> 
> On Wed, Aug 22, 2018 at 2:30 AM, Felix Handte <felixh@fb.com> wrote:
> Hello all,
> 
> Quick introduction: I'm an engineer on the Data Compression team at Facebook. While we partner with other teams here to apply compression internally at Facebook, we primarily maintain the open source Zstandard[1][2] and LZ4[3] libraries.
> 
> We've seen enormous success leveraging dictionary-based compression with Zstd internally, and I'm starting to look at how we can apply the same toolkit/approach to compressing our public web traffic. As we're thinking about how to do this, both as a significant origin of HTTP traffic and as maintainers of open source compression tools, we want very much to pursue a course of action that is constructive for the broader community.
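> 
> For anyone unfamiliar with the zstd toolkit: the usual workflow is to train a dictionary
> offline from representative samples (e.g. recent responses of a given endpoint), distribute
> it to both sides, and reference it at compression time. A rough sketch of the training step
> with the public zdict.h API (error handling and sample selection omitted; the wrapper name
> is illustrative):
> 
>     #include <zdict.h>   /* ZDICT_trainFromBuffer() */
> 
>     /* samples holds all training samples concatenated back to back, and
>      * sampleSizes[i] is the length of the i-th sample. Returns the number of
>      * bytes written into dictBuf (typically on the order of 100 KB), or an
>      * error code (check with ZDICT_isError()). */
>     size_t train_dictionary(void *dictBuf, size_t dictCapacity,
>                             const void *samples,
>                             const size_t *sampleSizes, unsigned nbSamples)
>     {
>         return ZDICT_trainFromBuffer(dictBuf, dictCapacity,
>                                      samples, sampleSizes, nbSamples);
>     }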
> 
> There are, by my count, three competing proposals for how this sort of thing might work (SDCH[4], Compression Dictionaries for HTTP/2[5], and Shared Brotli Dictionaries[6]+[7]). With no public consensus around how to do this well, it's tempting for us to simply build on the tooling we've developed internally and apply it to the traffic between our webservers and our mobile apps (where we control both ends of the connection and can do anything we want). However, it would be pretty tragic for Facebook to gin up its own spec and implementation in this space, roll it out, and end up with something incompatible with everyone else's efforts, further fragmenting the community and pushing consensus even further away.
> 
> So I wanted to first resurface this topic with you all. In short, is there anyone still interested in pursuing a standard covering these topics? If so, I would like to work with you and help build something in this space that can actually see adoption.
> 
> Thanks,
> Felix
> 
> [1] https://github.com/facebook/zstd
> [2] https://tools.ietf.org/html/draft-kucherawy-dispatch-zstd-03
> [3] https://github.com/lz4/lz4
> [4] https://tools.ietf.org/html/draft-lee-sdch-spec-00
> [5] https://tools.ietf.org/html/draft-vkrasnov-h2-compression-dictionaries-03
> [6] https://tools.ietf.org/html/draft-vandevenne-shared-brotli-format-01
> [7] https://github.com/google/brotli/wiki/Fetch-Specification
> 
> 

--
Mark Nottingham   https://www.mnot.net/

Received on Thursday, 23 August 2018 06:13:43 UTC