Re: New Version Notification for draft-vkrasnov-h2-compression-dictionaries-01.txt from Vlad Krasnov on 2016-11-02 (ietf-http-wg@w3.org from October to December 2016)

From: Vlad Krasnov <vlad@cloudflare.com>
Date: Wed, 2 Nov 2016 14:19:39 -0700
To: Jyrki Alakuijala <jyrki@google.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <7DE838F0-916A-4F92-9631-2C0C1073AFF6@cloudflare.com>

> Brotli has two separate ways of using a static dictionary. The first way is the traditional way that zlib supports. The current custom dictionary interface in brotli supports this method.

I tested Brotli with the traditional approach for both static and dynamic dictionaries, and the gains are much higher than those of deflate, I guess thanks in part to the much larger window.
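To make "the traditional approach" concrete, here is a minimal sketch of zlib-style dictionary priming using Python's standard zlib module. It is only an illustration, not the code I used for the tests; the helper names and the tiny sample dictionary and page are made up for the example.

```python
import zlib

def deflate_with_dictionary(data: bytes, dictionary: bytes, level: int = 6) -> bytes:
    """Compress data with the deflate window primed by a preset dictionary."""
    comp = zlib.compressobj(level=level, zdict=dictionary)
    return comp.compress(data) + comp.flush()

def inflate_with_dictionary(blob: bytes, dictionary: bytes) -> bytes:
    """Decompress a stream that was produced with the same preset dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()

# Hypothetical sample data: a dictionary of common HTML boilerplate and a small page.
SAMPLE_DICTIONARY = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>'
    b'</title></head><body><div class="'
)
SAMPLE_PAGE = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>Hi'
    b'</title></head><body><div class="x">hello</div></body></html>'
)

plain = zlib.compress(SAMPLE_PAGE, 6)
primed = deflate_with_dictionary(SAMPLE_PAGE, SAMPLE_DICTIONARY)
print(len(SAMPLE_PAGE), len(plain), len(primed))
```

Both sides must hold the identical dictionary bytes; the decompressor cannot decode the stream without it.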

For my current study on static dictionaries, I generated dictionaries from the Alexa top 500, based on content-encoding, using a tool I wrote: https://github.com/vkrasnov/dictator. The tool is optimized for deflate, and there is still room for improvement.

A big plus is that the same dictionary is beneficial for both deflate and Brotli, and even for LZMA, as a matter of fact.

> The second way is denser. It allows for every pair of (length, distance) to point to a unique dictionary sequence. Because of this, dictionary sequences that point to length N strings would save log2(N) bits in distance specification in comparison to traditional dictionaries.

The second way is more expensive in terms of performance, but I suppose if you can generate static dictionaries only once, you only need to consider the cost of dictionary lookup.

The larger problem is that we would then have to support different dictionaries for different algorithms. We can do it if, as Martin suggested, we have a versioning system for the dictionaries.

Another question then: do we support simultaneous use of dynamic and static dictionaries, which would only work with Brotli?

> The second way allows for about 3 % increase in compression density in comparison to the first way, or alternatively one can reach to same compression density by using smaller dictionaries (possibly about half the size).

My main goal here is to allow for efficient recompression on the fly, even for static or previously compressed content. Using dynamic dictionaries lets you compress or re-compress very well, and really fast, at a lower compression setting.

For example, if you already have a stream compressed with gzip, is it worth recompressing it to Brotli? When you use dynamic dictionaries (with the simple static dictionaries), the answer is definitely yes.
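A rough sketch of that recompression step follows. Since Brotli is not in the Python standard library, deflate with a preset dictionary stands in for it here, and I am assuming the "dynamic dictionary" is simply an earlier response for the same resource. The function names and the generated pages are hypothetical.

```python
import gzip
import zlib

def recompress_with_dictionary(gzip_blob: bytes, dictionary: bytes, level: int = 1) -> bytes:
    """Unpack a stored gzip stream and recompress it against a dictionary.

    A low level is used on purpose: most of the gain is expected to come
    from the dictionary rather than from a more exhaustive match search.
    """
    raw = gzip.decompress(gzip_blob)
    comp = zlib.compressobj(level=level, zdict=dictionary)
    return comp.compress(raw) + comp.flush()

def make_page(version: int) -> bytes:
    """Build a made-up HTML page; versions differ only in the heading."""
    rows = "".join(
        f"<li id='item-{i}'>{(i * i * 7919) % 10007}</li>\n" for i in range(300)
    )
    return f"<html><body><h1>Build {version}</h1><ul>\n{rows}</ul></body></html>".encode()

# Assumption: the client already holds version 1, so it can act as the dictionary.
previous_response = make_page(1)
stored_gzip = gzip.compress(make_page(2), compresslevel=9)

recompressed = recompress_with_dictionary(stored_gzip, previous_response)
print(len(stored_gzip), len(recompressed))
```

The page here stays under deflate's 32 KB window, so the whole previous response is usable as a dictionary; with Brotli's larger window the same idea should extend to bigger resources.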

Received on Wednesday, 2 November 2016 21:20:14 UTC