comments about draft-ietf-httpbis-header-compression from Jyrki Alakuijala on 2014-12-31 (ietf-http-wg@w3.org from October to December 2014)

From: Jyrki Alakuijala <jyrki@google.com>
Date: Wed, 31 Dec 2014 18:01:15 +0100
To: ietf-http-wg@w3.org
Message-ID: <CAPapA7Qfw-YUV85cHsrxANFaDX7hZT89nX4_AJvu2i3ETgN5uA@mail.gmail.com>
Some findings from draft-ietf-httpbis-header-compression

1) If you limit the Huffman code length to somewhere around 15-17 bits, you
can use a 2-level lookup based Huffman coding. This can be faster in
decoding than non-limited methods, and need less memory.

2) The draft creates an impression that deflate is inherently unsafe and
nothing can be done to fix it (in 7.1. last section). However, the deflate
format contains Huffman (both dynamic and static Huffman), and the use of
LZ77 is optional and fully controllable by the compression algorithm, so
obviously deflate as a format is as safe as HPACK. It is just possible to
use it in an unsafe manner. The most typical implementation of deflate,
zlib, is unsafe in this manner, but nothing would stop us from making a
completely safe implementation that would be in agreement with the existing
standards for decoding.

3) I believe that we could compression safe against CRIME attack by
computing a hash and inserting has mod 128--512 bits of extra random bits
based on the hash value before encryption. This way we would not have to
throw away compression when we add safety. If this was done, any small
change of the data would change the length wildly and not reveal secrets.

4) A specially designed implementation of LZ77 can hide a set of substrings
from a CRIME attack. Just marking which bytes containing the substrings
that can be manipulated by the user (or alternatively the secret
substrings), and by compressing those using no backward references from and
to these marked substrings. In the deflate format, it is possible to do
this. Also, it is possible in deflate to selectively use a static Huffman
code for such an area, and continue with a dynamic Huffman afterwards. If
one would use brotli, it would be possible to temporarily switch to another
Huffman code and come back to the original code afterwards.

5) You propose a new way of encoding a static dictionary. Is there any
evidence that the new method is even as good as deflate used to be? Perhaps
we could compare HPACK against deflate and even brotli. I'd think that
inserting a tiny header-dictionary before brotli's own dictionary would
likely be an excellent solution for this problem. Brotli's simple static
dictionary method is about 4 bits more efficient per encoded substring than
what is typically used with deflate as a static dictionary (like zlib's
deflateSetDictionary).

6) I disagree with the need to go for static entropy encoding. This seems
to me as going to the time before 1977 and even to the times before Huffman
coding, it seems like an early 1950s solution. Do we need to do this? I
think the answer is no. I think we are using too big of a hammer to hit
this problem with.

7) I'm also worried about the static entropy coding being static. Building
a static system is stating that we know what the future will be. Looking
back how much HTTP has been abused in the past, predicting what it will be
used in the future is likely very difficult. Even if the benchmarking
numbers are good now, horrible things can happen if there is change. If we
need to support lots of utf-8 Chinese or Russian suddenly in the headers.
Perhaps headers will have more base64 codes, longer cookies, etc. Even if
the statistics change a little, the static entropy encoding will not match
to that, and inefficiency will shorten the life-time of the proposed
method.

8) If we want to sabotage flexibility, we should understand what is being
thrown away and benchmark this against an algorithm that can do a tiny bit
of context modeling for utf-8.

9) I don't want to see a new compression algorithm in the browser unless
there is evidence that it is a lot better than one of the existing
algorithms. I didn't see the evidence yet for this algorithm. Particularly,
I'd like to propose that we should benchmark the proposed algorithm against
both deflate and brotli with a similar static dictionary that is used with
the proposed algorithm. Being able to do this with brotli or deflate would
lead to an overall reduction of complexity in the browser, and likely give
us better performance.
Received on Wednesday, 31 December 2014 17:01:42 UTC