- From: Jyrki Alakuijala <jyrki@google.com>
- Date: Wed, 31 Dec 2014 18:01:15 +0100
- To: ietf-http-wg@w3.org
- Message-ID: <CAPapA7Qfw-YUV85cHsrxANFaDX7hZT89nX4_AJvu2i3ETgN5uA@mail.gmail.com>
Some findings from draft-ietf-httpbis-header-compression 1) If you limit the Huffman code length to somewhere around 15-17 bits, you can use a 2-level lookup based Huffman coding. This can be faster in decoding than non-limited methods, and need less memory. 2) The draft creates an impression that deflate is inherently unsafe and nothing can be done to fix it (in 7.1. last section). However, the deflate format contains Huffman (both dynamic and static Huffman), and the use of LZ77 is optional and fully controllable by the compression algorithm, so obviously deflate as a format is as safe as HPACK. It is just possible to use it in an unsafe manner. The most typical implementation of deflate, zlib, is unsafe in this manner, but nothing would stop us from making a completely safe implementation that would be in agreement with the existing standards for decoding. 3) I believe that we could compression safe against CRIME attack by computing a hash and inserting has mod 128--512 bits of extra random bits based on the hash value before encryption. This way we would not have to throw away compression when we add safety. If this was done, any small change of the data would change the length wildly and not reveal secrets. 4) A specially designed implementation of LZ77 can hide a set of substrings from a CRIME attack. Just marking which bytes containing the substrings that can be manipulated by the user (or alternatively the secret substrings), and by compressing those using no backward references from and to these marked substrings. In the deflate format, it is possible to do this. Also, it is possible in deflate to selectively use a static Huffman code for such an area, and continue with a dynamic Huffman afterwards. If one would use brotli, it would be possible to temporarily switch to another Huffman code and come back to the original code afterwards. 5) You propose a new way of encoding a static dictionary. Is there any evidence that the new method is even as good as deflate used to be? Perhaps we could compare HPACK against deflate and even brotli. I'd think that inserting a tiny header-dictionary before brotli's own dictionary would likely be an excellent solution for this problem. Brotli's simple static dictionary method is about 4 bits more efficient per encoded substring than what is typically used with deflate as a static dictionary (like zlib's deflateSetDictionary). 6) I disagree with the need to go for static entropy encoding. This seems to me as going to the time before 1977 and even to the times before Huffman coding, it seems like an early 1950s solution. Do we need to do this? I think the answer is no. I think we are using too big of a hammer to hit this problem with. 7) I'm also worried about the static entropy coding being static. Building a static system is stating that we know what the future will be. Looking back how much HTTP has been abused in the past, predicting what it will be used in the future is likely very difficult. Even if the benchmarking numbers are good now, horrible things can happen if there is change. If we need to support lots of utf-8 Chinese or Russian suddenly in the headers. Perhaps headers will have more base64 codes, longer cookies, etc. Even if the statistics change a little, the static entropy encoding will not match to that, and inefficiency will shorten the life-time of the proposed method. 8) If we want to sabotage flexibility, we should understand what is being thrown away and benchmark this against an algorithm that can do a tiny bit of context modeling for utf-8. 9) I don't want to see a new compression algorithm in the browser unless there is evidence that it is a lot better than one of the existing algorithms. I didn't see the evidence yet for this algorithm. Particularly, I'd like to propose that we should benchmark the proposed algorithm against both deflate and brotli with a similar static dictionary that is used with the proposed algorithm. Being able to do this with brotli or deflate would lead to an overall reduction of complexity in the browser, and likely give us better performance.
Received on Wednesday, 31 December 2014 17:01:42 UTC