- From: W. Felix Handte <w@felixhandte.com>
- Date: Tue, 29 Oct 2019 22:17:10 -0400
- To: ietf-http-wg@w3.org
On 10/29/19 7:54 PM, Watson Ladd wrote: > I'm not sure I appreciate the distinction of "dictionary-based" > compression vs. other compression algorithms you draw in the draft. > The BREACH attack didn't look at changes to the Huffman table, which > was dominated by good old ETOAIN SHRDLU. Instead it changed the length > of matches back into the datastream, and thus the length of the > observed output. There isn't a separate dictionary to match substrings > in in DEFLATE. Yeah, "dictionary" is an extremely overloaded term in compression, to say nothing of computer science generally. This has produced a great deal of confusion. But I haven't come up with a better term for the concept. What I mean by "dictionary" in this context is mostly a user-supplied buffer that the compressor makes LZ77-style matches into. (Though, as the document notes, various algorithms expect various kinds of data in the dictionaries they accept.) This makes it very much a potential vector for exactly that kind of attack. And DEFLATE, at least as implemented by zlib, does support dictionaries of this form [0]. > A perfect compression algorithm reveals the Kolmogorov complexity of > the input. This is enough (if you can compute Kolmogorov complexity) > to reveal the differences between "hunter2 h" and "hunter2 z", and > then "hunter2 hu" and "hunter2 ha", etc. Right, compression as it exists today has real outstanding security issues. My goal with the document is to assess whether the use of dictionaries introduces additional problems on top of the existing ones. [0] https://github.com/madler/zlib/blob/master/zlib.h#L611
Received on Wednesday, 30 October 2019 02:17:12 UTC