Re: New I-D: Security Considerations Regarding Compression Dictionaries from W. Felix Handte on 2019-10-30 (ietf-http-wg@w3.org from October to December 2019)

From: W. Felix Handte <w@felixhandte.com>
Date: Tue, 29 Oct 2019 22:17:10 -0400
To: ietf-http-wg@w3.org
Message-ID: <f99d6b86-72af-a019-ae8b-a5673adfc814@felixhandte.com>

On 10/29/19 7:54 PM, Watson Ladd wrote:
> I'm not sure I appreciate the distinction of "dictionary-based"
> compression vs. other compression algorithms you draw in the draft.
> The BREACH attack didn't look at changes to the Huffman table, which
> was dominated by good old ETOAIN SHRDLU. Instead it changed the length
> of matches back into the datastream, and thus the length of the
> observed output. There isn't a separate dictionary to match substrings
> in in DEFLATE.

Yeah, "dictionary" is an extremely overloaded term in compression, to
say nothing of computer science generally. This has produced a great
deal of confusion. But I haven't come up with a better term for the concept.

What I mean by "dictionary" in this context is mostly a user-supplied
buffer that the compressor makes LZ77-style matches into. (Though, as
the document notes, various algorithms expect various kinds of data in
the dictionaries they accept.) This makes it very much a potential
vector for exactly that kind of attack. And DEFLATE, at least as
implemented by zlib, does support dictionaries of this form [0].

> A perfect compression algorithm reveals the Kolmogorov complexity of
> the input. This is enough (if you can compute Kolmogorov complexity)
> to reveal the differences between "hunter2 h" and "hunter2 z", and
> then "hunter2 hu" and "hunter2 ha", etc.

Right, compression as it exists today has real outstanding security
issues. My goal with the document is to assess whether the use of
dictionaries introduces additional problems on top of the existing ones.

[0] https://github.com/madler/zlib/blob/master/zlib.h#L611

Received on Wednesday, 30 October 2019 02:17:12 UTC