Re: New I-D: Security Considerations Regarding Compression Dictionaries from Watson Ladd on 2019-10-29 (ietf-http-wg@w3.org from October to December 2019)

From: Watson Ladd <watson@cloudflare.com>
Date: Tue, 29 Oct 2019 16:54:12 -0700
To: "W. Felix Handte" <w@felixhandte.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAN2QdAGX0vtBSuUBS_HYsoTuTmmO=-LX_w9OizG+v6jqFMtLTA@mail.gmail.com>

On Tue, Oct 29, 2019 at 4:23 PM W. Felix Handte <w@felixhandte.com> wrote:
>
> Hello all,
>
> At IETF 104, I presented a teaser of the exploratory work I've been
> doing into dictionary-based compression for HTTP [0]. At the time, I
> promised that I would follow up with an analysis of the security
> properties of dictionary-based compression.
>
> That time has come! I've just uploaded a draft [1] that attempts to
> address that need and provide a useful survey of the interactions
> between dictionaries, internet protocols, and security.
>
> I would eventually like for this document to find a home in the HTTP WG;
> your feedback and thoughts are greatly appreciated.

I'm not sure I appreciate the distinction of "dictionary-based"
compression vs. other compression algorithms you draw in the draft.
The BREACH attack didn't look at changes to the Huffman table, which
was dominated by good old ETAOIN SHRDLU. Instead it changed the length
of matches back into the datastream, and thus the length of the
observed output. DEFLATE has no separate dictionary to match
substrings against.
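
To make the length side channel concrete, here is a minimal sketch using
Python's zlib module (which implements DEFLATE). The page layout, the
"secret=" field, and the "echo=" parameter are hypothetical stand-ins for
a response that reflects attacker-chosen input next to a secret; this is
not the actual BREACH tooling. It only shows that a reflected guess which
repeats the secret becomes a back-reference and shortens the output.

```python
import zlib

# Hypothetical response prefix: a secret followed by a field that
# reflects attacker-controlled input (an assumption for illustration).
SECRET_PART = b"secret=hunter2&echo="


def observed_len(guess: bytes) -> int:
    """Length an eavesdropper would see for the compressed response."""
    return len(zlib.compress(SECRET_PART + guess, 9))


# Two guesses of equal length: one repeats the secret, one does not.
right = observed_len(b"secret=hunter2")
wrong = observed_len(b"qzjxkvwpgybmfd")

print("matching guess:    ", right, "bytes")
print("non-matching guess:", wrong, "bytes")
```

The matching guess is encoded as a single match against earlier data in
the same stream, so its output is shorter. A real attack refines the
guess a character at a time, which takes more care because a one-character
difference can vanish in the rounding to whole bytes.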

A perfect compression algorithm reveals the Kolmogorov complexity of
the input. This is enough (if you can compute Kolmogorov complexity)
to reveal the differences between "hunter2 h" and "hunter2 z", and
then "hunter2 hu" and "hunter2 ha", etc.

It's true that a static Huffman tree isn't vulnerable to this problem,
but that's because the Huffman tree compresses character by character
using source statistics that don't change as the message is processed.
A dynamic Huffman tree (or range encoder) that models only symbols (not
symbols per context) would also leak the overall number of symbols,
while one with context-dependent probabilities would leak quite a bit
more. No dictionary here!
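
A small sketch of the static versus dynamic contrast, under some loud
assumptions: the static table is built from a made-up sample text, the
"secrets" are artificial three-letter-alphabet strings, only payload bits
are counted (the cost of transmitting a dynamic table is ignored), and
"number of symbols" is read here as the per-symbol counts. With the
static code, appending a guess always costs the same number of bits; with
a code rebuilt from the message, the marginal cost depends on what else
is in the message.

```python
import heapq
import string
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs.

    Assumes at least two distinct symbols.
    """
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Static table: fixed in advance from a hypothetical sample, never updated.
ALPHABET = string.ascii_lowercase + string.digits + " "
SAMPLE = "the quick brown fox jumps over the lazy dog etaoin shrdlu"
STATIC_LENGTHS = huffman_code_lengths(Counter(ALPHABET) + Counter(SAMPLE))


def static_len(msg: str) -> int:
    """Payload bits using the fixed table."""
    return sum(STATIC_LENGTHS[c] for c in msg)


def dynamic_len(msg: str) -> int:
    """Payload bits using a table rebuilt from this message's own counts."""
    lengths = huffman_code_lengths(Counter(msg))
    return sum(lengths[c] for c in msg)


def marginal(cost, secret: str, guess: str) -> int:
    """Extra bits caused by appending the guess to the secret."""
    return cost(secret + guess) - cost(secret)


for secret in ("aaaabbc", "ccccbba"):
    print(secret,
          "static:", marginal(static_len, secret, "a"),
          "dynamic:", marginal(dynamic_len, secret, "a"))
```

In this toy setup the static column is the same for both secrets, while
the dynamic column differs, so the observed length says something about
the symbol counts of the secret even though no dictionary is involved.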

>
> I look forward to seeing you all in Singapore!
>
> Thanks,
> Felix
>
> [0] https://youtu.be/GIRgsVIYG7I?t=6889
> [1] https://datatracker.ietf.org/doc/draft-handte-httpbis-dict-sec/
>

Received on Tuesday, 29 October 2019 23:54:26 UTC