RE: ezflate: proposal to reinstitute deflate header compression from K.Morgan@iaea.org on 2014-06-03 (ietf-http-wg@w3.org from April to June 2014)

From: <K.Morgan@iaea.org>
Date: Tue, 3 Jun 2014 08:08:55 +0000
To: <grmocg@gmail.com>
CC: <ietf-http-wg@w3.org>, <C.Brunhuber@iaea.org>
Message-ID: <0356EBBE092D394F9291DA01E8D28EC201186D995B@sem002pd.sg.iaea.org>

On 02 June 2014 21:58, grmocg@gmail.com<mailto:grmocg@gmail.com> wrote:

> There are a number of issues with this (it is something I considered too :) ):

Perhaps you didn't consider it long enough :)

If you reconsider, we'd love to have your help!

> Any encoder can use gzip, and the decoder will happily decompress it.
> This means that the receiver does not enforce security, which would be a protocol flaw.

We thought about this and didn't have enough time to implement it, but a small mod to the zlib inflate to pass metadata, to the consumer, about the back references would be enough to verify that the sender is conforming to the tokenization rules.

> One can do letter-frequency based attacks since flate (often) uses dynamic huffman tables. This is unsafe when unconstrained, and difficult to analyze.

I haven't come across this attack before. Will you please send me a link to some information? I would like to include it in the "Security Considerations" section.

> One is now compressing within atoms, unless that feature is also removed, in which case efficiency drops.

I'm not sure what you mean by "atoms". I assume you mean the individual tokens of a header value?
If I read between the lines, your implying that this is unsafe?

Consider the following header (with name: value)...
Cookie: c1=ABCDEFGHIJKL;c2=MNOPQRSTUVWX

Tokenized, the value would be split into the following tokens: { "c1=ABCDEFGHIJKL", ";", "c2=MNOPQRSTUVWX" }
An attacker would have to brute-force guess the entire value of each cookie individually. How is that not safe? Assming Base64 encoding, the search space is 64^12=4,722,366,482,869,645,213,696
Of course in hpack the attacker would have to guess both cookies at the same time which hash a search space with a number too big to include here, but doesn't seem to add a lot of benefit.

> Flate has no 'never compress' flag, though one could maybe be added with some effort.

Easily added with little effort.

> Future uses of the protocol will likely adapt to the compression mechanism used, improving the efficiency of something using delta-coding further beyond what it looks like today.

Same is true for any other compression mechanism used, including ezflate.

> Hpack's decoding is extremely cheap.
> Hpack was designed to make it difficult for a conforming implementation to leak information, to make encoding and decoding very fast/cheap, to provide for receiver control over compression context size, to allow for proxy re-indexing (i.e. shared state between frontend and backend withing a proxy), and for quick comparisons of huffman-encoded strings. Several of these goals are difficult to achieve with something flate-based.

Several of these goals could (should IMO) be achieved with the approach that Poul-Henning suggested [1], which is to divide the headers into routing headers and end-to-end headers. The routing headers would use a (very simplified) hpack with a static table and static huffman coding for values - making it very easy/fast for proxies to achieve the goals you mentioned. The end-to-end headers could use something more sophisticated (e.g. ezflate).

-keith

On Mon, Jun 2, 2014 at 12:26 PM, <K.Morgan@iaea.org<mailto:K.Morgan@iaea.org>> wrote:
Sorry, right after I clicked send, I got the links to the “htmlized” versions which are probably easier to read…

A new version of I-D, draft-morgan-ezflate-00.txt has been successfully submitted by Keith Shearl Morgan and posted to the IETF repository.

Name: draft-morgan-ezflate

Revision: 00

Title: EZFLATE: Token-based DEFLATE Compression

Document date: 2014-06-02

Group: Individual Submission

Pages: 10

URL: http://www.ietf.org/internet-drafts/draft-morgan-ezflate-00.txt

Status: https://datatracker.ietf.org/doc/draft-morgan-ezflate/

Htmlized: http://tools.ietf.org/html/draft-morgan-ezflate-00

A new version of I-D, draft-morgan-http2-header-compression-00.txt

has been successfully submitted by Keith Shearl Morgan and posted to the IETF repository.

Name: draft-morgan-http2-header-compression

Revision: 00

Title: H2EZ: HTTP/2 Header Compression

Document date: 2014-06-02

Group: Individual Submission

Pages: 11

URL: http://www.ietf.org/internet-drafts/draft-morgan-http2-header-compression-00.txt

Status: https://datatracker.ietf.org/doc/draft-morgan-http2-header-compression/

Htmlized: http://tools.ietf.org/html/draft-morgan-http2-header-compression-00

On Monday,02 June 2014 21:23, MORGAN, Keith Shearl wrote:

No, we haven't been living under a rock. Yes, we've heard of CRIME.

CRIME exploited the deflate algorithm as described in RFC 1951, not the deflate format. What makes the RFC 1951 algorithm vulnerable to the CRIME attack is that it does character by character matching (normally this is a good thing because it maximizes the number and length of matches!).

The first paragraph of Section 4 of RFC 1951 says that “[implementations] need not follow [the general algorithm presented here] in order to be compliant.”

We propose an alternate deflate algorithm called ezflate (“easy flate”); a token-based deflate algorithm. The key feature of the ezflate algorithm (and primary difference to the RFC 1951 algorithm), is that it does token-by-token matching.

Here is a simple example, to illustrate how it works. Consider the following HTTP/2 requests (abbreviated):

:method: GET

:path: /

accept-encoding: gzip

:method: GET

:path: /cool.html

accept-encoding: gzip, sdch

The tokens “:method”, “GET”, “:path”, “accept-encoding” and “gzip” are compressed in the second request. The tokens “/cool.html” and “sdch” are not compressed.

Now consider another pair of requests (abbreviated):

:method: GET

:path: /awesome.html

secret: ABCDEFGH

:method: GET

:path: /?guess=ABC

secret: ABCDEFGH

The tokens “:method”, “GET”, “:path”, “secret” and “ABCDEFGH” are compressed in the second request. But more importantly, the attacker’s guess of the secret “ABC” does not trigger a match and is impervious to a CRIME-like attack. (Of course, just like hpack, ezflate is vulnerable to a brute-force attack of all 8-character combinations.)

The ezflate algorithm is non-http/2 specific so we divided our Internet Drafts into two parts (the ezflate algorithm itself and the http/2 header compression based on ezflate):

http://www.ietf.org/id/draft-morgan-ezflate-00.txt

http://www.ietf.org/id/draft-morgan-http2-header-compression-00.txt

We have implemented ezflate as an additional “compression strategy” called Z_EZFLATE in the zlib library.

The code is very straightforward. We basically only had to implement one new compression function (in addition to the existing deflate_fast & deflate_slow, we added ezflate). We relied on the rest of the proven zlib infrastructure for windowing, hash chains, Huffman coding, etc.

We used the http2/http_samples [1] har files as test vectors for analysing the compression factor of our ezflate implementation. Here are the data:

pt=plain-text; df=deflate; hp=hpack*; ez=ezflate**

Method avg-bytes/req (% of pt) avg-bytes/rsp (% of pt)

pt 574 (n/a) 407 (n/a)

df 70 (12%) 62 (15%)

hp 91 (16%) 83 (20%)

ez 99 (17%) 77 (19%)

For the data set, hpack is slightly better on requests and ezflate is slightly better on responses, but the difference is negligible and likely depends on the use cases. (Note that requests to multiple hosts were interleaved as if they were all sent to a single proxy.) So changing to ezflate just based on the compression factor is not compelling…

…BUT, there are other benefits of ezflate beyond the raw compression factor...

+ Interoperability is easy; any inflate library (e.g. zlib) will decompress ezflate streams

+ Built on the proven zlib library (our implementation)

+ Easy to implement a custom compressor i.e. less complexity

+ Custom resource-constrained implementations are possible by trading off compression for resources

+ Furthermore, resource-constrained implementations can use non-compressed header blocks or Huffman-only header blocks

+ Allows interleaving of HEADERS frames with other frame types

+ Streaming

+ Useful for any tokenizable data stream (e.g. html) which contains secret and attacker-controlled data (think BREACH)

*hpack was configured with a table-size=4096, the hpack results come from nghttp2 deflatehd

**ezflate was configured with window-bits=12 (4096 window), mem-level=6 (default), level=6 (default -> level is a non-factor for ezflate)

Regards,

Keith & Chris

[1] https://github.com/http2/http_samples

This email message is intended only for the use of the named recipient. Information contained in this email message and its attachments may be privileged, confidential and protected from disclosure. If you are not the intended recipient, please do not read, copy, use or disclose this communication to others. Also please notify the sender by replying to this message and then delete it from your system.

Received on Tuesday, 3 June 2014 08:09:29 UTC