
RE: Choosing a header compression algorithm

From: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
Date: Fri, 29 Mar 2013 15:26:39 +0000
To: Roberto Peon <grmocg@gmail.com>
CC: "agl@google.com" <agl@google.com>, Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
Message-ID: <6C71876BDCCD01488E70A2399529D5E5163F6C3B@ADELE.crf.canon.fr>


> -----Original Message-----
> From: Roberto Peon [mailto:grmocg@gmail.com]
> Sent: jeudi 28 mars 2013 20:17
> To: RUELLAN Herve
> Cc: agl@google.com; Mark Nottingham; ietf-http-wg@w3.org Group
> Subject: Re: Choosing a header compression algorithm
> 
> The net effect of the character-delimited prefix match you propose is that an
> attacker must only brute force each subsequence which matches [^/&=,
> ]*([^/&=, ]|$) ,
> 
> Running the numbers, that is:
> 
> 	k^s * m
> 
> where:
> 
> 	k = possible characters
> 	s = subsequence length or characters not ending in [^/&= ,] or end-
> of-line
> 	m = number of subsequences
> 
> 
> 
> instead of full atom matching, which is (a good special case of the above,
> where m is always 1 and s is always n):
> 
> 	k^n
> 
> 
> where:
> 
> 
> 	k = possible characters
> 	n = length of field
> 
> 
> 
> In other words, full atom matching is *exponentially* more difficult (a
> desirable trait)!
> 
> In the example above, assuming 27 commonly used characters per entry, I'm
> very likely to completely discover that data in:
> 
> 	http://www.example.com/path/first/myfile
> 
> in:
> 
> 	27^4 + 27^5 + 27^6 tries

This is roughly 400 million tries. Moreover, this figure rests on several assumptions that may not hold:
- The number of possible characters is only 27.
- The length of each chunk is known to the attacker.
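As a quick sanity check on the arithmetic (a Python sketch, assuming the 27-character alphabet and the delimiter set [/&=, ] from the quoted message):

```python
import re

# Delimiter set from the regex in the quoted message ([/&=, ]);
# each delimiter-separated chunk can be brute-forced independently.
DELIMITERS = r"[/&=, ]"

def prefix_match_cost(value, alphabet_size=27):
    """Tries needed to brute-force each chunk separately: sum of k^s."""
    chunks = [c for c in re.split(DELIMITERS, value) if c]
    return sum(alphabet_size ** len(c) for c in chunks)

# "path/first/myfile" splits into chunks of length 4, 5 and 6,
# giving 27^4 + 27^5 + 27^6 tries.
print(prefix_match_cost("path/first/myfile"))  # 402300837, i.e. ~400 million
```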

> Note that it is likely that an attacker would be able to do substantially better
> than above if they know the letter frequency (very likely) or have similar
> data:
> In the case of a whole atom match, I'd discover the data in:
> 
> 
> 	27^(17) tries
> 
> 
> So, full atom matching is ~ 5 quadrillion (5,353,441,417,462,320) times more
> difficult... that is a lot.

True, full atom matching is much harder, but limited prefix matching is already very hard.
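Checking the quoted figures (a sketch in Python, assuming the same 27-character alphabet and the 17-character value "path/first/myfile"):

```python
# Whole-atom matching: the attacker must guess all 17 characters of
# "path/first/myfile" at once, instead of three short chunks.
full_atom = 27 ** 17
prefix = 27 ** 4 + 27 ** 5 + 27 ** 6

# The ~5 quadrillion factor quoted above.
print(full_atom // prefix)  # 5353441417462320
```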

We must also keep in mind that each try corresponds to a request that the attacker must force the client to make. Such a huge number of requests should be detected somewhere, if only to prevent DoS attacks.

> When I originally considered prefix matching in the past, this was the math
> that I did, and I decided that I was not confident that compressor
> implementors would ensure that their encoder prevents prefix matching
> when the subsequence lengths are too small (and thus easily attacked). The
> same consideration applies to dynamic huffman coding. It *CAN* be safe,
> but it requires complexity and computation, and I think there is significant
> risk that implementors may knowingly or unknowingly sacrifice security. It is
> actually more safe to ignore delimiters and allow for a match of a prefix so
> long as it is greater than a certain length. At least in that case there is a
> guarantee of the number of times that it must be brute forced...
> 
> Full-atom matching is simply much more difficult to get wrong from a security
> perspective, and it results in pretty decent compression...
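As a side note, the minimum-length rule described above could be sketched as follows (hypothetical names; the threshold value is an assumption, not something specified in this thread):

```python
# Hypothetical sketch: only emit a prefix-match reference when the
# matched prefix is long enough to make brute forcing impractical.
MIN_PREFIX_LEN = 8  # assumed threshold, not from the thread

def usable_prefix(candidate, previous, min_len=MIN_PREFIX_LEN):
    """Length of the common prefix, or 0 if it is too short to use safely."""
    n = 0
    for a, b in zip(candidate, previous):
        if a != b:
            break
        n += 1
    return n if n >= min_len else 0

print(usable_prefix("path/first/myfile", "path/first/other"))  # 11
print(usable_prefix("path/x", "path/y"))  # 0 (too short; send literally)
```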

I think that prefix matching with a constrained ending is fairly secure: it compels an attacker to use brute force to break it. In addition, it gives very good compression results.

Hervé.

> -=R
Received on Friday, 29 March 2013 15:27:12 UTC
