Re: Choosing a header compression algorithm from Roberto Peon on 2013-03-28 (ietf-http-wg@w3.org from January to March 2013)

From: Roberto Peon <grmocg@gmail.com>
Date: Thu, 28 Mar 2013 15:19:16 -0400
To: RUELLAN Herve <Herve.Ruellan@crf.canon.fr>
Cc: "agl@google.com" <agl@google.com>, Mark Nottingham <mnot@mnot.net>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
Message-ID: <CAP+FsNfAns1G7EBdG0L5HEkRJW2-PMLGY4tOnN4Nj=me9rEw6g@mail.gmail.com>
Oops, the math is a bit wrong:
k^s * m
it is k^s * m * e
where e == number of possible terminals, but it really doesn't make any
significant difference in the expected outcome given that the attacker is
likely to know which terminals to use much of the time.
-=R


On Thu, Mar 28, 2013 at 3:16 PM, Roberto Peon <grmocg@gmail.com> wrote:

> The net effect of the character-delimited prefix match you propose is that
> an attacker must only brute force each subsequence which matches [^/&=,
> ]*([^/&=, ]|$) ,
>
> Running the numbers, that is:
>
> k^s * m
>
> where:
>
> k = possible characters
> s = subsequence length or characters not ending in [^/&= ,] or end-of-line
> m = number of subsequences
>
>
>
> instead of full atom matching, which is (is a good special case of above
> where m is always 1 and s is always n):
>
> k^n
>
> where:
>
> k = possible characters
> n = length of field
>
>
>
> In other words, full atom matching is *exponentially* more difficult (a
> desirable trait)!
>
> In the example above, assuming 27 commonly used characters per entry, I'm
> very likely to completely discover that data in:
>
> http://www.example.com/path/first/myfile
>
> in:
>
> 27^4 + 27^5 + 27^6 tries
>
>
> Note that it is likely that an attacker would be able to do substantially
> better than above if they know the letter frequency (very likely) or have
> similar data:
> In the case of a whole atom match, I'd discover the data in:
>
> 27^(17) tries
>
>
> So, full atom matching is ~ 5 quadrillion (5,353,441,417,462,320) times
> more difficult... that is a lot.
>
>
> When I originally considered prefix matching in the past, this was the
> math that I did, and I decided that I was not confident that compressor
> implementors would ensure that their encoder prevents prefix matching when
> the subsequence lengths are too small (and thus easily attacked). The same
> consideration applies to dynamic huffman coding. It *CAN* be safe, but it
> requires complexity and computation, and I think there is significant risk
> that implementors may knowing or unknowingly sacrifice security. It is
> actually more safe to ignore delimiters and allow for a match of a prefix
> so long as it is greater than a certain length. At least in that case there
> is guarantee of the number of times that it must be brute forced...
>
> Full-atom matching is simply much more difficult to get wrong from a
> security perspective, and it results in pretty decent compression...
>
> -=R
>
>
>
> On Thu, Mar 28, 2013 at 1:00 PM, RUELLAN Herve <Herve.Ruellan@crf.canon.fr
> > wrote:
>
>> > -----Original Message-----
>> > From: Roberto Peon [mailto:grmocg@gmail.com]
>> > Sent: jeudi 28 mars 2013 01:15
>> > To: RUELLAN Herve
>> > Cc: agl@google.com; Mark Nottingham; ietf-http-wg@w3.org Group
>> > Subject: Re: Choosing a header compression algorithm
>> >
>> > I've checked in some changes to delta2 which expands and documents
>> > various options for delta2 in the README.md.
>> >
>> > After running a number of variations of delta2, The following defaults
>> look
>> > good for small buffer sizes:
>> >
>> >
>> > delta2=max_entries=256, small_index=1
>> >
>> > small_index basically says use a uint8 instead of a uint16 for
>> representing
>> > indices, and is the kind of thing that could be messaged somewhere
>> (opcode,
>> > flag, whatever).
>> >
>> > The best headerdiff option which I believe is safe against CRIME in the
>> future
>> > is:
>> >   headerdiff=delta_type=false,Huffman
>>
>> I think that for headerdiff, the best option which is safe against CRIME
>> is:
>> headerdiff=delta_type='/&= \coma',Huffman
>>
>> >
>> > I removed prefix matching from delta some months ago (~6 I think?) after
>> > cogitating on it for a while and then speaking with security folks.. I
>> just
>> > couldn't come up with a way I could prove was safe, unlike the atom-
>> > matching, which one can prove is no worse than a brute-force attack.
>>
>> The limited prefix matching defined above also need a brute-force attack
>> to be broken.
>>
>> Hervé.
>>
>> >
>> > I've appended runs with these values@4k buffer size for delta2 and
>> > headerdiff below.
>> > -=R
>> >
>> >
>> >
>> >
>> > * TOTAL: 5949 req messages
>> >
>>                                               size  time | ratio
>> > min   max   std
>> >
>>                                http1     3,460,925  0.13 |
>> > 1.00  1.00  1.00  0.00
>> >   delta2 (max_byte_size=4096, max_entries=256, small_index=1,
>> > hg_adjust=0, implicit_hg_add=0, refcnt_vals=0)       664,683  4.16 |
>> 0.19  0.02
>> > 0.83  0.15
>> >                                                          headerdiff
>> (buffer=4096, delta_type=false,
>> > huffman)       759,783  2.03 | 0.22  0.01  0.78  0.18
>> >
>> >
>> > * TOTAL: 5948 res messages
>> >
>>                                               size  time | ratio
>> > min   max   std
>> >
>>                                http1     2,186,162  0.12 |
>> > 1.00  1.00  1.00  0.00
>> >   delta2 (max_byte_size=4096, max_entries=256, small_index=1,
>> > hg_adjust=0, implicit_hg_add=0, refcnt_vals=0)       585,475  5.32 |
>> 0.27  0.02
>> > 1.28  0.13
>> >                                                          headerdiff
>> (buffer=4096, delta_type=false,
>> > huffman)       543,047  3.29 | 0.25  0.02  0.73  0.14
>> >
>> >
>> >
>> >
>>
>>
>
Received on Thursday, 28 March 2013 19:19:44 UTC