W3C home > Mailing lists > Public > ietf-http-wg@w3.org > October to December 2012

Re: Headers and Compression [was: Re: Getting (Officially) Started on HTTP/2.0]

From: Roberto Peon <grmocg@gmail.com>
Date: Wed, 3 Oct 2012 14:42:51 -0700
Message-ID: <CAP+FsNcLv2hxMpDtA4DE=rpsH6rLg+7XsD3QEd9tyrbtHfo_Sw@mail.gmail.com>
To: James M Snell <jasnell@gmail.com>
Cc: Amos Jeffries <squid3@treenet.co.nz>, ietf-http-wg@w3.org
CRIME works by observing the size of the resultant packet stream.
Thus, if the plaintext is ever compressed within the same stream context as
user-controlled plaintext, then the attacker can learn something about what is going
on, regardless of output salting, encryption, etc.
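For concreteness, the length side channel can be sketched as a toy oracle. This is a demonstration of the general attack class (a secret compressed in the same DEFLATE context as attacker-chosen plaintext), not of CRIME's exact TLS-level mechanics; the secret value and names here are made up:

```python
import zlib

SECRET = "sessionid=d34dbeefcafef00d"  # hypothetical secret header value

def oracle(guess: str) -> int:
    # The attacker observes only the length of the compressed record
    # that carries both the secret and the attacker-chosen plaintext.
    body = (SECRET + "\n" + guess).encode()
    return len(zlib.compress(body, 9))

# A guess that duplicates more of the secret compresses better, because
# DEFLATE replaces the repeated bytes with a short back-reference, so the
# output shrinks as the guess converges on the secret.
wrong = oracle("sessionid=" + "A" * 16)
right = oracle("sessionid=d34dbeefcafef00d")
print(wrong, right)
```

Encryption does not hide this difference, since stream ciphers and TLS record framing preserve plaintext length to within a small constant.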

With the compression that I'm proposing, you only learn something when
you've guessed the entire plaintext for that field, verbatim, at which
point you're just as well off by sending the data to the server directly.
I'll be writing it up shortly.

-=R

On Wed, Oct 3, 2012 at 1:19 PM, James M Snell <jasnell@gmail.com> wrote:

>
>
> On Wed, Oct 3, 2012 at 12:15 AM, Roberto Peon <grmocg@gmail.com> wrote:
>
>>
>> [snip]
>> Yep-- what I've been doing is whole-key or whole-value delta-encoding
>> with static huffman coding, with an LRU of key-value pairs. A set of
>> headers is thus simply a set of references to the items in the LRU.
>> The set of operations is:
>>   add a new key-value line into the LRU by specifying a new key-value
>>       this looks like:  {opcode: KVStore, string key, string val}.
>>   add a new key-value line into the LRU by referencing a previous
>> key-value, copying the key from it and adding the specified new value
>>       this looks like:  {opcode: Mutate, int lru_index, string val}.
>>   toggle visibility for a particular LRU entry for a particular header set
>>       this looks like:  {opcode: Toggle, int lru_index}.
>>   toggle visibility for a contiguous range of LRU entries for a
>> particular header set
>>       this looks like:  {opcode: Toggle, int lru_index_start, int
>> lru_index_end}.
>>
>> Note that the actual format of the operations isn't exactly like what I'm
>> describing above -- I'm just trying to indicate generally what is involved.
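A minimal, illustrative model of the scheme sketched above (not the actual wire format): the LRU holds key-value pairs, a header set is simply the subset of entries currently toggled visible, and the three opcodes mutate that state. Whether a freshly stored entry is immediately visible isn't specified above; this toy assumes it is.

```python
class DeltaHeaderState:
    """Toy model of the delta-encoded header LRU (illustrative only)."""

    def __init__(self):
        self.lru = []          # list of (key, value); position = lru_index
        self.visible = set()   # indices visible in the current header set

    def apply(self, op):
        name = op[0]
        if name == "KVStore":
            # Add a brand-new key-value pair (assumed visible on add).
            _, key, val = op
            self.lru.append((key, val))
            self.visible.add(len(self.lru) - 1)
        elif name == "Mutate":
            # Copy the key of an existing entry, pair it with a new value.
            _, idx, val = op
            self.lru.append((self.lru[idx][0], val))
            self.visible.add(len(self.lru) - 1)
        elif name == "Toggle":
            # Flip visibility of one entry, or of a contiguous range.
            idxs = range(op[1], op[2] + 1) if len(op) == 3 else [op[1]]
            for i in idxs:
                self.visible ^= {i}

    def headers(self):
        return [self.lru[i] for i in sorted(self.visible)]

s = DeltaHeaderState()
s.apply(("KVStore", ":method", "GET"))
s.apply(("KVStore", ":path", "/index.html"))
s.apply(("Mutate", 1, "/style.css"))   # same key ":path", new value
s.apply(("Toggle", 1))                 # hide the stale ":path" entry
print(s.headers())
```

The security property follows from the all-or-nothing matching: a sender only ever emits a reference after producing a whole key or value itself, so an attacker cannot probe the context byte-by-byte the way CRIME probes a sliding-window compressor.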
>>
>>
> It would definitely be helpful to have descriptive write up on this,
> perhaps submitted as an I-D, that we can review.
>
> Putting aside, for a moment, the contentious and controversial history of
> discussions around websocket... could we not address the CRIME issue by
> randomly salting and masking individual frames within the stream? Yes,
> there is an obvious negative impact to deflate encoding, but if we utilize
> tokenization (a la my bohe draft) then we would achieve a significant level
> of compression naturally through the encoding. I have not yet fully tested
> it, but the combination of that, the randomized salting, and the TLS
> encryption should not be subject to CRIME-type attacks. Just a thought.
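One way to read the salting idea is per-frame random padding, so the on-the-wire length no longer tracks the compressed length exactly. A masking step (a la WebSocket's per-frame XOR mask) would not change lengths, so this sketch covers only the padding half; the framing below is invented for illustration, not a proposed format:

```python
import os
import struct
import zlib

def pad_frame(compressed: bytes, max_pad: int = 32) -> bytes:
    # Prefix a 2-byte big-endian pad length, then the payload, then
    # 0..max_pad random bytes, so identical payloads vary in size.
    pad_len = os.urandom(1)[0] % (max_pad + 1)
    return struct.pack("!H", pad_len) + compressed + os.urandom(pad_len)

def unpad_frame(frame: bytes) -> bytes:
    (pad_len,) = struct.unpack("!H", frame[:2])
    return frame[2 : len(frame) - pad_len]

payload = zlib.compress(b"x-example-header: value")
frame = pad_frame(payload)
assert zlib.decompress(unpad_frame(frame)) == b"x-example-header: value"
```

Note that random padding adds noise rather than removing the channel: an attacker who can trigger many requests can average the noise away, which is why it is a mitigation rather than a fix.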
>
> - James
>
>
>> The resulting compression is a bit worse than gzip (with large window
>> size) on my current test corpus, but compares pretty well with gzip in the
>> Chrome implementation of SPDY.
>> It has CPU advantages in that the huffman encoding is static, thus for
>> proxies there is no re-encoding necessary. Additionally, much or all of the
>> decompressor state can be shared with a compressor (if proxying, for
>> instance).
>> Finally, I expect (though I've yet to prove it, as I'm still doing
>> the C++ implementation) that the compression is more CPU efficient than
>> gzip. Decompression should be similar... but much of the time you need
>> not reconstitute an entire set of headers-- instead, since we're sending
>> deltas anyway, you simply amend your state based on what changed and thus
>> become more efficient there as well.
>>
>> If clients/servers were a bit more naive in terms of when they
>> added/removed headers, the delta-coding would be more efficient and it'd
>> approach or exceed gzip compression... at least I think so :)
>> As far as I (or thus far anyone with whom I've spoken) can tell, the
>> approach here does not allow probing of the compression context, and is
>> thus robust in the face of known attacks.
>>
>> Anyway, that is what I've been working on.
>>  -=R
>>
>>
>>
>>>
>>>
>>>
>>>  Following that, I suspect it'll be most useful to work on the upgrade
>>>> mechanism (which will also help with #1 above). Patrick sent out what
>>>> I think most people agree is a good starting point for that discussion
>>>> here: <http://www.w3.org/mid/1345470312.2877.55.camel@ds9>.
>>>>
>>>> We'll start these discussions soon, using the Atlanta meeting as a
>>>> checkpoint for the work. If it's going well by then (i.e., we have a
>>>> good set of issues and some healthy discussion, ideally with some data
>>>> starting to emerge), I'd expect us to schedule an interim meeting
>>>> sometime early next year, to have more substantial discussion.
>>>>
>>>> More details to follow. Thanks to everybody for helping get us this
>>>> far, as well as to Martin, Alexey and Julian for volunteering their
>>>> time.
>>>>
>>>> Regards,
>>>>
>>>> --
>>>> Mark Nottingham
>>>> http://www.mnot.net/
>>>>
>>>
>>>
>>> AYJ
>>>
>>>
>>>
>>
>
Received on Wednesday, 3 October 2012 21:43:24 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Wednesday, 3 October 2012 21:43:27 GMT