Re: Headers and Compression [was: Re: Getting (Officially) Started on HTTP/2.0] from James M Snell on 2012-10-03 (ietf-http-wg@w3.org from October to December 2012)

From: James M Snell <jasnell@gmail.com>
Date: Wed, 3 Oct 2012 14:49:19 -0700
To: Roberto Peon <grmocg@gmail.com>
Cc: Amos Jeffries <squid3@treenet.co.nz>, ietf-http-wg@w3.org
Message-ID: <CABP7RbeGmyYy7rvVhjOymrcUFhns5Nf4D9ahyP0-NqFcZiEDew@mail.gmail.com>
On Wed, Oct 3, 2012 at 2:42 PM, Roberto Peon <grmocg@gmail.com> wrote:

> CRIME works by observing the size of the resultant packet stream.
> Thus, if the plaintext is ever compressed within the same stream context
> as user-controlled plaintext, then the attacker can learn something about what is
> going on, regardless of output salting, encryption, etc.
>
>
Ok, got it..
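To make the size side channel concrete, here is a minimal Python sketch using the standard `zlib` module (DEFLATE). The secret header, the three guesses, and the helper `observed_size` are hypothetical. The sketch only measures how long the compressed output is when attacker-chosen text shares one compression context with a secret. A guess that repeats more of the secret can be encoded as a longer back-reference, so the output shrinks even when the guess is only partly right.

```python
import zlib

# Hypothetical secret that the attacker cannot read directly.
SECRET_HEADER = "cookie: sid=Q7f3Zk9LmP2xW8vR4tY6\r\n"


def observed_size(injected: str) -> int:
    """Compressed size of the secret plus attacker-chosen text.

    Both strings go through a single DEFLATE context, so any repeat of the
    secret inside `injected` can be encoded as a short back-reference.
    """
    payload = (SECRET_HEADER + injected).encode("ascii")
    return len(zlib.compress(payload))


# All three guesses have the same length; only their overlap with the secret differs.
wrong = observed_size("cookie: sid=Hb1NcJ0uDsE5gKoXw3Ve\r\n")
partial = observed_size("cookie: sid=Q7f3Zk9LmPE5gKoXw3Ve\r\n")  # first 10 chars right
correct = observed_size("cookie: sid=Q7f3Zk9LmP2xW8vR4tY6\r\n")

print(wrong, partial, correct)  # the size drops as more of the guess matches
```

Encrypting or salting the compressed output afterwards does not change these lengths, which is the point made in the quoted message.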


> With the compression that I'm proposing, you only learn something when
> you've guessed the entire plaintext for that field, verbatim, at which
> point you're just as well off by sending the data to the server directly.
> I'll be writing it up shortly.
>
>
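By contrast, a scheme that only ever references a previously seen value as a whole gives the attacker one yes/no signal per complete guess. The sketch below assumes a hypothetical per-connection set of exact header lines and a made-up size model: a 2-byte index reference on an exact hit, otherwise a 1-byte length prefix plus the literal. It is not the format from the promised write-up, only an illustration of the whole-value property.

```python
# Hypothetical size model: an exact match costs a 2-byte index reference,
# anything else costs a 1-byte length prefix plus the literal bytes.
INDEX_REF_SIZE = 2


def encoded_size(table: set[str], header_line: str) -> int:
    """Bytes needed to send `header_line`, given the lines already in the table."""
    if header_line in table:
        return INDEX_REF_SIZE
    return 1 + len(header_line.encode("ascii"))


seen = {"cookie: sid=Q7f3Zk9LmP2xW8vR4tY6"}  # secret stored by an earlier request

wrong = encoded_size(seen, "cookie: sid=Hb1NcJ0uDsE5gKoXw3Ve")
partial = encoded_size(seen, "cookie: sid=Q7f3Zk9LmPE5gKoXw3Ve")
exact = encoded_size(seen, "cookie: sid=Q7f3Zk9LmP2xW8vR4tY6")

print(wrong, partial, exact)  # prints: 33 33 2
```

A partial guess costs exactly as much as a wrong one, so the size reveals nothing until the entire value has been guessed verbatim.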
Will definitely be looking forward to seeing that. I'd like to explore
whether the new mechanism is going to be efficiently compatible with
bohe-like tokenization to see if it still makes sense to head down that
path.

- James


> -=R
>
> On Wed, Oct 3, 2012 at 1:19 PM, James M Snell <jasnell@gmail.com> wrote:
>
>>
>>
>> On Wed, Oct 3, 2012 at 12:15 AM, Roberto Peon <grmocg@gmail.com> wrote:
>>
>>>
>>> [snip]
>>> Yep-- what I've been doing is whole-key or whole-value delta-encoding
>>> with static Huffman coding, with an LRU of key-value pairs. A set of
>>> headers is thus simply a set of references to the items in the LRU.
>>> The set of operations is:
>>>   add a new key-value line into the LRU by specifying a new key-value
>>>       this looks like:  {opcode: KVStore, string key, string val}.
>>>   add a new key-value line into the LRU by referencing a previous
>>> key-value, copying the key from it and adding the specified new value
>>>       this looks like:  {opcode: Mutate, int lru_index, string val}.
>>>   toggle visibility for a particular LRU entry for a particular header set
>>>       this looks like:  {opcode: Toggle, int lru_index}.
>>>   toggle visibility for a contiguous range of LRU entries for a
>>> particular header set
>>>       this looks like:  {opcode: Toggle, int lru_index_start, int lru_index_end}.
>>>
>>> Note that the actual format of the operations isn't exactly like what
>>> I'm describing above- I'm just trying to indicate generally what is
>>> involved.
>>>
>>>
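The four operations above can be sketched as a small decoder. Everything here is an assumption for illustration: the `HeaderTable` class, the method names, stable entry indices assigned at insertion, a newly stored line becoming visible immediately, inclusive range bounds, and the capacity. As the message says, the real encoding differs, and Huffman coding of the strings is left out.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class HeaderTable:
    """Hypothetical decoder state: an LRU of key-value lines plus the set of
    entry indices that are visible in the current header set."""

    capacity: int = 64
    entries: OrderedDict = field(default_factory=OrderedDict)  # index -> (key, val)
    visible: set = field(default_factory=set)
    next_index: int = 0

    def _touch(self, idx):
        self.entries.move_to_end(idx)  # mark as most recently used (KeyError if evicted)

    def _store(self, key, val):
        idx = self.next_index
        self.next_index += 1
        self.entries[idx] = (key, val)
        self.visible.add(idx)  # assumption: a new line is visible at once
        if len(self.entries) > self.capacity:
            evicted, _ = self.entries.popitem(last=False)  # least recently used
            self.visible.discard(evicted)
        return idx

    def kv_store(self, key, val):
        """KVStore: add a brand-new key-value line."""
        return self._store(key, val)

    def mutate(self, idx, val):
        """Mutate: copy the key of an existing line and pair it with a new value."""
        key, _ = self.entries[idx]
        self._touch(idx)
        return self._store(key, val)

    def toggle(self, idx):
        """Toggle: flip whether one entry belongs to the current header set."""
        self._touch(idx)
        self.visible ^= {idx}

    def toggle_range(self, start, end):
        """Range Toggle: flip every still-present entry in [start, end]."""
        for idx in range(start, end + 1):
            if idx in self.entries:
                self.toggle(idx)

    def headers(self):
        return [self.entries[i] for i in sorted(self.visible)]


t = HeaderTable()
t.kv_store(":method", "GET")
path = t.kv_store(":path", "/")
t.kv_store("user-agent", "demo")
# Next request: same headers, new path. Store it, then hide the old path.
t.mutate(path, "/style.css")
t.toggle(path)
print(t.headers())
# [(':method', 'GET'), ('user-agent', 'demo'), (':path', '/style.css')]
```

Only the changed line and one toggle cross the wire for the second request; the unchanged headers stay visible from the previous set.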
>> It would definitely be helpful to have a descriptive write-up of this,
>> perhaps submitted as an I-D, that we can review.
>>
>> Putting aside, for a moment, the contentious and controversial history of
>> discussions around WebSocket... could we not address the CRIME issue by
>> randomly salting and masking individual frames within the stream? Yes,
>> there is an obvious negative impact on deflate encoding, but if we utilize
>> tokenization (à la my bohe draft) then we would achieve a significant level
>> of compression naturally through the encoding. I have not yet fully tested
>> it, but the combination of that, the randomized salting, and the TLS
>> encryption should not be subject to CRIME-type attacks. Just a thought.
>>
>> - James
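For readers unfamiliar with WebSocket-style masking, this minimal Python sketch XORs each frame with a fresh random 4-byte key, in the manner of RFC 6455 section 5.3; the receiver gets the key alongside the frame and reverses the XOR. The function names are hypothetical, and only the masking half of the idea is shown. Note what masking does and does not change: the bytes on the wire are randomized, but the frame length is identical.

```python
import os


def mask_frame(payload: bytes) -> tuple[bytes, bytes]:
    """XOR `payload` with a fresh random 4-byte key; returns (key, masked)."""
    key = os.urandom(4)
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return key, masked


def unmask_frame(key: bytes, masked: bytes) -> bytes:
    """XOR is its own inverse, so applying the same key restores the payload."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(masked))


frame = b"cookie: sid=Q7f3Zk9LmP2xW8vR4tY6\r\n"
key, wire = mask_frame(frame)
assert unmask_frame(key, wire) == frame
print(len(frame), len(wire))  # the two lengths are always equal
```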
>>
>>
>>> The resulting compression is a bit worse than gzip (with large window
>>> size) on my current test corpus, but compares pretty well with gzip in the
>>> Chrome implementation of SPDY.
>>> It has CPU advantages in that the Huffman encoding is static, thus for
>>> proxies there is no re-encoding necessary. Additionally, much or all of the
>>> decompressor state can be shared with a compressor (if proxying, for
>>> instance).
>>> Finally, I expect (though I've yet to prove it, as I'm still doing
>>> the C++ implementation) that the compression is more CPU efficient than
>>> gzip. Decompression should be similar... but much of the time you need
>>> not reconstitute an entire set of headers-- instead, since we're sending
>>> deltas anyway, you simply amend your state based on what changed and thus
>>> become more efficient there as well.
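The proxy argument rests on the code table being fixed in advance rather than adapted per connection. Below is a minimal static Huffman sketch in Python. The code is built once from a hypothetical training string (`SAMPLE_TEXT` is made up, not taken from any corpus), so the encoding of a string never depends on connection state and an encoded value could be forwarded as-is. A real scheme would derive its table from a header corpus and handle symbols missing from it; this sketch simply raises `KeyError` for them.

```python
import heapq
from collections import Counter
from itertools import count

# Hypothetical training text standing in for a real header corpus.
SAMPLE_TEXT = "accept-encoding: gzip, deflate\r\ncontent-type: text/html; charset=utf-8\r\n"


def build_code(text: str) -> dict[str, str]:
    """Build a Huffman code (symbol -> bit string) from symbol frequencies."""
    tiebreak = count()  # unique ints keep the heap from ever comparing dicts
    heap = [(freq, next(tiebreak), {sym: ""}) for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Built once and shared by every endpoint: the table never changes per connection.
STATIC_CODE = build_code(SAMPLE_TEXT)
DECODE = {bits: sym for sym, bits in STATIC_CODE.items()}


def encode(s: str) -> str:
    return "".join(STATIC_CODE[ch] for ch in s)  # KeyError for symbols outside the table


def decode(bits: str) -> str:
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in DECODE:  # the code is prefix-free, so the first match is the symbol
            out.append(DECODE[cur])
            cur = ""
    return "".join(out)


value = "text/html"
assert decode(encode(value)) == value
print(len(encode(value)), "bits vs", 8 * len(value), "bits as raw bytes")
```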
>>>
>>> If clients/servers were a bit more naive in terms of when they
>>> added/removed headers, the delta-coding would be more efficient and it'd
>>> approach or exceed gzip compression.. at least I think so :)
>>> As far as I (or thus far anyone with whom I've spoken) can tell, the
>>> approach here does not allow probing of the compression context, and is
>>> thus robust in the face of known attacks.
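To see why churn in which headers are sent matters for delta-coding, here is a hypothetical Python sketch that turns the difference between two header sets (given as sets of table indices) into single toggles and contiguous range toggles, mirroring the two Toggle forms listed earlier. The tuple layout and the `"ToggleRange"` label are made up for illustration.

```python
def toggle_ops(previous: set[int], current: set[int]) -> list[tuple]:
    """Indices whose visibility changed, grouped into contiguous runs.

    A run of one index becomes ("Toggle", i); a longer run becomes
    ("ToggleRange", start, end) with an inclusive end.
    """
    changed = sorted(previous ^ current)  # symmetric difference = entries to flip
    ops = []
    i = 0
    while i < len(changed):
        j = i
        while j + 1 < len(changed) and changed[j + 1] == changed[j] + 1:
            j += 1
        if i == j:
            ops.append(("Toggle", changed[i]))
        else:
            ops.append(("ToggleRange", changed[i], changed[j]))
        i = j + 1
    return ops


# One header added: a single op.
print(toggle_ops({0, 1, 2, 3}, {0, 1, 2, 3, 5}))  # [('Toggle', 5)]
# A contiguous block dropped: still a single op.
print(toggle_ops({0, 1, 2, 3, 4}, {0, 4}))  # [('ToggleRange', 1, 3)]
# Headers reshuffled between requests: the delta grows.
print(toggle_ops({0, 1, 2, 3}, {0, 2, 5, 7}))
# [('Toggle', 1), ('Toggle', 3), ('Toggle', 5), ('Toggle', 7)]
```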
>>>
>>> Anyway, that is what I've been working on.
>>>  -=R
>>>
>>>
>>>
>>>>
>>>>
>>>>
>>>>  Following that, I suspect it'll be most useful to work on the upgrade
>>>>> mechanism (which will also help with #1 above). Patrick sent out what
>>>>> I think most people agree is a good starting point for that discussion
>>>>> here: <http://www.w3.org/mid/1345470312.2877.55.camel@ds9>.
>>>>>
>>>>> We'll start these discussions soon, using the Atlanta meeting as a
>>>>> checkpoint for the work. If it's going well by then (i.e., we have a
>>>>> good set of issues and some healthy discussion, ideally with some data
>>>>> starting to emerge), I'd expect us to schedule an interim meeting
>>>>> sometime early next year, to have more substantial discussion.
>>>>>
>>>>> More details to follow. Thanks to everybody for helping get us this
>>>>> far, as well as to Martin, Alexey and Julian for volunteering their
>>>>> time.
>>>>>
>>>>> Regards,
>>>>>
>>>>> --
>>>>> Mark Nottingham
>>>>> http://www.mnot.net/
>>>>>
>>>>
>>>>
>>>> AYJ
>>>>
>>>>
>>>>
>>>
>>
>
Received on Wednesday, 3 October 2012 21:50:08 UTC