W3C home > Mailing lists > Public > ietf-http-wg@w3.org > January to March 2013

Re: delta encoding and state management

From: James M Snell <jasnell@gmail.com>
Date: Thu, 17 Jan 2013 13:44:42 -0800
Message-ID: <CABP7RbfDZcRH-0_AaN9iYjPN-v6QjU6_Xdy5o1BHYnDFWHtuAg@mail.gmail.com>
To: Roberto Peon <grmocg@gmail.com>
Cc: Nico Williams <nico@cryptonector.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
On Thu, Jan 17, 2013 at 1:35 PM, Roberto Peon <grmocg@gmail.com> wrote:

> Here are where my doubts stem from:
> "Custom" keys and values (cookies, user-agent strings, etc) are added to
> headers and repeated basically all the time. If we don't know the input a
> priori, we cannot construct a perfect or minimal encoding.
> Lets assume we do know about all of the possible header keys and values at
> the time we create the spec.
> Even then, we achieve something better than stateful compression only when
> the sum-total of allowed permutations is less than the amount of bits you
> need to toggle the visibility on a piece of state (which is very small
> indeed).

We certainly cannot come up with optimized binary encodings for everything
but we can get a good ways down the road optimizing the parts we do know
about. We've already seen, for instance, that date headers can be optimized
significantly; and the separation of individual cookie crumbs allows us to
keep from having to resend the entire cookie whenever just one small part
changes. I'm certain there are other optimizations we can make without
feeling like we have to find encodings for everything.

- James

> -=R
> On Thu, Jan 17, 2013 at 1:19 PM, Nico Williams <nico@cryptonector.com>wrote:
>> On Thu, Jan 17, 2013 at 2:52 PM, Roberto Peon <grmocg@gmail.com> wrote:
>> > Can you support that with data? The data I've seen thusfar strongly
>> > disproves that assertion...
>> No need: it follows from basic information theory that a generic
>> compression algorithm cannot do better than a minimal encoding of the
>> same data.  The key is that the encoding has to be truly minimal,
>> otherwise all bets are off.
>> Of course, the encoding may not really be always minimal.  A varint
>> encoding of seconds since epoch is not minimal if the epoch is far in
>> the past, but resetting the epoch has its costs, and gzip and friends
>> won't know anything about that, -- in time a datetime encoding like
>> seconds-since-epoch will become less than minimal.
>> We probably cannot get truly minimal encodings of everything we care
>> about, but as we saw with the datetime encoding/compression
>> sub-thread, in some cases we can do much better than generic encoding.
>>  The thing about generic compression is that it has to look for
>> repeating patterns, but some patterns are hard to detect, and
>> dictionaries add overhead, which is why we stand a very good chance of
>> doing at least a satisfactory job with minimal encodings.  And if we
>> can avoid long-term per-connection compression state then that's a big
>> architectural win.  It doesn't have to be the case that we compress
>> better this way than the other, just well enough.
>> Nico
>> --
Received on Thursday, 17 January 2013 21:45:31 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 1 March 2016 11:11:09 UTC