Re: delta encoding and state management

On 18/01/2013, at 8:19 AM, Nico Williams <nico@cryptonector.com> wrote:

> On Thu, Jan 17, 2013 at 2:52 PM, Roberto Peon <grmocg@gmail.com> wrote:
>> Can you support that with data? The data I've seen thusfar strongly
>> disproves that assertion...
> 
> No need: it follows from basic information theory that a generic
> compression algorithm cannot do better than a minimal encoding of the
> same data.  The key is that the encoding has to be truly minimal,
> otherwise all bets are off.
> 
> Of course, the encoding may not really be always minimal.  A varint
> encoding of seconds since epoch is not minimal if the epoch is far in
> the past, but resetting the epoch has its costs, and gzip and friends
> won't know anything about that, -- in time a datetime encoding like
> seconds-since-epoch will become less than minimal.
> 
> We probably cannot get truly minimal encodings of everything we care
> about, but as we saw with the datetime encoding/compression
> sub-thread, in some cases we can do much better than generic encoding.
> The thing about generic compression is that it has to look for
> repeating patterns, but some patterns are hard to detect, and
> dictionaries add overhead, which is why we stand a very good chance of
> doing at least a satisfactory job with minimal encodings.  And if we
> can avoid long-term per-connection compression state then that's a big
> architectural win.  It doesn't have to be the case that we compress
> better this way than the other, just well enough.


The problem is that the biggest part of our target data is often opaque blobs -- in the forms of cookies, ETags, request URIs and referers. A "minimal encoding" of a cookie, as currently defined, isn't going to save us much at all.

Cheers,


--
Mark Nottingham   http://www.mnot.net/

Received on Thursday, 17 January 2013 21:33:49 UTC