
Re: delta encoding and state management

From: Roberto Peon <grmocg@gmail.com>
Date: Thu, 17 Jan 2013 13:35:24 -0800
Message-ID: <CAP+FsNfswUN-CK6heRGqEnSJatHGo3q2mZZLTrPnjapCZz2sTg@mail.gmail.com>
To: Nico Williams <nico@cryptonector.com>
Cc: James M Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>

Here is where my doubts stem from:

"Custom" keys and values (cookies, user-agent strings, etc) are added to
headers and repeated basically all the time. If we don't know the input a
priori, we cannot construct a perfect or minimal encoding.

Let's assume we do know all of the possible header keys and values at
the time we create the spec.
Even then, we achieve something better than stateful compression only when
the number of bits needed to distinguish among all allowed permutations is
less than the number of bits needed to toggle the visibility of a piece of
state (which is very few bits indeed).
-=R
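[Editor's note: the "toggle the visibility on a piece of state" idea can be sketched as a toy indexed header table. This is an illustration, not Roberto's actual proposal; the `encode` function and the header values are invented.]

```python
# Toy sketch of stateful header compression: a shared table of previously
# seen header lines lets a repeat be signalled with a tiny index instead
# of re-sending the bytes.

def encode(headers, table):
    """Encode a list of 'Name: value' strings against a shared table."""
    out = []
    for line in headers:
        if line in table:
            # A few bits of "state toggle": reference an existing entry.
            out.append(("index", table.index(line)))
        else:
            table.append(line)
            out.append(("literal", line))
    return out

table = []
req1 = ["user-agent: ExampleBrowser/1.0", "cookie: session=abc123"]
req2 = ["user-agent: ExampleBrowser/1.0", "cookie: session=abc123"]

first = encode(req1, table)    # both lines sent in full as literals
second = encode(req2, table)   # both lines shrink to small indices
```

Against this baseline, a spec-time enumeration of keys and values only wins when it can name a whole permutation in fewer bits than these per-entry indices cost.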


On Thu, Jan 17, 2013 at 1:19 PM, Nico Williams <nico@cryptonector.com> wrote:

> On Thu, Jan 17, 2013 at 2:52 PM, Roberto Peon <grmocg@gmail.com> wrote:
> > Can you support that with data? The data I've seen thus far strongly
> > disproves that assertion...
>
> No need: it follows from basic information theory that a generic
> compression algorithm cannot do better than a minimal encoding of the
> same data.  The key is that the encoding has to be truly minimal,
> otherwise all bets are off.
>
> Of course, the encoding may not always be truly minimal.  A varint
> encoding of seconds since epoch is not minimal if the epoch is far in
> the past, but resetting the epoch has its costs, and gzip and friends
> won't know anything about that -- in time a datetime encoding like
> seconds-since-epoch will no longer be minimal.
>
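[Editor's note: the varint point can be made concrete. A minimal sketch, assuming a LEB128-style base-128 varint; the `varint_len` helper and the choice of a 2013-01-01 epoch are illustrative, not from the thread.]

```python
# How many bytes a base-128 varint needs: 7 payload bits per byte, so the
# encoded length grows with the magnitude of the value.

def varint_len(n):
    """Number of bytes a base-128 varint needs to encode n."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length

t = 1358458524                 # this message's own date as a Unix time
old_epoch = varint_len(t)                    # 5 bytes against the 1970 epoch
new_epoch = varint_len(t - 1356998400)       # 3 bytes against a 2013-01-01 epoch
```

Moving the epoch forward saves bytes today, but as `t` grows the encoded length creeps back up, which is the "no longer minimal" drift described above.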
> We probably cannot get truly minimal encodings of everything we care
> about, but as we saw with the datetime encoding/compression
> sub-thread, in some cases we can do much better than generic encoding.
> The thing about generic compression is that it has to look for
> repeating patterns, but some patterns are hard to detect, and
> dictionaries add overhead, which is why we stand a very good chance of
> doing at least a satisfactory job with minimal encodings.  And if we
> can avoid long-term per-connection compression state then that's a big
> architectural win.  It doesn't have to be the case that we compress
> better this way than the other, just well enough.
>
> Nico
> --
>
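[Editor's note: the tradeoff Nico describes -- generic compression pays for its dictionary and only earns it back through repetition and retained state -- can be seen directly with zlib. The header bytes below are invented for illustration.]

```python
import zlib

# A single short header block compresses poorly: zlib's stream framing and
# literal coding add overhead, and there is little repetition to exploit.
# A second, identical block is nearly free once the compressor has state,
# which is exactly the long-term per-connection state at issue.

headers = b"user-agent: ExampleBrowser/1.0\r\ncookie: session=abc123\r\n"

one_block = len(zlib.compress(headers, 9))
two_blocks = len(zlib.compress(headers * 2, 9))

# The first copy costs close to its raw size; the repeat costs only a few
# bytes of back-reference.
print(len(headers), one_block, two_blocks - one_block)
```

A purpose-built minimal encoding aims to get near the cheap-repeat cost on the *first* transmission, without having to carry compressor state for the life of the connection.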
Received on Thursday, 17 January 2013 21:35:52 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Thursday, 17 January 2013 21:35:54 GMT