- From: Nico Williams <nico@cryptonector.com>
- Date: Thu, 17 Jan 2013 15:19:08 -0600
- To: Roberto Peon <grmocg@gmail.com>
- Cc: James M Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
On Thu, Jan 17, 2013 at 2:52 PM, Roberto Peon <grmocg@gmail.com> wrote:
> Can you support that with data? The data I've seen thusfar strongly
> disproves that assertion...

No need: it follows from basic information theory that a generic compression algorithm cannot do better than a minimal encoding of the same data. The key is that the encoding has to be truly minimal; otherwise all bets are off.

Of course, the encoding may not always be truly minimal. A varint encoding of seconds since the epoch is not minimal if the epoch is far in the past, but resetting the epoch has its costs, and gzip and friends won't know anything about that; over time a datetime encoding like seconds-since-epoch becomes less than minimal.

We probably cannot get truly minimal encodings of everything we care about, but as we saw with the datetime encoding/compression sub-thread, in some cases we can do much better than generic compression. The thing about generic compression is that it has to look for repeating patterns, but some patterns are hard to detect, and dictionaries add overhead, which is why we stand a very good chance of doing at least a satisfactory job with minimal encodings. And if we can avoid long-term per-connection compression state, that's a big architectural win. It doesn't have to be the case that we compress better this way than the other way, just well enough.

Nico
--
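A small sketch of the datetime point, assuming Python and a hand-rolled varint() helper (neither appears in the thread; the byte counts are indicative only): a varint of seconds-since-epoch fits in a handful of bytes, while deflate applied to the textual HTTP date cannot even get back under the date string's own length.

    # Sketch (not from the thread): compare a varint of seconds-since-epoch
    # against deflate-compressing the equivalent textual HTTP date.
    import zlib

    def varint(n):
        # LEB128-style unsigned varint: 7 data bits per byte,
        # high bit set on every byte except the last.
        out = bytearray()
        while True:
            byte = n & 0x7F
            n >>= 7
            if n:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                return bytes(out)

    http_date = b"Thu, 17 Jan 2013 21:19:32 GMT"  # 29 bytes as text
    seconds = 1358457572                          # same instant, Unix epoch

    print(len(varint(seconds)))              # 5 bytes: near-minimal encoding
    print(len(zlib.compress(http_date, 9)))  # ~36 bytes: header/checksum
                                              # overhead swamps a short field

The exact numbers matter less than the shape of the result: the compressor pays header and dictionary overhead that a near-minimal encoding never incurs.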
Received on Thursday, 17 January 2013 21:19:32 UTC