- From: Roberto Peon <grmocg@gmail.com>
- Date: Fri, 16 Aug 2013 09:29:57 -0700
- To: Martin Thomson <martin.thomson@gmail.com>
- Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, Fred Akalin <akalin@google.com>, James Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
- Message-ID: <CAP+FsNeqi7fOxawNmFU8Psf7WhXaH-Dbwjz-XefX3g66Yeca-w@mail.gmail.com>
I view it as liberating-- the compressor is now freed from worrying about normalization, etc., which, if done, should be done at a higher layer.

There is currently exactly one field that the compressor makes assumptions about, and we could change that by requiring that the HTTP layer do the transformation of cookie into cookie-crumbs instead of having the compressor do it. Semantically, the compressor knows zero about anything else right now.

The Huffman encoder that we had, and will likely add back, worked on bytes. It mostly encountered ASCII, and thus the frequency table was skewed to compress ASCII better than other things, but it could still handle UTF-8, raw binary, whatever.

I could certainly see an eventual future where some values are just raw binary. Sure, the Huffman-based encoder would not compress that very well, but that is OK-- the binary rep should already be fairly small in comparison to the B64 encoding we do today (I'd rather have the data remain the same size than get a 30% decrease after a 4X expansion, which is what would happen today...), and an escape valve of not having to use the Huffman encoding has always been the plan.

We could still allow for compressors to do things with semantic knowledge, but there is no need to *require* it by declaring the type of all values a priori. Simply require that any transformation the compressor does must not change the semantic meaning of the value.

Problem solved, I think.
-=R

On Fri, Aug 16, 2013 at 9:19 AM, Martin Thomson <martin.thomson@gmail.com> wrote:

> On 16 August 2013 08:44, Roberto Peon <grmocg@gmail.com> wrote:
> > The keys should be ASCII, and the values bytes.
>
> That's a fairly narrow view. If the values were (for example) ASCII,
> then you'd have an opportunity to compress better. At worst, you can
> wipe the high order bit from every octet.
>
> At some level you are going to need to either make assumptions about
> the properties of values, or rely on specific knowledge about them if
> you are going to compress effectively. Even if it were the case that
> the bytes were UTF-8, you could still make some gains over pure bytes
> (even just by exploiting the fact that certain byte sequences are not
> possible in UTF-8).
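
As an illustration of the byte-oriented Huffman coding with an escape valve discussed in this message, here is a minimal Python sketch. It is not the encoder from any draft: the training sample, the add-one smoothing, and the names `build_code_lengths`, `encoded_size`, and `choose_representation` are all assumptions made up for the example. It only computes code lengths and sizes, which is enough to show that a table skewed toward ASCII still covers every byte value, and that a value is left as raw bytes when coding would not shrink it.

```python
import heapq
from collections import Counter


def build_code_lengths(freqs):
    """Return {byte value: Huffman code length in bits} for all 256 byte values."""
    # Add-one smoothing (an assumption of this sketch): every byte value gets
    # a weight of at least 1, so UTF-8 or raw binary input remains encodable.
    heap = [(freqs.get(b, 0) + 1, b, [b]) for b in range(256)]
    heapq.heapify(heap)
    lengths = dict.fromkeys(range(256), 0)
    while len(heap) > 1:
        w1, t1, s1 = heapq.heappop(heap)
        w2, t2, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for b in s1 + s2:
            lengths[b] += 1
        # The smallest symbol in the subtree serves as a unique tie-breaker.
        heapq.heappush(heap, (w1 + w2, min(t1, t2), s1 + s2))
    return lengths


def encoded_size(data, lengths):
    """Size in octets of the Huffman-coded value, padded to a whole octet."""
    bits = sum(lengths[b] for b in data)
    return (bits + 7) // 8


def choose_representation(data, lengths):
    """Escape valve: keep the raw bytes unless Huffman coding is smaller."""
    size = encoded_size(data, lengths)
    if size < len(data):
        return ("huffman", size)
    return ("raw", len(data))


# Hypothetical ASCII-heavy training text standing in for real header traffic.
ASCII_SAMPLE = b"accept-encoding: gzip, deflate\r\nuser-agent: mozilla/5.0\r\n"
LENGTHS = build_code_lengths(Counter(ASCII_SAMPLE * 100))

if __name__ == "__main__":
    binary_value = bytes(range(128, 256))  # stand-in for a raw binary value
    print(choose_representation(ASCII_SAMPLE, LENGTHS))
    print(choose_representation(binary_value, LENGTHS))
```

With this table the ASCII value is reported as Huffman-coded, while the binary value falls through to the raw representation, so it stays the same size instead of growing.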
Received on Friday, 16 August 2013 16:30:25 UTC