- From: James M Snell <jasnell@gmail.com>
- Date: Fri, 16 Aug 2013 09:49:15 -0700
- To: Roberto Peon <grmocg@gmail.com>
- Cc: Martin Thomson <martin.thomson@gmail.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Fred Akalin <akalin@google.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
On Fri, Aug 16, 2013 at 9:29 AM, Roberto Peon <grmocg@gmail.com> wrote:
> I view it as liberating-- as the compressor is now freed from worrying about
> normalization, etc. which, if done, should be done at a higher layer.
>

FWIW, I don't believe anyone had said anything about normalization...
valid UTF-8 octets, yes, but not normalization. The compression
mechanism is really not affected by whether or not we say UTF-8 here...

- James

> There is currently exactly one field that the compressor makes assumptions
> about and we could change that by requiring that the HTTP-layer do the
> transformation of cookie into cookie-crumbs instead of having the compressor
> do it. The compressor knows zero about anything else, semantically right
> now.
>
> The huffman encoder that we had and will likely add back worked on bytes. It
> mostly encountered ASCII, and thus the frequency table was skewed to
> compress ASCII better than other things, but it could still handle UTF-8,
> raw binary, whatever.
>
> I could certainly see an eventual future where some values are just raw
> binary.
> Sure, the huffman-based encoder would not compress that very well, but that
> is OK-- the binary rep should already be fairly small in comparison to the
> B64 encoding we do today (I'd rather have the data remain the same size than
> getting a 30% decrease after a 4X expansion, which is what would happen
> today...), and an escape valve of not having to use the huffman encoding has
> always been the plan.
>
> We could still allow for compressors to do things with semantic knowledge,
> but there is no need to *require* it by declaring the type of all values a
> priori.
> Simply require that any transformation the compressor does must not change
> the semantic meaning of the value. Problem solved, I think.
>
> -=R
>
>
> On Fri, Aug 16, 2013 at 9:19 AM, Martin Thomson <martin.thomson@gmail.com>
> wrote:
>>
>> On 16 August 2013 08:44, Roberto Peon <grmocg@gmail.com> wrote:
>> > The keys should be ASCII, and the values bytes.
>>
>> That's a fairly narrow view. If the values were (for example) ASCII,
>> then you'd have an opportunity to compress better. At worst, you can
>> wipe the high order bit from every octet.
>>
>> At some level you are going to need to either make assumptions about
>> the properties of values, or rely on specific knowledge about them if
>> you are going to compress effectively. Even if it were the case that
>> the bytes were UTF-8, you could still make some gains over pure bytes
>> (even just by exploiting the fact that certain byte sequences are not
>> possible in UTF-8).
>
>
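
To make the quoted point about a byte-oriented Huffman coder concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the training sample, the floor count of 1 for unseen octets, and the names build_code_lengths, encoded_bits and choose_encoding are all hypothetical and are not taken from any draft or implementation. It builds code lengths over all 256 octet values from an ASCII-heavy sample, so every octet stays encodable, and it falls back to the raw octets (the "escape valve") whenever Huffman coding would not make the value smaller.

```python
import heapq
from collections import Counter

# Hypothetical training sample: ASCII header text, repeated to skew the counts.
SAMPLE = (b"accept-encoding: gzip, deflate\r\n"
          b"user-agent: mozilla/5.0\r\n"
          b"cookie: a=b; c=d\r\n") * 200


def build_code_lengths(sample: bytes) -> dict:
    """Return a Huffman code length (in bits) for each of the 256 octet values."""
    counts = Counter(sample)
    # Assumption: unseen octets get a floor count of 1 so they remain encodable.
    # Heap entries are (weight, tie-breaker, octets in this subtree).
    heap = [(counts[b] + 1, b, [b]) for b in range(256)]
    heapq.heapify(heap)
    lengths = dict.fromkeys(range(256), 0)
    while len(heap) > 1:
        w1, t1, s1 = heapq.heappop(heap)
        w2, t2, s2 = heapq.heappop(heap)
        # Each merge pushes every octet in both subtrees one level deeper.
        for b in s1 + s2:
            lengths[b] += 1
        heapq.heappush(heap, (w1 + w2, min(t1, t2), s1 + s2))
    return lengths


def encoded_bits(data: bytes, lengths: dict) -> int:
    """Number of bits the Huffman code would need for `data`."""
    return sum(lengths[b] for b in data)


def choose_encoding(data: bytes, lengths: dict) -> tuple:
    """Pick Huffman only when it is strictly smaller; otherwise send raw octets."""
    huffman_octets = (encoded_bits(data, lengths) + 7) // 8
    if huffman_octets < len(data):
        return ("huffman", huffman_octets)
    return ("raw", len(data))


lengths = build_code_lengths(SAMPLE)
for value in (b"gzip, deflate", bytes(range(128, 256))):
    print(len(value), choose_encoding(value, lengths))
```

With a table skewed this way, ASCII values drawn from the sample's alphabet get short codes, while octets outside it get codes longer than 8 bits, which is why the raw fallback matters for binary values.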
Received on Friday, 16 August 2013 16:50:03 UTC