- From: Roberto Peon <grmocg@gmail.com>
- Date: Fri, 16 Aug 2013 09:57:38 -0700
- To: James M Snell <jasnell@gmail.com>
- Cc: Martin Thomson <martin.thomson@gmail.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, Fred Akalin <akalin@google.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
- Message-ID: <CAP+FsNcZg==LDg19m0BhfdOpK_6tN1fXfjRqHFztHbW1=TbVZg@mail.gmail.com>
In addition to compressing the bytestrings, the compressor will have to
validate UTF-8. To me that is nearly the same complexity as normalization
(which was proposed earlier)-- I now get to scan things yet another time,
increasing CPU utilization... for what? Basically nothing in return if the
upper level doesn't care about it. If the upper level cares about it, then
it should be a prerequisite of feeding something into the compressor. If
not, then it shouldn't be. Either way, these concerns belong outside the
compressor.

-=R

On Fri, Aug 16, 2013 at 9:49 AM, James M Snell <jasnell@gmail.com> wrote:
> On Fri, Aug 16, 2013 at 9:29 AM, Roberto Peon <grmocg@gmail.com> wrote:
> > I view it as liberating-- as the compressor is now freed from worrying
> > about normalization, etc., which, if done, should be done at a higher
> > layer.
>
> FWIW, I don't believe anyone has said anything about normalization...
> valid UTF-8 octets, yes, but not normalization. The compression
> mechanism is really not affected by whether or not we say UTF-8
> here...
>
> - James
>
> > There is currently exactly one field that the compressor makes
> > assumptions about, and we could change that by requiring that the
> > HTTP layer do the transformation of cookie into cookie-crumbs
> > instead of having the compressor do it. The compressor knows zero
> > about anything else, semantically, right now.
> >
> > The Huffman encoder that we had and will likely add back worked on
> > bytes. It mostly encountered ASCII, and thus the frequency table was
> > skewed to compress ASCII better than other things, but it could
> > still handle UTF-8, raw binary, whatever.
> >
> > I could certainly see an eventual future where some values are just
> > raw binary.
> >
> > Sure, the Huffman-based encoder would not compress that very well,
> > but that is OK-- the binary rep should already be fairly small in
> > comparison to the B64 encoding we do today (I'd rather have the data
> > remain the same size than getting a 30% decrease after a 4X
> > expansion, which is what would happen today...), and an escape valve
> > of not having to use the Huffman encoding has always been the plan.
> >
> > We could still allow for compressors to do things with semantic
> > knowledge, but there is no need to *require* it by declaring the
> > type of all values a priori. Simply require that any transformation
> > the compressor does must not change the semantic meaning of the
> > value. Problem solved, I think.
> >
> > -=R
> >
> > On Fri, Aug 16, 2013 at 9:19 AM, Martin Thomson
> > <martin.thomson@gmail.com> wrote:
> >>
> >> On 16 August 2013 08:44, Roberto Peon <grmocg@gmail.com> wrote:
> >> > The keys should be ASCII, and the values bytes.
> >>
> >> That's a fairly narrow view. If the values were (for example) ASCII,
> >> then you'd have an opportunity to compress better. At worst, you can
> >> wipe the high-order bit from every octet.
> >>
> >> At some level you are going to need to either make assumptions about
> >> the properties of values, or rely on specific knowledge about them
> >> if you are going to compress effectively. Even if it were the case
> >> that the bytes were UTF-8, you could still make some gains over pure
> >> bytes (even just by exploiting the fact that certain byte sequences
> >> are not possible in UTF-8).
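[Editorial sketch, not part of the archived thread: the two quantitative points above -- that UTF-8 validation costs an extra full pass over every value, and the base64-versus-raw size arithmetic -- can be illustrated roughly as follows. Standard base64 expands binary by 4/3, and the 30% figure is the compression win mentioned in the thread; all byte counts are made-up examples.]

```python
# Illustrative sketch only: the extra validation pass the compressor
# would have to pay, plus the base64-vs-raw size arithmetic.

def is_valid_utf8(data: bytes) -> bool:
    """One additional full scan over the value, purely for validation."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False

# The compressor pays this scan even when the application layer never
# looks at the result -- the CPU cost Roberto objects to.
assert is_valid_utf8(b"caf\xc3\xa9")   # well-formed UTF-8
assert not is_valid_utf8(b"\xff\xfe")  # 0xFF can never appear in UTF-8

# Size arithmetic: standard base64 turns every 3 raw bytes into 4 text
# bytes (a 4/3 expansion). A 30% compression win on the expanded text
# only brings the wire size back to roughly the raw binary size, so
# sending the raw bytes uncompressed is already about as good.
raw = 300                        # hypothetical raw binary value, in bytes
b64 = -(-raw // 3) * 4           # 400 bytes after base64 expansion
after_huffman = b64 * 70 // 100  # 280 bytes after a 30% compression win
print(raw, b64, after_huffman)   # 300 400 280
```

The point of the sketch is Roberto's "escape valve": if raw binary values are allowed on the wire, the Huffman coder can simply pass them through rather than compressing a base64-inflated copy.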
Received on Friday, 16 August 2013 16:58:06 UTC