Re: First cut of Huffman encoding in compression document. from Roberto Peon on 2013-10-17 (ietf-http-wg@w3.org from October to December 2013)

From: Roberto Peon <grmocg@gmail.com>
Date: Thu, 17 Oct 2013 13:44:11 -0700
To: Ilari Liusvaara <ilari.liusvaara@elisanet.fi>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAP+FsNcOWr0swPBCGibXOEnVogm5tL-euqPrd1OSDJtyPm-OtA@mail.gmail.com>

Right now we're using NULL for that, since we have to be able to deal with
this in both the huffman-encoded and non-huffman-encoded case.

EOF/EOS is useful ONLY for delimiting the end of the name data or
value-list data.
-=R
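A minimal Python sketch of the NUL-delimited value list described above. It assumes that multiple values for one header are joined with a single NUL (0x00) byte before any Huffman coding, so the same split works whether or not the field was Huffman-encoded. `join_values` and `split_values` are hypothetical names, not part of the draft.

```python
NUL = b"\x00"  # assumed separator between values in one value field


def join_values(values: list[bytes]) -> bytes:
    """Pack several header values into one field, NUL-separated."""
    for v in values:
        if NUL in v:
            # A value containing NUL could not be split back unambiguously.
            raise ValueError("a value may not itself contain NUL")
    return NUL.join(values)


def split_values(field: bytes) -> list[bytes]:
    """Recover the individual values from a NUL-separated field."""
    return field.split(NUL)
```

Since the separator is an ordinary byte, it also has to be an ordinary symbol in the Huffman table, which an 8-bit-clean table provides anyway.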


On Thu, Oct 17, 2013 at 1:20 PM, Ilari Liusvaara <ilari.liusvaara@elisanet.fi> wrote:

> On Thu, Oct 17, 2013 at 11:11:48AM -0700, Roberto Peon wrote:
> > On Thu, Oct 17, 2013 at 7:17 AM, Ilari Liusvaara <ilari.liusvaara@elisanet.fi> wrote:
> >
> > > On Wed, Oct 16, 2013 at 12:20:08PM -0700, Roberto Peon wrote:
> > > > The first cut of Huffman encoding is now included in the compression
> > > > document.
> > > >
> > > > Please take a look and shout about the parts that you find confusing/could
> > > > be better expressed.
> > >
> > > - Lengths are marked as 8+. Is that an 8-bit prefix or a 0-bit prefix
> > >   (IIRC, those were 0-bit in some past versions)?
> > >
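For reference, a sketch of an N-bit prefixed integer, which is what the "8-bit prefix vs 0-bit prefix" question is about. It assumes the integer scheme used by the compression draft (the one later standardized in RFC 7541, Section 5.1): the value fills the low N bits of the first byte if it fits; otherwise those bits are all ones and the remainder follows in 7-bit groups with a continuation bit. With an 8-bit prefix the length owns the whole first byte. The function names are illustrative, and any flag bits sharing the first byte are ignored here.

```python
def encode_prefixed_int(value: int, prefix_bits: int) -> bytes:
    """Encode a non-negative integer with an N-bit prefix (1 <= N <= 8)."""
    if not 1 <= prefix_bits <= 8:
        raise ValueError("prefix must be 1..8 bits")
    if value < 0:
        raise ValueError("value must be non-negative")
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        return bytes([value])            # fits entirely in the prefix
    out = bytearray([max_prefix])        # all-ones prefix: more bytes follow
    value -= max_prefix
    while value >= 128:
        out.append((value & 0x7F) | 0x80)  # 7 bits plus continuation flag
        value >>= 7
    out.append(value)                    # last group, continuation bit clear
    return bytes(out)


def decode_prefixed_int(data: bytes, prefix_bits: int) -> tuple[int, int]:
    """Return (value, number_of_bytes_consumed)."""
    max_prefix = (1 << prefix_bits) - 1
    value = data[0] & max_prefix
    if value < max_prefix:
        return value, 1
    shift, i = 0, 1
    while True:
        b = data[i]
        value += (b & 0x7F) << shift
        shift += 7
        i += 1
        if not b & 0x80:
            return value, i
```

For example, 1337 with a 5-bit prefix encodes as the three bytes 0x1F 0x9A 0x0A, while 42 with an 8-bit prefix is the single byte 0x2A.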
> >
> > Do you mean the lengths in the huffman table?
>
> No, the length fields that nominally contain the length of the Huffman-encoded
> value (or name).
>
> > > - Is encountering a name/value without End-Of-String an error?
> > > - Is encountering a name/value with more bytes after EOS an error?
> > >   * Appears to be string delimiter for values?
> >
> > We have a few options here.
> > 2) Represent length of huffman-encoded strings as bytes and
> >    a) include an EOF (or EOS :) ) terminal to indicate when the last valid
> >       bit was read
> >    b) pad the last byte with bits from one of the 8+ bit symbols
> >    c) assume that the last character successfully decoded within the bytes
> >       is the last one intended, regardless of choice of a/b
> >
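A toy Python illustration of options 2b and 2c used together. The nine-symbol code `CODE` is made up for this example (it is not the draft's table). It is a complete prefix code with two 8-bit codewords, so the encoder can pad a partial last byte with the leading bits of one of them (2b), and the decoder simply drops a trailing incomplete codeword (2c). No terminator symbol is ever emitted.

```python
# Hypothetical complete prefix code; "h" and "i" are the 8-bit codewords.
CODE = {
    "a": "0", "b": "10", "c": "110", "d": "1110", "e": "11110",
    "f": "111110", "g": "1111110", "h": "11111110", "i": "11111111",
}
DECODE = {bits: sym for sym, bits in CODE.items()}
PAD_SOURCE = CODE["i"]  # a proper prefix of an 8-bit codeword is never a codeword


def encode(text: str) -> bytes:
    bits = "".join(CODE[ch] for ch in text)
    pad = (-len(bits)) % 8
    bits += PAD_SOURCE[:pad]  # 2b: fill the last byte with leading bits of a long code
    if not bits:
        return b""
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def decode(data: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in DECODE:
            out.append(DECODE[cur])
            cur = ""
    # 2c: leftover bits form an incomplete codeword, i.e. padding. They must be
    # shorter than one byte and match the padding pattern.
    if len(cur) > 7 or cur != PAD_SOURCE[: len(cur)]:
        raise ValueError("invalid padding")
    return "".join(out)
```

For instance, `encode("abc")` packs the six bits `010110` plus two padding ones into the single byte 0x5B, and `decode` returns `"abc"`. Because no EOS symbol is sent, the encoded length is always the bit count rounded up to whole bytes.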
> > 2b suffers if we don't have characters like this
>
> You are guaranteed to have one, unless there are at most 128 source
> symbols.
>
> And since this is to be 8-bit clean, there are going to be at least
> 256(>128).
>
> Even just UTF-8 first bytes would need 182(>128).
>
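The counting behind this is the Kraft inequality for a prefix code with n codewords of lengths l_1, ..., l_n. A short derivation:

```latex
\sum_{i=1}^{n} 2^{-\ell_i} \le 1
\quad\text{and}\quad \ell_i \le 7 \;\; \forall i
\;\Longrightarrow\; n \cdot 2^{-7} \le \sum_{i=1}^{n} 2^{-\ell_i} \le 1
\;\Longrightarrow\; n \le 128 .
```

So an 8-bit-clean alphabet (n = 256 > 128) must contain some codeword with length at least 8. Padding with the first k <= 7 bits of that codeword is safe: those bits are a proper prefix of a codeword, so by the prefix property they neither form a codeword nor begin with one, and the decoder reaches the end of the data without emitting a spurious symbol.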
> > 2c is probably a good idea regardless, as it means that the EOF/EOS is
> > never going to require adding additional bytes to the encoded data.
> >
> > How about 2b+2c then (I'll need to regenerate the symbol tables, but that
> > shouldn't affect anyone)?
>
> Yeah, sounds good.
>
> Of course, this leaves open the case where there is an EOF/EOS in the name field
> (EOF/EOS in the value field encodes multiple values, right?)
>
> -Ilari
>

Received on Thursday, 17 October 2013 20:44:40 UTC