Re: First cut of Huffman encoding in compression document.

On Thu, Oct 17, 2013 at 11:11:48AM -0700, Roberto Peon wrote:
> On Thu, Oct 17, 2013 at 7:17 AM, Ilari Liusvaara <
> ilari.liusvaara@elisanet.fi> wrote:
> 
> > On Wed, Oct 16, 2013 at 12:20:08PM -0700, Roberto Peon wrote:
> > > The first cut of Huffman encoding is now included in the compression
> > > document.
> > >
> > > Please take a look and shout about any parts that you find confusing
> > > or that could be better expressed.
> >
> > - Lengths are marked as 8+. Does that mean an 8-bit prefix or a 0-bit
> >   prefix (IIRC, they were 0-bit in some past versions)?
> >
> 
> Do you mean the lengths in the huffman table?

No, the length fields that nominally contain the length of the
Huffman-encoded value (or name).
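For context on the "prefix" question: a length field with an N-bit prefix stores its integer in the low N bits of the first byte and spills into continuation bytes when it does not fit. The sketch below assumes the N-bit prefix integer representation that HPACK eventually standardized in RFC 7541 §5.1 (the October 2013 draft may differ in detail); `encode_int` and `decode_int` are hypothetical names, and only prefixes of 1 to 8 bits are modeled.

```python
from typing import Tuple


def encode_int(value: int, prefix_bits: int) -> bytes:
    """Encode a non-negative integer with an N-bit prefix (1 <= N <= 8).

    Assumption: RFC 7541-style scheme; any first-byte bits above the
    prefix (flags in the real format) are simply left as zero here.
    """
    if not 1 <= prefix_bits <= 8:
        raise ValueError("prefix_bits must be in 1..8")
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        return bytes([value])  # fits entirely inside the prefix
    out = [max_prefix]  # all-ones prefix means "more bytes follow"
    value -= max_prefix
    while value >= 128:
        out.append((value & 0x7F) | 0x80)  # 7 payload bits + continuation flag
        value >>= 7
    out.append(value)  # final byte has the continuation flag clear
    return bytes(out)


def decode_int(data: bytes, prefix_bits: int) -> Tuple[int, int]:
    """Return (value, bytes_consumed); raises IndexError on truncated input."""
    max_prefix = (1 << prefix_bits) - 1
    value = data[0] & max_prefix
    if value < max_prefix:
        return value, 1
    shift = 0
    i = 1
    while True:
        b = data[i]
        value += (b & 0x7F) << shift
        shift += 7
        i += 1
        if not b & 0x80:
            return value, i
```

With an 8-bit prefix, any length up to 254 costs exactly one byte (`encode_int(200, 8)` is `b"\xc8"`), and only longer values pay for continuation bytes.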

> > - Is encountering a name/value without End-Of-String an error?
> >
> > - Is encountering a name/value with more bytes after EOS an error?
> >   * EOS appears to be used as a string delimiter for values?
> 
> We have a few options here.
> 2) Represent the length of Huffman-encoded strings in bytes and
>    a) include an EOF (or EOS :) ) terminal to indicate when the last valid
>       bit was read
>    b) pad the last byte with bits from one of the 8+ bit symbols
>    c) assume that the last character successfully decoded within the bytes
>       is the last one intended, regardless of the choice of a/b
> 
> 2b suffers if we don't have any symbols like this (i.e. with codes of 8 or
> more bits).

You are guaranteed to have one, unless there are at most 128 source symbols.

And since this is to be 8-bit clean, there are going to be at least
256 symbols (>128).

Even just UTF-8 first bytes would need 182 (>128).
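The "at most 128 source symbols" bound is the Kraft inequality, which every prefix code (Huffman codes included) satisfies. Writing $\ell_i$ for the codeword lengths of $n$ symbols, if no codeword were 8 or more bits long:

```latex
\sum_{i=1}^{n} 2^{-\ell_i} \le 1,
\qquad \ell_i \le 7 \ \ \forall i
\;\Longrightarrow\;
n \cdot 2^{-7} \le \sum_{i=1}^{n} 2^{-\ell_i} \le 1
\;\Longrightarrow\;
n \le 128 .
```

So with 256 byte values at least one codeword has 8 or more bits. Any 1 to 7 bit proper prefix of that codeword cannot even begin with a complete codeword (the code would then not be prefix-free), which is why padding with it under 2b never decodes as a spurious trailing symbol.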

> 2c is probably a good idea regardless, as it means that the EOF/EOS is
> never going to require adding additional bytes to the encoded data.
> 
> How about 2b+2c then (I'll need to regenerate the symbol tables, but that
> shouldn't affect anyone)?

Yeah, sounds good.
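Roughly how 2b+2c fit together, as a minimal Python sketch. `TOY_CODE`, `PAD_SYMBOL`, `huffman_encode` and `huffman_decode` are hypothetical stand-ins, not the draft's symbol table; the code works on bit strings for clarity rather than speed.

```python
# Hypothetical toy prefix code (NOT the draft's table): complete and
# prefix-free, with two 8-bit codewords available for padding.
TOY_CODE = {
    "a": "0",
    "b": "10",
    "c": "110",
    "d": "1110",
    "e": "11110",
    "f": "111110",
    "g": "1111110",
    "h": "11111110",
    "i": "11111111",
}
PAD_SYMBOL = "h"  # any symbol whose code is 8+ bits works for option 2b


def huffman_encode(text: str, code: dict = TOY_CODE,
                   pad_symbol: str = PAD_SYMBOL) -> bytes:
    """Option 2b: pad the final partial byte with leading bits of an 8+ bit code."""
    if len(code[pad_symbol]) < 8:
        raise ValueError("padding symbol must have a code of at least 8 bits")
    bits = "".join(code[ch] for ch in text)
    pad_len = -len(bits) % 8  # 0..7 bits needed to reach a byte boundary
    bits += code[pad_symbol][:pad_len]
    if not bits:
        return b""
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def huffman_decode(data: bytes, code: dict = TOY_CODE) -> str:
    """Option 2c: the last symbol fully decoded within the bytes is the last one."""
    reverse = {bits: sym for sym, bits in code.items()}
    out = []
    pending = ""
    for byte in data:
        for bit in f"{byte:08b}":
            pending += bit
            if pending in reverse:  # prefix-free, so the first match is the symbol
                out.append(reverse[pending])
                pending = ""
    # Leftover bits are padding and are ignored (2c). Because 2b pads with a
    # proper prefix of an 8+ bit codeword, they never complete a symbol.
    return "".join(out)
```

For example, `huffman_encode("abc")` packs the 6 bits `010110` plus 2 padding bits `11` into the single byte `b"\x5b"`, and decoding that byte yields `"abc"` again. No EOS codeword is emitted, so the output is never longer than the symbols themselves require.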

Of course, this still leaves open the case where there is an EOF/EOS in
the name field (an EOF/EOS in the value field encodes multiple values,
right?)

-Ilari

Received on Thursday, 17 October 2013 20:20:47 UTC