Re: First cut of Huffman encoding in compression document.

On Thu, Oct 17, 2013 at 7:17 AM, Ilari Liusvaara <
ilari.liusvaara@elisanet.fi> wrote:

> On Wed, Oct 16, 2013 at 12:20:08PM -0700, Roberto Peon wrote:
> > The first cut of Humman encoding is now included in the compression
> > document.
> >
> > Please take a look and shout about the parts that you find
> confusing/could
> > be better expressed.
>
> - Lengths are marked as 8+. Are those 8 bit prefix or 0 bit prefix
>   (IIRC, those were 0 bit in some past versions)?
>

Do you mean the lengths in the huffman table?


> - Is encountering a name/value without End-Of-String an error?

- Is encountering a name/value with more bytes after EOS an error?
>   * Appears to be string delimiter for values?
>


We have a few options here.
1) Represent length of huffman-encoded strings as *bits* in the length field
   I've done this, it works, but it bloats the encoding.
2) Represent length of huffman-encoded strings as bytes and
   a) include an EOF (or EOS :) ) terminal to indicate when the last valid
bit was read
   b) pad the last byte with bits from one of the 8+ bit symbols
   c) assume that the last character successfully decoded within the bytes
is the last one
       intended, regardless of choice of a/b

2b suffers if we don't have characters like this, though we could
manufacture one
(perhaps EOF/EOS?). Hmm.. this could work better than what we have now as it
would free up some a lower-bit-length code for an element of application
data.

2c is probably a good idea regardless, as it means that the EOF/EOS is
never going
to require adding additional bytes to the encoded data.

How about 2b+2c then (I'll need to regenerate the symbol tables, but that
shouldn't affect anyone)?

-=R


>
> -Ilari
>

Received on Thursday, 17 October 2013 18:12:15 UTC