Re: First cut of Huffman encoding in compression document.

On Thu, Oct 17, 2013 at 02:32:19PM -0700, Roberto Peon wrote:
> On Thu, Oct 17, 2013 at 2:15 PM, Fred Akalin <akalin@google.com> wrote:
> >
> > - I may be missing something, but I'm not sure why we need a string
> > literal to be both length-delimited and have an end marker. I'd prefer just
> > having the length and assigning the short encoding for EOS to something
> > else.
> >
> 
> Since length is in bytes instead of bits, but huffman encoded things are
> bit based, you need either to represent length in bits (which makes the
> lengths bigger), or you need to ensure that the padding used to get to the
> next byte boundary at the end of the string cannot be interpreted as a
> valid huffman-code.

You don't need EOS code for this.

This is because there are more than 128 source symbols and target alphabet
is binary, and thus there exists 8+ bit code. Prefix of that can act as
padding, it can't be mixed up with valid symbol.

-Ilari

Received on Friday, 18 October 2013 00:14:34 UTC