Re: First cut of Huffman encoding in compression document.

I know, I could instead state that the padding must be any code >= 7 bits,
but then implementations have to choose which character is to be used for
this, and that might vary based on the frequency table. Bleh.

By using EOS, we guarantee that when the table changes, then the symbol one
encodes as padding is always the same symbol.

Do you want to reopen the issue so we discuss it at Vancouver, or do you
find that persuasive?

-=R


On Thu, Oct 17, 2013 at 5:14 PM, Ilari Liusvaara <
ilari.liusvaara@elisanet.fi> wrote:

> On Thu, Oct 17, 2013 at 02:32:19PM -0700, Roberto Peon wrote:
> > On Thu, Oct 17, 2013 at 2:15 PM, Fred Akalin <akalin@google.com> wrote:
> > >
> > > - I may be missing something, but I'm not sure why we need a string
> > > literal to be both length-delimited and have an end marker. I'd prefer
> just
> > > having the length and assigning the short encoding for EOS to something
> > > else.
> > >
> >
> > Since length is in bytes instead of bits, but huffman encoded things are
> > bit based, you need either to represent length in bits (which makes the
> > lengths bigger), or you need to ensure that the padding used to get to
> the
> > next byte boundary at the end of the string cannot be interpreted as a
> > valid huffman-code.
>
> You don't need EOS code for this.
>
> This is because there are more than 128 source symbols and target alphabet
> is binary, and thus there exists 8+ bit code. Prefix of that can act as
> padding, it can't be mixed up with valid symbol.
>
> -Ilari
>

Received on Friday, 18 October 2013 00:30:15 UTC