Re: First cut of Huffman encoding in compression document.

An alternative is to mandate the padding character, e.g. say that it must
be the least code lexicographically that has >= 8 (not >= 7) bits. Or just
arbitrarily pick one and christen it the EOS bit, too.

Not sure if it's worth saving an 8+ code, though.


On Thu, Oct 17, 2013 at 5:29 PM, Roberto Peon <grmocg@gmail.com> wrote:

> I know, I could instead state that the padding must be any code >= 7 bits,
> but then implementations have to choose which character is to be used for
> this, and that might vary based on the frequency table. Bleh.
>
> By using EOS, we guarantee that when the table changes, then the symbol
> one encodes as padding is always the same symbol.
>
> Do you want to reopen the issue so we discuss it at Vancouver, or do you
> find that persuasive?
>
> -=R
>
>
> On Thu, Oct 17, 2013 at 5:14 PM, Ilari Liusvaara <
> ilari.liusvaara@elisanet.fi> wrote:
>
>> On Thu, Oct 17, 2013 at 02:32:19PM -0700, Roberto Peon wrote:
>> > On Thu, Oct 17, 2013 at 2:15 PM, Fred Akalin <akalin@google.com> wrote:
>> > >
>> > > - I may be missing something, but I'm not sure why we need a string
>> > > literal to be both length-delimited and have an end marker. I'd
>> prefer just
>> > > having the length and assigning the short encoding for EOS to
>> something
>> > > else.
>> > >
>> >
>> > Since length is in bytes instead of bits, but huffman encoded things are
>> > bit based, you need either to represent length in bits (which makes the
>> > lengths bigger), or you need to ensure that the padding used to get to
>> the
>> > next byte boundary at the end of the string cannot be interpreted as a
>> > valid huffman-code.
>>
>> You don't need EOS code for this.
>>
>> This is because there are more than 128 source symbols and target alphabet
>> is binary, and thus there exists 8+ bit code. Prefix of that can act as
>> padding, it can't be mixed up with valid symbol.
>>
>> -Ilari
>>
>
>

Received on Friday, 18 October 2013 01:57:23 UTC