Re: First cut of Huffman encoding in compression document.

Random proposal, not sure if it's feasible.

Would it be possible to explicitly map EOS to '0000000', and therefore we
can just say to pad with 0s until the next byte boundary?


On Thu, Oct 17, 2013 at 5:29 PM, Roberto Peon <grmocg@gmail.com> wrote:

> I know, I could instead state that the padding must be any code >= 7 bits,
> but then implementations have to choose which character is to be used for
> this, and that might vary based on the frequency table. Bleh.
>
> By using EOS, we guarantee that when the table changes, then the symbol
> one encodes as padding is always the same symbol.
>
> Do you want to reopen the issue so we discuss it at Vancouver, or do you
> find that persuasive?
>
> -=R
>
>
> On Thu, Oct 17, 2013 at 5:14 PM, Ilari Liusvaara <
> ilari.liusvaara@elisanet.fi> wrote:
>
>> On Thu, Oct 17, 2013 at 02:32:19PM -0700, Roberto Peon wrote:
>> > On Thu, Oct 17, 2013 at 2:15 PM, Fred Akalin <akalin@google.com> wrote:
>> > >
>> > > - I may be missing something, but I'm not sure why we need a string
>> > > literal to be both length-delimited and have an end marker. I'd
>> prefer just
>> > > having the length and assigning the short encoding for EOS to
>> something
>> > > else.
>> > >
>> >
>> > Since length is in bytes instead of bits, but huffman encoded things are
>> > bit based, you need either to represent length in bits (which makes the
>> > lengths bigger), or you need to ensure that the padding used to get to
>> the
>> > next byte boundary at the end of the string cannot be interpreted as a
>> > valid huffman-code.
>>
>> You don't need EOS code for this.
>>
>> This is because there are more than 128 source symbols and target alphabet
>> is binary, and thus there exists 8+ bit code. Prefix of that can act as
>> padding, it can't be mixed up with valid symbol.
>>
>> -Ilari
>>
>
>

Received on Friday, 18 October 2013 00:39:11 UTC