Re: First cut of Huffman encoding in compression document.

In other words, we can construct the code table such that padding with
zeroes is always acceptable, but it requires care in the construction of
the code table.
-=R


On Thu, Oct 17, 2013 at 5:44 PM, Roberto Peon <grmocg@gmail.com> wrote:

> That is probably what would happen.
> There is an issue (in github already) about inverting the prefix sense
> (i.e. 1->0 and vice versa) so that padding is zeros, however the part one
> must be very careful about is that this is not guaranteed to happen for any
> entry.
> -=R
>
>
> On Thu, Oct 17, 2013 at 5:38 PM, Fred Akalin <akalin@google.com> wrote:
>
>> Random proposal, not sure if it's feasible.
>>
>> Would it be possible to explicitly map EOS to '0000000', and therefore we
>> can just say to pad with 0s until the next byte boundary?
>>
>>
>> On Thu, Oct 17, 2013 at 5:29 PM, Roberto Peon <grmocg@gmail.com> wrote:
>>
>>> I know, I could instead state that the padding must be any code >= 7
>>> bits, but then implementations have to choose which character is to be used
>>> for this, and that might vary based on the frequency table. Bleh.
>>>
>>> By using EOS, we guarantee that when the table changes, then the symbol
>>> one encodes as padding is always the same symbol.
>>>
>>> Do you want to reopen the issue so we discuss it at Vancouver, or do you
>>> find that persuasive?
>>>
>>> -=R
>>>
>>>
>>> On Thu, Oct 17, 2013 at 5:14 PM, Ilari Liusvaara <
>>> ilari.liusvaara@elisanet.fi> wrote:
>>>
>>>> On Thu, Oct 17, 2013 at 02:32:19PM -0700, Roberto Peon wrote:
>>>> > On Thu, Oct 17, 2013 at 2:15 PM, Fred Akalin <akalin@google.com>
>>>> wrote:
>>>> > >
>>>> > > - I may be missing something, but I'm not sure why we need a string
>>>> > > literal to be both length-delimited and have an end marker. I'd
>>>> prefer just
>>>> > > having the length and assigning the short encoding for EOS to
>>>> something
>>>> > > else.
>>>> > >
>>>> >
>>>> > Since length is in bytes instead of bits, but huffman encoded things
>>>> are
>>>> > bit based, you need either to represent length in bits (which makes
>>>> the
>>>> > lengths bigger), or you need to ensure that the padding used to get
>>>> to the
>>>> > next byte boundary at the end of the string cannot be interpreted as a
>>>> > valid huffman-code.
>>>>
>>>> You don't need EOS code for this.
>>>>
>>>> This is because there are more than 128 source symbols and target
>>>> alphabet
>>>> is binary, and thus there exists 8+ bit code. Prefix of that can act as
>>>> padding, it can't be mixed up with valid symbol.
>>>>
>>>> -Ilari
>>>>
>>>
>>>
>>
>

Received on Friday, 18 October 2013 00:45:46 UTC