Re: First cut of Huffman encoding in compression document. from Roberto Peon on 2013-10-18 (ietf-http-wg@w3.org from October to December 2013)

From: Roberto Peon <grmocg@gmail.com>
Date: Thu, 17 Oct 2013 17:45:18 -0700
To: Fred Akalin <akalin@google.com>
Cc: Ilari Liusvaara <ilari.liusvaara@elisanet.fi>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAP+FsNfinXx6DxZucc=XkD42fscJq4WB_WRrzY9GznLZztQ71A@mail.gmail.com>

In other words, we can construct the code table such that padding with
zeroes is always acceptable, but it requires care in the construction of
the code table.
-=R


On Thu, Oct 17, 2013 at 5:44 PM, Roberto Peon <grmocg@gmail.com> wrote:

> That is probably what would happen.
> There is an issue (in github already) about inverting the prefix sense
> (i.e. 1->0 and vice versa) so that padding is zeros, however the part one
> must be very careful about is that this is not guaranteed to happen for any
> entry.
> -=R
>
>
> On Thu, Oct 17, 2013 at 5:38 PM, Fred Akalin <akalin@google.com> wrote:
>
>> Random proposal, not sure if it's feasible.
>>
>> Would it be possible to explicitly map EOS to '0000000', and therefore we
>> can just say to pad with 0s until the next byte boundary?
>>
>>
>> On Thu, Oct 17, 2013 at 5:29 PM, Roberto Peon <grmocg@gmail.com> wrote:
>>
>>> I know, I could instead state that the padding must be any code >= 7
>>> bits, but then implementations have to choose which character is to be used
>>> for this, and that might vary based on the frequency table. Bleh.
>>>
>>> By using EOS, we guarantee that when the table changes, then the symbol
>>> one encodes as padding is always the same symbol.
>>>
>>> Do you want to reopen the issue so we discuss it at Vancouver, or do you
>>> find that persuasive?
>>>
>>> -=R
>>>
>>>
>>> On Thu, Oct 17, 2013 at 5:14 PM, Ilari Liusvaara <
>>> ilari.liusvaara@elisanet.fi> wrote:
>>>
>>>> On Thu, Oct 17, 2013 at 02:32:19PM -0700, Roberto Peon wrote:
>>>> > On Thu, Oct 17, 2013 at 2:15 PM, Fred Akalin <akalin@google.com>
>>>> wrote:
>>>> > >
>>>> > > - I may be missing something, but I'm not sure why we need a string
>>>> > > literal to be both length-delimited and have an end marker. I'd
>>>> prefer just
>>>> > > having the length and assigning the short encoding for EOS to
>>>> something
>>>> > > else.
>>>> > >
>>>> >
>>>> > Since length is in bytes instead of bits, but huffman encoded things
>>>> are
>>>> > bit based, you need either to represent length in bits (which makes
>>>> the
>>>> > lengths bigger), or you need to ensure that the padding used to get
>>>> to the
>>>> > next byte boundary at the end of the string cannot be interpreted as a
>>>> > valid huffman-code.
>>>>
>>>> You don't need EOS code for this.
>>>>
>>>> This is because there are more than 128 source symbols and target
>>>> alphabet
>>>> is binary, and thus there exists 8+ bit code. Prefix of that can act as
>>>> padding, it can't be mixed up with valid symbol.
>>>>
>>>> -Ilari
>>>>
>>>
>>>
>>
>

Received on Friday, 18 October 2013 00:45:46 UTC