Re: First cut of Huffman encoding in compression document.

Initial comments:

- I may be missing something, but I'm not sure why we need a string literal
to be both length-delimited and have an end marker. I'd prefer just having
the length and assigning the short encoding for EOS to something else.
- Do we gain that much by having separate tables for request and response?
I was looking forward to not having to make a distinction between
request/response contexts since we now have a single static table, but this
separation blocks that again.
- I can see it being useful to encode both the Huffman-encoded length and
the original length of the string (or the delta between them), so that
buffers can be sized just once.
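
To make the first point concrete, here is a rough sketch of a length-delimited
Huffman string that needs no end marker, using the 2b+2c approach quoted below:
the final byte is padded with the leading bits of a code that is 8 or more bits
long, and the decoder discards whatever incomplete code is left at the end.
Everything in it is an assumption for illustration only: the five-symbol table
is a toy (not the table from the compression document), the one-byte length
prefix is not the draft's length encoding, and the function names are made up.

```python
# Toy prefix-free code for illustration; NOT the table from the draft.
CODES = {"a": "0", "b": "10", "c": "110", "d": "1110", "z": "11111111"}
DECODE = {bits: sym for sym, bits in CODES.items()}
PAD_SOURCE = CODES["z"]  # any code of 8 or more bits can supply padding
MAX_CODE_LEN = max(len(bits) for bits in CODES.values())


def encode(text):
    """Return a 1-byte length prefix followed by the Huffman-coded bytes."""
    bits = "".join(CODES[ch] for ch in text)
    pad = -len(bits) % 8  # number of bits needed to reach a byte boundary
    # 2b: pad with a proper prefix of a long code, so the padding can never
    # decode to a complete symbol in a prefix-free code.
    bits += PAD_SOURCE[:pad]
    body = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    if len(body) > 255:
        raise ValueError("toy length prefix only covers 0..255 bytes")
    return bytes([len(body)]) + body


def decode(data):
    """Decode one length-prefixed string; trailing partial code is padding."""
    length = data[0]
    bits = "".join(f"{byte:08b}" for byte in data[1:1 + length])
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in DECODE:
            out.append(DECODE[current])
            current = ""
        elif len(current) >= MAX_CODE_LEN:
            raise ValueError("bit sequence matches no code")
    # 2c: whatever remains in `current` is an incomplete code, i.e. padding.
    return "".join(out)


encoded = encode("abcd")
print(encoded.hex(), decode(encoded))
```

The length prefix alone tells the decoder where the string stops, so no short
code has to be reserved for EOS in this sketch.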


On Thu, Oct 17, 2013 at 1:44 PM, Roberto Peon <grmocg@gmail.com> wrote:

> Right now we're using NULL for that, since we have to be able to deal with
> this in both the huffman-encoded and non-huffman-encoded case.
>
> EOF/EOS is useful ONLY for delimiting the end of the name data or
> value-list data.
> -=R
>
>
> On Thu, Oct 17, 2013 at 1:20 PM, Ilari Liusvaara <
> ilari.liusvaara@elisanet.fi> wrote:
>
>> On Thu, Oct 17, 2013 at 11:11:48AM -0700, Roberto Peon wrote:
>> > On Thu, Oct 17, 2013 at 7:17 AM, Ilari Liusvaara <
>> > ilari.liusvaara@elisanet.fi> wrote:
>> >
>> > > On Wed, Oct 16, 2013 at 12:20:08PM -0700, Roberto Peon wrote:
>> > > > The first cut of Huffman encoding is now included in the compression
>> > > > document.
>> > > >
>> > > > Please take a look and shout about the parts that you find
>> > > > confusing/could be better expressed.
>> > >
>> > > - Lengths are marked as 8+. Are those 8 bit prefix or 0 bit prefix
>> > >   (IIRC, those were 0 bit in some past versions)?
>> > >
>> >
>> > Do you mean the lengths in the huffman table?
>>
>> No, the length fields that nominally contain the length of huffman-
>> encoded value (or name).
>>
>> > > - Is encountering a name/value without End-Of-String an error?
>> >
>> > - Is encountering a name/value with more bytes after EOS an error?
>> > >   * Appears to be string delimiter for values?
>> >
>> > We have a few options here.
>> > 2) Represent length of huffman-encoded strings as bytes and
>> >    a) include an EOF (or EOS :) ) terminal to indicate when the last valid
>> >       bit was read
>> >    b) pad the last byte with bits from one of the 8+ bit symbols
>> >    c) assume that the last character successfully decoded within the bytes
>> >       is the last one intended, regardless of choice of a/b
>> >
>> > 2b suffers if we don't have characters like this
>>
>> You are guaranteed to have one, unless there are at most 128 source
>> symbols.
>>
>> And since this is to be 8-bit clean, there are going to be at least
>> 256(>128).
>>
>> Even just UTF-8 first bytes would need 182(>128).
>>
>> > 2c is probably a good idea regardless, as it means that the EOF/EOS is
>> > never going to require adding additional bytes to the encoded data.
>> >
>> > How about 2b+2c then (I'll need to regenerate the symbol tables, but that
>> > shouldn't affect anyone)?
>>
>> Yeah, sounds good.
>>
>> Of course, this leaves the case where there is EOF/EOS in name field
>> (EOF/EOS in value field encodes multiple values, right?)
>>
>> -Ilari
>>
>
>

Received on Thursday, 17 October 2013 21:16:06 UTC