Re: Integer Representation in header-compression-draft-03 from Patrick McManus on 2013-10-18 (ietf-http-wg@w3.org from October to December 2013)

From: Patrick McManus <pmcmanus@mozilla.com>
Date: Thu, 17 Oct 2013 21:26:23 -0400
To: Roberto Peon <grmocg@gmail.com>
Cc: Fred Akalin <akalin@google.com>, "Kulkarni, Saurabh" <sakulkar@akamai.com>, Mike Bishop <Michael.Bishop@microsoft.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAOdDvNpSZbqAeqKMu_JysFPDftnPo-KnzD-VkQdJD=6k=em0AA@mail.gmail.com>
wow guys - that's a lot of mail over one dinner break!


Thanks for the clarification. I'll adjust the Firefox builds in the next
day or two. Good news about an imminent Ghost implementation!

-P



On Thu, Oct 17, 2013 at 8:26 PM, Roberto Peon <grmocg@gmail.com> wrote:

> I think the spec is clear about the subsequent bytes, both by specifying
> that they're var-int encoded, and via the pseudocode.
>
> -=R
>
>
> On Thu, Oct 17, 2013 at 5:17 PM, Fred Akalin <akalin@google.com> wrote:
>
>> N only applies to the first (possibly partial) byte, which is treated
>> specially (since it has a different continuation pattern). It would be
>> incorrect to have an "N" for subsequent bytes.
>>
>>
>> On Thu, Oct 17, 2013 at 5:14 PM, Kulkarni, Saurabh <sakulkar@akamai.com> wrote:
>>
>>> Note that even though N=8 is good, do we need to explicitly mention that
>>> the subsequent bytes are N=7? Because in essence we are using the MSB for
>>> encoding the traversal for next byte. Not sure whether that will make it
>>> clearer or more confusing tho.
>>>
>>> - Saurabh
>>>
>>> From: Mike Bishop <Michael.Bishop@microsoft.com>
>>> Date: Thursday, October 17, 2013 5:03 PM
>>> To: Roberto Peon <grmocg@gmail.com>, Patrick McManus <
>>> pmcmanus@mozilla.com>
>>> Cc: Saurabh Kulkarni <sakulkar@akamai.com>, HTTP Working Group <
>>> ietf-http-wg@w3.org>
>>> Subject: RE: Integer Representation in header-compression-draft-03
>>>
>>> That works, though I think “8-bit” should be hyphenated.  Thanks for the
>>> quick turnaround!
>>>
>>> *From:* Roberto Peon [mailto:grmocg@gmail.com <grmocg@gmail.com>]
>>> *Sent:* Thursday, October 17, 2013 4:59 PM
>>> *To:* Mike Bishop; Patrick McManus
>>> *Cc:* Kulkarni, Saurabh; HTTP Working Group
>>> *Subject:* Re: Integer Representation in header-compression-draft-03
>>>
>>> +Patrick so hopefully he notices this.
>>>
>>> I tried your suggestion, but found it jarring :/
>>>
>>> I stuck more explanation in the integer encoding section, which now
>>> reads (I've italicized and made bold the additions):
>>>
>>> Integers are used to represent name indexes, pair indexes or string
>>> lengths. To allow for optimized processing, an integer representation
>>> always finishes at the end of a byte.
>>>
>>> An integer is represented in two parts: a prefix that fills the current
>>> byte and an optional list of bytes that are used if the integer value does
>>> not fit within the prefix. The number of bits of the prefix (called N) is a
>>> parameter of the integer representation.
>>>
>>> The N-bit prefix allows filling the current byte. If the value is small
>>> enough (strictly less than 2^N-1), it is encoded within the N-bit prefix.
>>> Otherwise all the bits of the prefix are set to 1 and the value is encoded
>>> using an *unsigned variable length integer*<http://en.wikipedia.org/wiki/Variable-length_quantity>
>>> representation. *N is always between 1 and 8 bits. An integer starting
>>> at a byte-boundary will have an 8 bit prefix.*
>>>
>>> The algorithm to represent an integer I is as follows:
>>>
>>> ...
>>>
>>> How does that look?
>>>
>>> -=R
>>>
>>> On Thu, Oct 17, 2013 at 4:49 PM, Mike Bishop <
>>> Michael.Bishop@microsoft.com> wrote:
>>>
>>> I agree – an 8-bit prefix allows for more values to be in a single byte,
>>> so I’m not at all opposed to writing it in; we just need to be explicit.
>>>
>>> Looking back at -03, 4.3.3 explicitly calls out a byte-aligned integer
>>> as being a “0-bit” prefix.  No other byte-aligned integer specifies a
>>> prefix length, hence my assumption (and presumably Patrick’s).  That
>>> section has been removed in the current draft, since it’s the definition of
>>> substitution, so we don’t have to worry about reconciling it.  It would be
>>> good to explicitly state 8-bit prefix anywhere we reference a byte-aligned
>>> integer; 4.1.2 #1 is the only one I see off-hand.
>>>
>>> *From:* Roberto Peon [mailto:grmocg@gmail.com]
>>> *Sent:* Thursday, October 17, 2013 4:45 PM
>>> *To:* Mike Bishop
>>> *Cc:* Kulkarni, Saurabh; HTTP Working Group
>>> *Subject:* Re: Integer Representation in header-compression-draft-03
>>>
>>> I've integrated Fred's suggestion into the GitHub spec version (i.e. N
>>> is always between 1 and 8)
>>>
>>> Mike-- any suggestions on further clarification?
>>>
>>> (imho, it is suboptimal to assume N=0, as you lose 127 points of
>>> codespace instead of only one.)
>>>
>>> -=R
>>>
>>> On Thu, Oct 17, 2013 at 4:41 PM, Mike Bishop <
>>> Michael.Bishop@microsoft.com> wrote:
>>>
>>> Looks like an interpretational difference that needs to be clarified,
>>> because Firefox looks exactly correct to me.
>>>
>>> I had interpreted a field being “8+” bits long as a zero-bit
>>> prefix integer.  (i.e. N=0, so the partial byte is absent, and you always
>>> have at least one byte which can represent numbers 0-127)  Certain
>>> instances explicitly call out zero-bit prefixes on byte boundaries, so I
>>> assumed they all were.  The spec needs to be consistent about whether
>>> integers starting on a byte boundary have an eight-bit or a zero-bit
>>> prefix, and an example would be good for this.
>>>
>>> With a zero-bit prefix, that’s the correct encoding for 159.  159 is
>>> 0b10011111.  You only get seven bits of value in the first byte because one
>>> is reserved for the continuation – which just happens to be the same bit
>>> that would be set if representing 159 on eight bits.  So the first byte is
>>> 0b10011111, followed by a second byte with the extra bit, 0b00000001.
>>>
>>> *From:* Roberto Peon [mailto:grmocg@gmail.com]
>>> *Sent:* Thursday, October 17, 2013 4:37 PM
>>> *To:* Kulkarni, Saurabh
>>> *Cc:* HTTP Working Group
>>> *Subject:* Re: Integer Representation in header-compression-draft-03
>>>
>>> Saurabh--
>>>
>>> Thanks for this.
>>>
>>> It looks like Firefox is getting this wrong, per my interpretation of
>>> what is supposed to happen here.
>>>
>>> Indeed, though poorly specified, the intent is that for the name-length and
>>> value-list-length fields, N is 8, since there are 8 bits available for
>>> length up to the next byte boundary, and so any value under 0xFF is (or
>>> should be) encodable on that byte.
>>>
>>> -=R
>>>
>>> On Thu, Oct 17, 2013 at 4:23 PM, Kulkarni, Saurabh <sakulkar@akamai.com>
>>> wrote:
>>>
>>> I was debugging my server (Akamai Ghost) with Firefox nightly for
>>> draft-06 and noticed a discrepancy with the way integer values are being
>>> represented in header compression. I shot an individual mail to Patrick
>>> just in case this is a false alarm, or people talked about this offline.
>>>
>>> So header-compression-draft-03 says:
>>>
>>> "The N-bit prefix allows filling the current byte. If the value is
>>>  small enough (strictly less than 2^N-1), it is encoded within the
>>>  N-bit prefix. Otherwise all the bits of the prefix are set to 1 and
>>>  the value is encoded using an unsigned variable length integer [1]
>>>  representation."
>>>
>>> For representing lengths of header values, draft-03 says it's 8+,
>>> meaning N=8, which means values < 255 can be encoded in 1 byte. But
>>> since the algorithm uses the MSB for signaling whether to consume the next
>>> byte, N would then need to be 7. This is potentially confusing. I
>>> encountered this issue when I received a cookie value of length 159, which
>>> can potentially be encoded as 1 or 2 bytes (which is true of all values
>>> >= 128 and < 255).
>>>
>>> Firefox encoded this as: 159 = \159\001, but it can also be encoded as
>>> just \159.
>>>
>>> Please clarify the text in the draft, because +/- 1 byte can throw off
>>> the compressor completely for the subsequent values.
>>>
>>> Thanks,
>>>
>>> Saurabh
>>>
>>
>>
>
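
The two readings debated in this thread can be compared with a minimal
Python sketch. It follows the rule quoted from the draft (a value strictly
less than 2^N-1 fits in the prefix; otherwise the prefix is all ones and the
remainder follows as a variable-length integer). It assumes the continuation
bytes carry 7 value bits each, least-significant group first, with the MSB
set when another byte follows, which matches Mike's 0b10011111 0b00000001
example. The name encode_integer and its prefix_bits parameter are
illustrative and not taken from the draft.

```python
def encode_integer(value, prefix_bits):
    """Encode a non-negative integer with an N-bit prefix (hypothetical helper).

    prefix_bits=8 models the "8-bit prefix" reading; prefix_bits=0 models the
    "zero-bit prefix" reading, where the value is a plain variable-length
    integer. Only the prefix value is returned in the first byte; any flag
    bits sharing that byte when N < 8 are outside the scope of this sketch.
    """
    if value < 0 or not 0 <= prefix_bits <= 8:
        raise ValueError("value must be >= 0 and prefix_bits in 0..8")

    out = []
    if prefix_bits > 0:
        limit = (1 << prefix_bits) - 1      # 2^N - 1, i.e. all prefix bits set
        if value < limit:
            return bytes([value])           # fits entirely in the prefix
        out.append(limit)                   # prefix saturated
        value -= limit                      # encode only the remainder

    # Variable-length part: 7 bits per byte, MSB = "more bytes follow".
    while value >= 128:
        out.append(value % 128 + 128)
        value //= 128
    out.append(value)
    return bytes(out)


# The cookie length from the thread, under both interpretations.
print(encode_integer(159, 8).hex())  # 9f    (one byte, 8-bit prefix)
print(encode_integer(159, 0).hex())  # 9f01  (two bytes, zero-bit prefix)
```

Under the 8-bit reading every value up to 254 fits in one byte, while under
the zero-bit reading only 0-127 do, which is the codespace difference Roberto
points out.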
Received on Friday, 18 October 2013 01:26:51 UTC