RE: Integer Representation in header-compression-draft-03 from Mike Bishop on 2013-10-18 (ietf-http-wg@w3.org from October to December 2013)

From: Mike Bishop <Michael.Bishop@microsoft.com>
Date: Fri, 18 Oct 2013 00:03:15 +0000
To: Roberto Peon <grmocg@gmail.com>, Patrick McManus <pmcmanus@mozilla.com>
CC: "Kulkarni, Saurabh" <sakulkar@akamai.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <aba7c4f5895f4f4eaf6eaaf38afca44f@BY2PR03MB025.namprd03.prod.outlook.com>
That works, though I think "8-bit" should be hyphenated.  Thanks for the quick turnaround!

From: Roberto Peon [mailto:grmocg@gmail.com]
Sent: Thursday, October 17, 2013 4:59 PM
To: Mike Bishop; Patrick McManus
Cc: Kulkarni, Saurabh; HTTP Working Group
Subject: Re: Integer Representation in header-compression-draft-03

+Patrick so hopefully he notices this.

I tried your suggestion, but found it jarring :/
I stuck more explanation in the integer encoding section, which now reads (I've italicized and made bold the additions):

Integers are used to represent name indexes, pair indexes or string lengths. To allow for optimized processing, an integer representation always finishes at the end of a byte.

An integer is represented in two parts: a prefix that fills the current byte and an optional list of bytes that are used if the integer value does not fit within the prefix. The number of bits of the prefix (called N) is a parameter of the integer representation.

The N-bit prefix allows filling the current byte. If the value is small enough (strictly less than 2^N-1), it is encoded within the N-bit prefix. Otherwise all the bits of the prefix are set to 1 and the value is encoded using an unsigned variable length integer <http://en.wikipedia.org/wiki/Variable-length_quantity> representation. N is always between 1 and 8 bits. An integer starting at a byte-boundary will have an 8 bit prefix.

The algorithm to represent an integer I is as follows:

...
How does that look?
-=R

On Thu, Oct 17, 2013 at 4:49 PM, Mike Bishop <Michael.Bishop@microsoft.com> wrote:
I agree - an 8-bit prefix allows for more values to be in a single byte, so I'm not at all opposed to writing it in; we just need to be explicit.

Looking back at -03, 4.3.3 explicitly calls out a byte-aligned integer as being a "0-bit" prefix.  No other byte-aligned integer specifies a prefix length, hence my assumption (and presumably Patrick's).  That section has been removed in the current draft, since it's the definition of substitution, so we don't have to worry about reconciling it.  It would be good to explicitly state 8-bit prefix anywhere we reference a byte-aligned integer; 4.1.2 #1 is the only one I see off-hand.

From: Roberto Peon [mailto:grmocg@gmail.com]
Sent: Thursday, October 17, 2013 4:45 PM
To: Mike Bishop
Cc: Kulkarni, Saurabh; HTTP Working Group
Subject: Re: Integer Representation in header-compression-draft-03

I've integrated Fred's suggestion into the github spec version (i.e. N is always between 1 and 8)

Mike-- any suggestions on further clarification?

(imho, it is suboptimal to assume N=0, as you lose 127 points of codespace instead of only one.)
-=R

On Thu, Oct 17, 2013 at 4:41 PM, Mike Bishop <Michael.Bishop@microsoft.com> wrote:
Looks like an interpretational difference that needs to be clarified, because Firefox looks exactly correct to me.

I had interpreted a field that is "8+" bits long as a zero-bit prefix integer (i.e. N=0, so the partial byte is absent, and you always have at least one byte, which can represent numbers 0-127).  Certain instances explicitly call out zero-bit prefixes on byte boundaries, so I assumed they all were.  The spec needs to be consistent about whether integers starting on a byte boundary have an eight-bit or a zero-bit prefix, and an example would be good for this.

With a zero-bit prefix, that's the correct encoding for 159.  159 is 0b10011111.  You only get seven bits of value in the first byte because one is reserved for the continuation - which just happens to be the same bit that would be set if representing 159 on eight bits.  So the first byte is 0b10011111, followed by a second byte with the extra bit, 0b00000001.
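
A minimal sketch of the arithmetic above, assuming the zero-bit-prefix reading amounts to a plain base-128 variable-length integer, least significant group first, with the high bit of each byte as the continuation flag. The name encode_varint is hypothetical; it is not taken from the draft or from Firefox.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer 7 bits at a time, low-order group first.

    Assumption: this models the "zero-bit prefix" interpretation, where no
    partial prefix byte exists and every byte carries a continuation bit.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while value >= 128:
        # Low 7 bits of the value, with the continuation bit (0x80) set.
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)  # Final byte: continuation bit clear.
    return bytes(out)


encoded = encode_varint(159)
print([format(b, "08b") for b in encoded])  # ['10011111', '00000001']
```

Under this reading, only values 0-127 fit in a single byte.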

From: Roberto Peon [mailto:grmocg@gmail.com]
Sent: Thursday, October 17, 2013 4:37 PM
To: Kulkarni, Saurabh
Cc: HTTP Working Group
Subject: Re: Integer Representation in header-compression-draft-03

Saurabh--

Thanks for this.
It looks like Firefox is getting this wrong, per my interpretation of what is supposed to happen here.
Indeed, though it is poorly specified, the intent is that for the name-length and value-list-length fields N is 8, since there are 8 bits available for the length up to the next byte boundary, and so any value under 0xFF is (or should be) encodable on that byte.
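
A minimal sketch of that reading, assuming the prefix-plus-continuation scheme quoted from the draft (values strictly below 2^N-1 go in the prefix; otherwise the prefix is all ones and the remainder follows as a variable-length integer). The name encode_integer is hypothetical, and any bits above the prefix in the first byte are simply left at zero here.

```python
def encode_integer(value: int, prefix_bits: int) -> bytes:
    """Encode value with an N-bit prefix, N between 1 and 8 (illustrative only)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if not 1 <= prefix_bits <= 8:
        raise ValueError("prefix_bits must be between 1 and 8")
    limit = (1 << prefix_bits) - 1  # 2^N - 1, i.e. the prefix with all bits set
    if value < limit:
        return bytes([value])  # Fits entirely within the prefix.
    out = bytearray([limit])   # Prefix saturated; the remainder follows.
    value -= limit
    while value >= 128:
        out.append(value % 128 + 128)  # 7 value bits plus continuation bit
        value //= 128
    out.append(value)
    return bytes(out)


print(encode_integer(159, 8).hex())  # 9f
print(encode_integer(255, 8).hex())  # ff00
```

With N=8, every value from 0 through 254 occupies one byte, and 255 is the first value that needs a second byte.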

-=R

On Thu, Oct 17, 2013 at 4:23 PM, Kulkarni, Saurabh <sakulkar@akamai.com> wrote:
I was debugging my server (Akamai Ghost) with Firefox nightly for draft-06 and noticed a discrepancy with the way integer values are being represented in header compression. I shot an individual mail to Patrick just in case this is a false alarm, or people talked about this offline.

So header-compression-draft-03 says:
"The N-bit prefix allows filling the current byte. If the value is
 small enough (strictly less than 2^N-1), it is encoded within the
 N-bit prefix. Otherwise all the bits of the prefix are set to 1 and
 the value is encoded using an unsigned variable length integer [1]
 representation."

For representing the lengths of header values, draft-03 says it's 8+, meaning N=8, which corresponds to values < 255 being encodable in 1 byte. But since the algorithm uses the MSB for signaling whether to consume the next byte, N would need to be 7. This is potentially confusing. I encountered this issue when I received a cookie value of length 159, which can potentially be encoded as either 1 or 2 bytes (which is true of all values >= 128 and < 255).

Firefox encoded this as: 159 = \159\001, but it can also be encoded as just \159.

Please clarify the text in the draft, because +/- 1 byte can throw off the compressor completely for the subsequent values.
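
A minimal sketch of how that one-byte disagreement plays out, assuming a receiver that reads the length with an 8-bit prefix while the sender wrote \159\001. The name decode_integer and the sample payload of 159 "A" bytes are hypothetical, chosen only for illustration.

```python
from typing import Tuple


def decode_integer(data: bytes, pos: int, prefix_bits: int) -> Tuple[int, int]:
    """Return (value, next_pos) for an integer whose N-bit prefix is at data[pos]."""
    if not 1 <= prefix_bits <= 8:
        raise ValueError("prefix_bits must be between 1 and 8")
    limit = (1 << prefix_bits) - 1
    value = data[pos] & limit
    pos += 1
    if value < limit:
        return value, pos  # Value fit in the prefix; no continuation bytes.
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value += (byte & 0x7F) << shift
        shift += 7
        if byte < 128:  # Continuation bit clear: last byte of the integer.
            return value, pos


# Sender wrote the length as two bytes (159, 1), followed by 159 payload bytes.
wire = bytes([159, 1]) + b"A" * 159

# A receiver assuming N=8 sees 159 < 255 and stops after one byte.
length, pos = decode_integer(wire, 0, 8)
value = wire[pos:pos + length]
leftover = wire[pos + length:]

print(length, pos)   # 159 1
print(value[0])      # 1  (the sender's second length byte, misread as payload)
print(leftover)      # b'A'  (one payload byte spills into the next field)
```

The receiver gets the right length but starts the payload one byte early, so every later field is parsed from the wrong offset.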

Thanks,
Saurabh
Received on Friday, 18 October 2013 00:03:48 UTC