Re: HPACK opcode bit patterns from Michael Sweet on 2014-08-06 (ietf-http-wg@w3.org from July to September 2014)

From: Michael Sweet <msweet@apple.com>
Date: Wed, 06 Aug 2014 11:12:30 -0400
To: Jason Greene <jason.greene@redhat.com>
Cc: David Krauss <potswa@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-id: <86C98AE1-C665-4B1C-A648-59EA1D92CC40@apple.com>

Jason,

On Aug 6, 2014, at 10:30 AM, Jason Greene <jason.greene@redhat.com> wrote:
> 
> On Aug 6, 2014, at 7:12 AM, Michael Sweet <msweet@apple.com> wrote:
> 
>> David,
>> 
>> On Aug 6, 2014, at 6:58 AM, David Krauss <potswa@gmail.com> wrote:
>>> ...
>>> Also, an implementation can treat “without indexing” as “never indexed” and use a common code path, since the encodings are identical modulo a don’t-care bit. Wouldn’t be surprised if that was deliberate.
>> 
>> That's the approach I am taking, and frankly I wouldn't cry if we just did away with the semantic difference and just had "never indexed".  While I don't think the extra bit will make a huge difference in compression ratio, it would simplify all implementations and it puts the onus of efficient encoding on the sender (vs. intermediaries like proxies).
> 
> If a proxy coalesces connections, its encoding context can be radically different from the client's. A field sent by one client might already be in the state table from another client, so an optimal HPACK encoder would notice that and reuse that index, even if the field was sent without indexing (no reason not to). So it's important for a proxy to know the difference between a client that was simply optimizing its encoding context and a header that should never be indexed.
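
To make the distinction concrete, here is a minimal sketch of how a decoder could classify the first octet of a header field representation. It assumes the opcode bit patterns as published in the HPACK specification (RFC 7541); the names `classify` and `may_index_when_reencoding` are made up for illustration and do not come from any implementation.

```python
# Assumed opcode patterns (RFC 7541): 1xxxxxxx indexed, 01xxxxxx literal with
# incremental indexing, 001xxxxx table size update, 0001xxxx literal never
# indexed, 0000xxxx literal without indexing.
NEVER_INDEXED_BIT = 0x10  # the only bit separating the two non-indexed literal forms


def classify(first_octet):
    """Return (representation, integer prefix length in bits) for a first octet."""
    if first_octet & 0x80:
        return ("indexed", 7)
    if first_octet & 0x40:
        return ("literal with incremental indexing", 6)
    if first_octet & 0x20:
        return ("dynamic table size update", 5)
    if first_octet & NEVER_INDEXED_BIT:
        return ("literal never indexed", 4)
    return ("literal without indexing", 4)


def may_index_when_reencoding(first_octet):
    """Hypothetical proxy rule: only 'never indexed' literals must keep their form."""
    kind, _prefix_bits = classify(first_octet)
    return kind != "literal never indexed"
```

Both non-indexed literal forms share the same 4-bit prefix, so a decoder can parse them on one code path and simply record the extra bit. A re-encoding proxy is the party that needs that recorded bit.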

In my current HTTP/2 implementation, I'm encoding the following headers using one of the non-indexed forms:

- Authorization (never indexed)
- Content-Length (without indexing)
- Content-Location (without indexing)
- Content-MD5 (without indexing)
- Content-Range (without indexing)
- Content-Version (without indexing)
- Date (without indexing)
- If-Modified-Since (without indexing)
- If-Unmodified-Since (without indexing)
- Last-Modified (without indexing)
- Link (without indexing)
- Location (without indexing)
- Range (without indexing)
- Retry-After (without indexing)

My general rule for "without indexing" is "anything that will likely change between requests/responses", which includes date/time fields, hashes, locations, ranges, byte counts, etc.  And of course authorization data is "never indexed".
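
As a rough illustration, the rule and the list above can be written as a small lookup. This is only a sketch: the names `Indexing` and `choose_indexing` are hypothetical, and the fallback to incremental indexing for any header not listed is an assumption rather than something stated here.

```python
from enum import Enum


class Indexing(Enum):
    INCREMENTAL = "incremental indexing"
    WITHOUT = "without indexing"
    NEVER = "never indexed"


# Sensitive values: must not enter any header table, even downstream.
NEVER_INDEXED = frozenset({"authorization"})

# Values likely to change between requests/responses (dates, hashes,
# locations, ranges, byte counts), so indexing them would waste table space.
WITHOUT_INDEXING = frozenset({
    "content-length", "content-location", "content-md5", "content-range",
    "content-version", "date", "if-modified-since", "if-unmodified-since",
    "last-modified", "link", "location", "range", "retry-after",
})


def choose_indexing(name):
    """Pick a literal form for a header name (case-insensitive)."""
    key = name.lower()
    if key in NEVER_INDEXED:
        return Indexing.NEVER
    if key in WITHOUT_INDEXING:
        return Indexing.WITHOUT
    # Assumption: everything else is worth adding to the header table.
    return Indexing.INCREMENTAL
```

For example, `choose_indexing("Date")` returns `Indexing.WITHOUT`, while `choose_indexing("Authorization")` returns `Indexing.NEVER`.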

While I agree that an intermediary like a proxy MIGHT be able to re-encode a client's headers more efficiently, and that it is already re-encoding when it coalesces connections to a server, I am *not* convinced that this will save more over the wire than giving the header table index one more bit would.  And if you want to make the "right" choices when re-encoding, you need to base them on the trends you see in the requests you proxy, not on a simplistic "index everything I can" strategy, since that WILL cause header table thrashing for all but the simplest clients.
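
To put a rough number on the extra index bit, here is a sketch of the prefixed integer encoding, assuming the algorithm from the HPACK specification (RFC 7541, integer representation). The 5-bit layout used for comparison is hypothetical: it assumes that merging the two non-indexed literal forms would free one opcode bit for the name index. The function name `encode_integer` is also just illustrative.

```python
def encode_integer(value, prefix_bits):
    """Encode a non-negative integer using an N-bit prefix.

    The opcode bits above the prefix are left as zero in the first octet.
    """
    if not 1 <= prefix_bits <= 8:
        raise ValueError("prefix_bits must be between 1 and 8")
    if value < 0:
        raise ValueError("value must be non-negative")
    limit = (1 << prefix_bits) - 1
    if value < limit:
        # Fits entirely in the prefix: one octet.
        return bytes([value])
    # Prefix is saturated; the remainder follows in 7-bit groups,
    # least significant group first, with the high bit as a continuation flag.
    out = [limit]
    value -= limit
    while value >= 128:
        out.append((value % 128) + 128)
        value //= 128
    out.append(value)
    return bytes(out)


# Name index 20 with today's 4-bit prefix vs. a hypothetical 5-bit prefix.
four_bit = encode_integer(20, 4)
five_bit = encode_integer(20, 5)
```

With a 4-bit prefix, any name index of 15 or more spills into a second octet, so `four_bit` is two octets long. With a 5-bit prefix, indexes up to 30 fit in one octet, so `five_bit` is a single octet.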

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair

Received on Wednesday, 6 August 2014 15:13:01 UTC