RE: Header Compression Implementation Feedback

No, you couldn't end up with that sequence as currently specced.  I agree it would be an issue if the client did that, and you're right that there's no guarantee an individual HEADERS frame ends at a header boundary -- but HEADERS is required to be the only frame type sent on the connection until a boundary is reached, and you are guaranteed to have reached a boundary at the end of the last HEADERS frame in the sequence (the one with END_HEADERS set).

From http://tools.ietf.org/html/draft-ietf-httpbis-http2-04:
      A HEADERS frame without the END_HEADERS flag set MUST be followed
      by a HEADERS frame for the same stream.  A receiver MUST treat the
      receipt of any other type of frame or a frame on a different
      stream as a connection error (Section 5.4.1) of type
      PROTOCOL_ERROR.
...
   A compressed and encoded header block is transmitted in one or more
   HEADERS or PUSH_PROMISE frames.  If the number of octets in the block
   is greater than the space remaining in the frame, the block is
   divided into multiple fragments, which are then transmitted in
   multiple frames.

   Header blocks MUST be transmitted as a contiguous sequence of frames,
   with no interleaved frames of any other type, or from any other
   stream.  The last frame in a sequence of HEADERS frames MUST have the
   END_HEADERS flag set.  The last frame in a sequence of PUSH_PROMISE
   frames MUST have the END_PUSH_PROMISE flag set.
...
   An HTTP request or response each consist of:

   o  one contiguous sequence of HEADERS frames;

   o  zero or more DATA frames; and

   o  optionally, a contiguous sequence of HEADERS frames

   The last frame in the sequence bears an END_STREAM flag.

   Other frames, including HEADERS, MAY be interspersed with these
   frames, but those frames do not carry HTTP semantics.

   Trailing header fields are carried in a header block that also
   terminates the stream.  That is, a sequence of HEADERS frames that
   carries an END_STREAM flag on the last frame.  Header blocks after
   the first that do not terminate the stream are not part of an HTTP
   request or response.
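
For what it's worth, the check a receiver has to make here is small.  Below is a rough Python sketch of how the continuation rule above might be enforced; the frame object, the type/flag constants, and the error class are illustrative stand-ins rather than anything from the draft or a real implementation:

    # Sketch of the receiver-side continuation check described above.
    # HEADERS_TYPE, END_HEADERS_FLAG, the frame object, and ProtocolError
    # are hypothetical stand-ins, not taken from the draft or any library.

    HEADERS_TYPE = 0x1
    END_HEADERS_FLAG = 0x4

    class ProtocolError(Exception):
        """Connection error of type PROTOCOL_ERROR."""

    class HeaderBlockAssembler:
        def __init__(self):
            self.active_stream = None   # stream whose header block is in progress
            self.fragments = []

        def on_frame(self, frame):
            if self.active_stream is not None:
                # Until END_HEADERS, only HEADERS frames on the same stream
                # may arrive; anything else is a connection error.
                if (frame.type != HEADERS_TYPE
                        or frame.stream_id != self.active_stream):
                    raise ProtocolError("frame interleaved inside header block")
            if frame.type != HEADERS_TYPE:
                return None                   # other frame types handled elsewhere
            self.fragments.append(frame.payload)
            if frame.flags & END_HEADERS_FLAG:
                block = b"".join(self.fragments)
                self.active_stream, self.fragments = None, []
                return block                  # complete block, safe to decompress
            self.active_stream = frame.stream_id
            return None                       # wait for the rest of the block

Once a complete block is returned, the shared decompression table can be updated in one go, which is exactly what the contiguity requirement buys you.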


-----Original Message-----
From: Michael Sweet [mailto:msweet@apple.com] 
Sent: Tuesday, July 9, 2013 7:28 AM
To: Mike Bishop
Cc: Amos Jeffries; ietf-http-wg@w3.org
Subject: Re: Header Compression Implementation Feedback

Mike,

On Jul 9, 2013, at 9:38 AM, Mike Bishop <Michael.Bishop@microsoft.com> wrote:
> No -- there's a single encoding and single decoding table *per connection* and this wouldn't change that.  Each table starts in an initial state with common values in each direction already in the table for convenient back-reference.
> 
> The problem with the initial state as it's currently defined is that it assumes clients only send request headers and servers only send response headers.  In fact, servers also send request headers (PUSH_PROMISE).
> 
> The suggestion is that both directions start with the same initial state.

I'm ok with that part.

However, I think we need to make it clear in the HTTP/2.0 specification that (partial) HEADERS frames from multiple streams cannot be interleaved, otherwise the state of the header table will be undefined.  For example, let's say a (multithreaded) client wants to GET /image1.jpg and /image2.png from a server over a single HTTP/2.0 connection using streams 1 and 2, respectively.  And let's say that, because of cookies used by the web site, we can't fit all of the request headers into a single HEADERS frame.  If the client doesn't serialize the two requests, you could end up with:

    1:HEADERS  (first request frame on stream 1)
    2:HEADERS  (first request frame on stream 2)
    2:HEADERS  (second request frame on stream 2)
    1:HEADERS  (second request frame on stream 1)
    2:HEADERS  (last request frame on stream 2)
    1:HEADERS  (last request frame on stream 1)

Since each frame cannot be guaranteed to end at a header boundary, the server reading those HEADERS frames has no way to reliably update its copy of the header table used for decoding.  So the client needs to serialize its HEADERS frames (and thus any requests it sends) at a cost of memory and some complexity - whether that cost is greater than the cost of per-stream header tables will likely depend on the situation.  It will also limit how quickly a client can issue requests, but that isn't necessarily a bad thing, and in most cases the limiting factor is network bandwidth, not CPU...  Oh, and implementations will need to add protection against receiving interleaved HEADERS frames from multiple streams...
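
To be concrete, the serialization I'm describing is basically a per-connection lock held across "encode the header block, then write all of its HEADERS frames".  A rough sketch, where the encoder and connection objects are hypothetical placeholders rather than any real API:

    # Rough sketch of client-side serialization: hold a per-connection
    # lock while encoding a header block and writing all of its HEADERS
    # frames.  The encoder and connection objects here are hypothetical.
    import threading

    MAX_FRAGMENT = 16384   # illustrative fragment size, not from the spec

    class SerializedHeaderSender:
        def __init__(self, encoder, connection):
            self._encoder = encoder        # shared per-connection compression state
            self._connection = connection  # hypothetical frame writer
            self._lock = threading.Lock()

        def send_headers(self, stream_id, headers):
            # No other thread may emit HEADERS frames (for any stream) while
            # this block is being encoded and written.
            with self._lock:
                block = self._encoder.encode(headers)  # updates the shared table
                chunks = [block[i:i + MAX_FRAGMENT]
                          for i in range(0, len(block), MAX_FRAGMENT)] or [b""]
                for i, chunk in enumerate(chunks):
                    self._connection.write_headers_frame(
                        stream_id, chunk,
                        end_headers=(i == len(chunks) - 1))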

Similarly, servers have to serialize their outgoing HEADERS frames and do the same sort of error checking on incoming HEADERS frames.

If instead we use a per-stream header table, then we avoid this complexity but end up with less effective compression, particularly if clients and servers do not reuse streams for multiple requests.  But personally I think the savings in connection setup time and the ability to stream multiple requests and responses simultaneously make this loss in compression acceptable.  It would also simplify the protocol and implementation.
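
For comparison, the per-stream alternative would look roughly like this on the decoding side: each stream gets its own independent compression context, so cross-stream ordering stops mattering.  The context factory below is a hypothetical stand-in for whatever delta-decoding table we'd use:

    # Sketch of the per-stream alternative: one independent decoding
    # context per stream, so header blocks from different streams can be
    # processed in any order.  The context factory is hypothetical.
    class PerStreamDecoder:
        def __init__(self, context_factory):
            self._new_context = context_factory   # e.g. lambda: DecoderContext()
            self._contexts = {}

        def decode(self, stream_id, header_block):
            ctx = self._contexts.setdefault(stream_id, self._new_context())
            return ctx.decode(header_block)

        def close_stream(self, stream_id):
            # Each table dies with its stream; nothing is shared, so there is
            # no cross-stream ordering requirement -- but also no cross-stream
            # compression.
            self._contexts.pop(stream_id, None)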


> -----Original Message-----
> From: Michael Sweet [mailto:msweet@apple.com]
> Sent: Tuesday, July 9, 2013 4:45 AM
> To: Amos Jeffries
> Cc: ietf-http-wg@w3.org
> Subject: Re: Header Compression Implementation Feedback
> 
> If you are suggesting that each endpoint should maintain a single encoding and a single decoding table per stream, I'm +1 on that.
> 
> Sent from my iPad
> 
> On 2013-07-09, at 1:13 AM, Amos Jeffries <squid3@treenet.co.nz> wrote:
> 
>> On 9/07/2013 12:01 p.m., James M Snell wrote:
>>> Another minor item as I've been going through the implementation:
>>> 
>>> 4. Right now, the Header Compression scheme assumes two separate 
>>> pre-filled header tables... one for Request headers, the other for 
>>> response headers. The challenge with this is that it does not 
>>> account for the use of Request Headers within PUSH_PROMISE frames. 
>>> This is minor right now, but it means that PUSH_PROMISE frames will 
>>> not have optimum compression because the request headers will need 
>>> to be added as Literal representations with Indexing. It would be 
>>> better if we just had ONE prefilled table (it would make 
>>> implementation generally easier as well)
>> 
>> +1.
>> 
>> Amos
>> 

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair

Received on Tuesday, 9 July 2013 15:02:49 UTC