Re: Fragmentation for headers: why jumbo != continuation.

Hi Roberto,

On Thu, Jul 10, 2014 at 01:27:01PM -0700, Roberto Peon wrote:
> There are two separate reasons to fragment headers
> 
> 1) Dealing with headers of size > X when the max frame-size is <= X.
> 2) Reducing buffer consumption and latency.
> 
> Most of the discussion thus far has focused on #1.
> I'm going to ignore it, as those discussions are occurring elsewhere, and
> in quite some depth :)
> 
> 
> I wanted to be sure we were also thinking about #2.
> 
> Without the ability to fragment headers on the wire, one must know the size
> of the entire set of headers before any of it may be transmitted.
> 
> This implies that one must encode the entire set of headers before sending
> if one will ever do transformation of the headers. Encoding the headers in
> a different HPACK context would count as a transformation, even if none of
> the headers were modified.
> 
> This means that the protocol, if it did not have the ability to fragment,
> would require increased buffering and increased latency for any proxy by
> design.
> 
> This is not currently true for HTTP/1-- the headers can be sent/received in
> a streaming fashion, and implementations may, at their option, choose to
> buffer in order to simplify code.

Well, while implementations may do that in HTTP/1 at their option, in practice
it's only done by low-end hacks which call write(fd, string, length) once per
header line. Indeed, doing so with TCP_NODELAY results in as many packets
carrying the PUSH flag as there are write() calls, which is absolutely
suboptimal.
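
To illustrate the difference, here is a rough C sketch (the hdr[] array and
the 8 kB buffer are made up for illustration, not taken from any real code):

    /* Anti-pattern: one write() per header line. With TCP_NODELAY set,
     * each call typically leaves the stack as its own small segment with
     * the PUSH flag set instead of being coalesced with the others.
     */
    #include <string.h>
    #include <unistd.h>

    static void send_headers_one_by_one(int fd, const char *hdr[], int n)
    {
        for (int i = 0; i < n; i++)
            write(fd, hdr[i], strlen(hdr[i]));   /* one small packet each */
    }

    /* Building the whole header block first and calling write() once lets
     * the stack emit a single segment in the common case.
     */
    static void send_headers_at_once(int fd, const char *hdr[], int n)
    {
        char buf[8192];
        size_t pos = 0;

        for (int i = 0; i < n; i++) {
            size_t len = strlen(hdr[i]);
            if (pos + len > sizeof(buf))
                break;                           /* simplistic bound check */
            memcpy(buf + pos, hdr[i], len);
            pos += len;
        }
        write(fd, buf, pos);
    }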

Also, I tend to see it quite differently: the microsecond needed to process
all headers at once is very low compared to the extra work the receiver has
to do to retrieve the context of the request currently being parsed for each
chunk of headers it receives. That's one ugliness of CONTINUATION frames in
my opinion. I expect that a number of implementations will simply read each
frame, convert it to HTTP/1 and apply their existing parser to the H/1
result, only to discover that it is incomplete. They'll then rebuild and
re-parse the whole incomplete H/1 request after each incoming frame. Just
send one header at a time and you get O(N^2) parsing.
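
A rough sketch of the naive approach I have in mind (the struct and function
names below are hypothetical, not taken from any real implementation):

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical per-request state: the H/1 text rebuilt so far. */
    struct req {
        char   h1buf[16384];
        size_t h1len;
    };

    /* Stands for "apply the existing H/1 parser": note that it scans the
     * whole buffer from the start on every call, looking for CRLFCRLF.
     */
    static int h1_head_complete(const struct req *r)
    {
        for (size_t i = 0; i + 4 <= r->h1len; i++)
            if (memcmp(r->h1buf + i, "\r\n\r\n", 4) == 0)
                return 1;
        return 0;
    }

    /* Called once per HEADERS/CONTINUATION fragment: append, rebuild,
     * re-parse. With N fragments, roughly N*(N+1)/2 fragment-sized chunks
     * end up being re-scanned, hence the O(N^2) behaviour.
     */
    static void on_header_fragment(struct req *r, const char *frag, size_t len)
    {
        if (r->h1len + len > sizeof(r->h1buf))
            return;                       /* simplistic overflow guard */
        memcpy(r->h1buf + r->h1len, frag, len);
        r->h1len += len;

        if (!h1_head_complete(r))
            return;                       /* incomplete: wait for next frame */
        /* ... only now can the complete request be processed ... */
    }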

Do you have numbers showing the extra latency you'd expect from passing the
headers all at once? As I reported in an earlier e-mail, I'm seeing on the
order of 1 microsecond to parse a whole HTTP/1 request, and I hope that H/2
will not increase that by orders of magnitude. Operating systems currently
schedule with a much coarser granularity than that, and even a switch port
adds more latency. I suspect that if you *measure* the difference between
headers-at-once and one-at-a-time, you'll find that sending everything at
once is faster, thanks to lower processing and network overhead.
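
For what it's worth, the kind of measurement I have in mind is no more
complicated than this (sketch only; parse_request() stands for whichever
parser is being tested):

    #include <time.h>

    /* Run the parser a large number of times and return the average cost
     * per call in nanoseconds, using a monotonic clock.
     */
    static long long bench_ns(void (*parse_request)(void), long iters)
    {
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i++)
            parse_request();
        clock_gettime(CLOCK_MONOTONIC, &t1);

        return ((long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
                (t1.tv_nsec - t0.tv_nsec)) / iters;
    }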

Just my two cents,
Willy
