Re: I-D Action:draft-nottingham-http-pipeline-00.txt from Willy Tarreau on 2010-09-08 (ietf-http-wg@w3.org from July to September 2010)

From: Willy Tarreau <w@1wt.eu>
Date: Wed, 8 Sep 2010 07:41:38 +0200
To: Mark Nottingham <mnot@mnot.net>
Cc: Patrick McManus <mcmanus@ducksong.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20100908054138.GD8811@1wt.eu>
Hi Mark,

On Wed, Sep 08, 2010 at 10:34:42AM +1000, Mark Nottingham wrote:
> 
> On 01/09/2010, at 12:33 AM, Patrick McManus wrote:
> >> False positives are a concern, but at worst pipelining wouldn't work
> >> in those cases; i.e., the Web site still would, so it would be a
> >> "soft" failure.
> > 
> > well, the transaction presumably has to be redone in a non pipelined
> > context so there is a real cost to the false positive. But yes, it
> > doesn't really break anything so the only important question is how
> > often would it happen? My gut says it would happen a whole lot more
> > often than actual pipelining errors happen - especially because server
> > operators wouldn't have good visibility into it even occurring and it
> > seems the conditions for the origin-server and the user-agent to have
> > different views of the URI are common enough.
> 
> True. However, it would happen once per server per (however long the browser can remember it; default is session, I suppose). 
> 
> Also, remember that this would only happen for those servers that choose to deploy the header, but get it wrong somehow (i.e., they don't adequately test). So, good testing / verification tools (which would be dead simple to produce) would mitigate this.

My fears are related to the way admins test their deployment. When I
see the number of times we encounter incorrectly rewritten Location
headers and the number of incorrect Apache rewrite rules, I'm pretty
sure that most of the time this header will be wrong, and since it
will not have a measurable effect, no one will notice it or fix it.
It is even possible that some browsers will eventually try to "fix" what
they get in the response to make it match what they sent (eg: strip
double slashes, ignore the host part, etc...).

> Finally, in most situations where URLs get munged, the appropriate place to generate this header is where the URL munging happens (i.e., in the gateway / reverse proxy / surrogate / CDN / accelerator / whatever they're calling them today). 
> 
> I agree it's not perfect (by a long shot), but I'm at a loss to come up with another way to do this that improves the situation *and* is compatible with the deployed architecture (i.e., doesn't require HTTP/2.0). 
> 
> One more thought -- one way to partially address these concerns would be to remove information from assoc-req; e.g., to only echo the path back, omitting the hostname/port, so that people who rewrite them won't get caught. This doesn't feel like the right thing to do to me, however.

That's where I think that a short MD5 of the request would do the trick.
The client could emit it (just a few bytes, not the complete MD5), and
the other end (server, reverse-proxy, load balancer, etc... in fact the
first intermediary that knows it supports pipelining on the client side
but not the server side) would remove it from the passed on request and
echo it back. The two main advantages I see :

  - we're certain to get only one response back because it's presented
    only when the header is seen in the request ;

  - it depends on the URL even if the intermediaries don't know the
    URL the client was requesting due to rewrites ;
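
To make the idea concrete, here is a rough sketch of both sides. It is
only an illustration under assumptions: which parts of the request get
hashed (method, lowercased host, request target) and the 8 hex digit
length are my guesses, and assoc_token and strip_and_echo are
hypothetical helper names, not part of any draft.

```python
import hashlib

# Assumption: keep 4 bytes (8 hex digits) of the digest, "just a few bytes".
TOKEN_HEX_LEN = 8


def assoc_token(method, host, target):
    """Short hex digest of the request as the client sees it.

    The choice of hashed fields is an assumption made for this sketch.
    """
    material = "{} {} {}".format(method, host.lower(), target).encode("utf-8")
    return hashlib.md5(material).hexdigest()[:TOKEN_HEX_LEN]


def strip_and_echo(request_headers, response_headers):
    """Hypothetical behaviour of the first pipelining-aware intermediary.

    Removes Assoc-Req from the request passed on to the server side and
    copies its value into the response. The intermediary never needs to
    know the URL the client originally asked for.
    """
    token = None
    forwarded = {}
    for name, value in request_headers.items():
        if name.lower() == "assoc-req":
            token = value
        else:
            forwarded[name] = value
    echoed = dict(response_headers)
    if token is not None:
        # Echo only when the client sent the header.
        echoed["Assoc-Req"] = token
    return forwarded, echoed
```

Since the value is opaque to everything but the client, rewrites between
the intermediary and the origin cannot invalidate it.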

And since you said it would be needed at the beginning of the requests
only (eg: once per Host header), sending just a few bytes in each first
request finally requires less bandwidth because once the client gets its
response, it doesn't need to request that anymore.

So we'd get something like this :

  Browser                               Server

GET /foo.html HTTP/1.1
Host: example.com
Assoc-Req: e62004f5    ------->
                                     HTTP/1.1 200 OK
                                     Content-length: 2346
                                     Content-type: text/html
                                     Assoc-Req: e62004f5
                       <-------

GET /img/foo1.png HTTP/1.1
Host: example.com
Referer: http://example.com/foo.html
Assoc-Req: 5b5e86b8

GET /img/foo2.png HTTP/1.1
Host: example.com
Referer: http://example.com/foo.html
Assoc-Req: 394e6c1c

GET /img/foo3.png HTTP/1.1
Host: example.com
Referer: http://example.com/foo.html
Assoc-Req: a69add09   --------->
                                     HTTP/1.1 200 OK
                                     Content-length: 151
                                     Content-type: image/png
                                     Assoc-Req: 5b5e86b8

                                     HTTP/1.1 200 OK
                                     Content-length: 152
                                     Content-type: image/png
                                     Assoc-Req: 394e6c1c

                                     HTTP/1.1 200 OK
                                     Content-length: 153
                                     Content-type: image/png
                                     Assoc-Req: a69add09
                       <-------

After the first request, the browser already knows it's worth trying
to pipeline. After the pipelined requests, the browser has a confirmation
that the server really supports pipelining. At this point it can simply
stop emitting the header for this host.
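
The browser-side bookkeeping for the exchange above could look roughly
like the sketch below. HostPipelineState and its method names are
hypothetical, and treating a mismatched token as "never pipeline to this
host" is my assumption about how a client might react.

```python
from collections import deque


class HostPipelineState:
    """Per-host probing state (hypothetical helper for this sketch)."""

    def __init__(self):
        self.pending = deque()   # (token, pipelined) for requests in flight, in order
        self.seen_echo = False   # the server side echoed a matching token once
        self.confirmed = False   # a whole pipelined batch came back in order
        self.broken = False      # a response carried the wrong token

    def should_send_token(self):
        # Stop emitting the header once pipelining is confirmed or ruled out.
        return not (self.confirmed or self.broken)

    def may_pipeline(self):
        # Worth trying as soon as one echo has been seen.
        return self.seen_echo and not self.broken

    def on_request(self, token, pipelined):
        self.pending.append((token, pipelined))

    def on_response(self, echoed_token):
        expected, pipelined = self.pending.popleft()
        if echoed_token is None:
            return  # no header in the response: nothing learned
        if echoed_token != expected:
            self.broken = True  # response does not belong to this request
            return
        self.seen_echo = True
        if pipelined and not self.pending:
            self.confirmed = True
```

Fed with the tokens from the diagram, may_pipeline() becomes true after
the first response and should_send_token() becomes false after the third
pipelined response.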

Regards,
Willy
Received on Wednesday, 8 September 2010 05:42:13 UTC