Re: About draft-nottingham-http-pipeline-01.txt

Sorry for the delay, Willy -- it's been busy.


On 22/03/2011, at 5:27 PM, Willy Tarreau wrote:

[snip]

>> 
>> I'd love it on my personal site. *shrug*
> 
> Because you precisely know what it saves. I've been fighting with some
> people at a mobile operator to try to show them that pipelining will
> definitely save both page load time and bandwidth (compared to using
> many hosts to serve pages). After many tests, they concluded that there
> was no gain at all and it was not worth trying to explain to their users
> how to enable it in their browser!

That's going to be an uphill battle no matter what we're talking about; operators don't want the support overhead of telling users to do *anything* unusual. This is the reason why interception proxies are so prevalent, unfortunately. 


> I *know* it can save a lot, but only if properly used. The only reason
> it did not save much there was because the first request on every new
> connection was never pipelined until the client saw HTTP/1.1 in the
> response. And since sites nowadays are composed of hundreds of objects
> fetched from tens of hosts, it's hard to benefit much from pipelining
> there; browsers prefer to open many parallel connections. Needless to say I
> was a bit frustrated, because I had not expected that at all :-/

I think that in time, if we can get pipelining deployed and a few other things working*, sites will find that they don't need to use so many servers / connections. We'll see.

* draft forthcoming :)


>>> But you agree that there are not that many content providers as large as
>>> your employer. If we want browsers to adopt pipelining by default, we should
>>> ensure that many sites contribute to that effort, not just the 10 biggest ones.
>> 
>> Of course. It would be helpful if you could try to characterise what makes the proposal more suited to large sites than small ones; I can't see anything that precludes them or biases against them (assuming that browser and server vendors incorporate appropriate changes where necessary).
> 
> The points where I think pipelining helps when properly used:
>  - it reduces page load time
>  - it reduces concurrent connection counts
>  - it saves bandwidth by limiting the number of segments and ACKs
> 
> The first point is important to sites which take care to always serve pages
> very fast. Many sites I know are happy when their dynamic requests respond
> in less than 1 second... So saving a few RTTs there is not much of a
> concern to them (as I said, many are still running without keep-alive due to
> limits imposed by their components). However, big ones are very careful about
> the client's experience.

I'd say they don't do anything about it because they don't have the expertise / resources to tune to this level. Based on all the studies out there, people are more engaged when sites respond faster, and they learn better / spend more money / etc. as a result. If this can happen for the entire Web without much intervention by the site operator, so much the better.

As such, it's not that they don't care, it's that they don't know or can't do anything about it given the constraints they have. 


> The second point only impacts those who have to deploy a number of front
> servers which directly depends on the number of connections. If you find
> you're happy with an Apache and its default 256-connection limit, you
> won't see what reducing the connection count will bring you. But if you
> can halve the number of front servers just because of this, it's a lot
> different. Once again, only "large" sites are concerned with this point.
> I'd say any site which has to run a handful of front servers.

It also has an impact on congestion, which affects everyone (even -- or especially -- non-HTTP apps).


> The last point will only concern sites which pay bandwidth usage. Many
> sites nowadays run on various link sizes where the cost doesn't change
> whether they use the link or not. The issue comes when they have to
> bump the link to a new offering. Right now, a large number of sites
> need less than 100 Mbps to run and scale for years to come, and those
> 100 Mbps are among the cheapest offerings that can be found around. So
> if a site is running at 50 Mbps and discovers that pipelining can save
> 5% bandwidth, they won't care. On the other hand, a site that peaks at
> 10 Gbps will always be interested in saving 5%, because that's sometimes
> enough to lower their usage percentile and save a bit on what they pay
> their operator.

Yup, no argument there.


> Also, all those points' benefits will always take some time to achieve,
> because a lot of validation is necessary, sometimes even development.
> Large sites can invest a lot of time to save a few percent. I know some
> who even spend a lot of time trying to modify TCP stacks to save some
> packets. Smaller sites don't want to pay to save a few percent.
> 
>>>> Pipelining can certainly be hop-by-hop, but head-of-line blocking is most often caused by the origin server. Therefore it's important to give it some control over the use of pipelining. 
>>> 
> I'm not sure what you mean here. Right now I know of no intercepting proxy
>>> which is able to forward pipelined requests. They accept pipelined requests,
>>> but process one at a time. So the first proxy always terminates pipelining.
>> 
>> According to my testing, some proxies do handle pipelining, to various degrees, so "always" seems a bit overstated here.
> 
> You've found proxies which do pipelining with the servers? Nice. If
> you can send me a few pointers (even off-list if you prefer), I'm
> interested in taking a look at them just out of curiosity. In my
> opinion, if we find such proxies that also support aggregating client
> connections to the same server, then we have to add provisions for that
> in the draft, otherwise they'll be able to break things regardless of the
> client's checks: even if the client detects a faulty intermediary and
> decides not to pipeline anymore, that will not prevent the proxy from
> continuing to do so.

Will have to dig around, but yes.
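
For reference, what we're testing for: pipelining on the wire is just the client sending several requests back-to-back on one connection without waiting for the responses, with the server (or proxy) required to return the responses in the same order. Roughly (hostnames illustrative):

GET /one.html HTTP/1.1
Host: www.example.com

GET /two.html HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK        (response for /one.html)
...

HTTP/1.1 200 OK        (response for /two.html)
...

A proxy "handles" pipelining to the degree that it forwards the second request upstream before the first response has come back, rather than serialising the exchanges.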


>>> So based on this, I think we could summarize some points :
>>> - some server sites will have little incentive in adding the Assoc-Req
>>>   headers in their servers when those servers have complex URL handling,
>>>   and they don't always see an immediate benefit ;
>> 
>> I think that's a manageable problem, depending on some changes to the reverse proxy software. I believe that this will come soon after browsers support something, as there will be a strong incentive to do so. See also the request for feedback in the definition of assoc-req in -01, which may mitigate some concerns here.
> 
> I agree on this point. As I explained above, one of the showstoppers is
> "why bother with it, browsers won't use it anyway unless we explain to
> customers how to enable it by hand on a small bunch of them".
> 
>>> - some client sites will have little incentive in doing the job in their
>>>   proxies (or upgrading them) in order to present this header to their
>>>   clients for the very same reasons ;
>> 
>> I'd broaden that to say there's little incentive for them to do anything, in many (not all) cases. In the long term, we can improve this by working with the Squid team, the Traffic Server team, commercial products, etc., but we can't assume that currently-deployed products will be upgraded quickly.
> 
> That's always been my assumption too.
> 
>>> - some client sites will want to make most of their clients reliably enable
>>>   pipelining for any destination in order to reduce the effect of huge RTTs
>>>   and large numbers of connections.
>> 
>> Yes -- and this isn't addressed by the draft yet.
>> 
>>> So maybe we could achieve something which is less aggressive than adding a
>>> Connection header. Basically we could suggest how an intermediary should
>>> proceed with the header if it wants to offer pipelining to all of its clients
>>> (remove any Assoc-Req response header it receives from the server, and add one
>>> by itself).
>> 
>> Assoc-Req is all about making the client more comfortable that the response it's seeing is actually associated with the request it thinks it is. Removing it would be counter-productive; it's not an indicator that pipelining is desirable, etc. 
> 
> When I say "remove it", I mean in order to replace it, to avoid duplicates etc...
> It would go this way, with a controlled client-side intercepting proxy:
> 
> Client           int.proxy                  whatever                server
>   GET /foo                GET /foo                GET /app1/foo
>   --------------->      ------------------->     -------------------->
> 
>   Assoc-Req: /foo        nothing or whatever      nothing or Assoc-Req: /app1/foo
>   <---------------      <-------------------     <--------------------
> 
> If this proxy does not make use of pipelining to talk to the origin server,
> it does not care about the lack of Assoc-Req in the response, nor about
> possibly faulty Assoc-Req values due to improper handling on the server
> side. However, what it does care about is informing the client which
> request the response is related to. So it can reliably remove any
> occurrence of the response header and insert its own. That way the client
> still validates that its pipelined requests are correctly processed.

Do you really mean a client-side intercepting proxy? I can see the scenario above working for a reverse proxy / accelerator, but a client-side proxy that rewrites URLs is a really unfriendly beast anyway...
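
To make the check concrete, modulo the exact syntax in -01, the client compares the Assoc-Req header on each response against the request it believes is at the head of its pipeline, e.g. (hostname illustrative):

GET /foo HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK
Assoc-Req: GET http://www.example.com/foo

If they don't match, the client stops pipelining on that connection. In your scenario the proxy substitutes itself as the party making that promise, which only holds if the proxy isn't itself pipelining upstream (or is doing the same check there), as you note.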


>> Having intermediaries add a header indicating that they want pipelining is one way to go about it, but I'm very aware that in the past, people have argued against a "I support pipelining" flag because a) HTTP/1.1 support is already this flag, and b) just as it is now, the first implementation that gets pipelining wrong will make that flag meaningless. 
> 
> I agree, but here we're not saying "I support pipelining", we're giving
> a piece of information to the browser so that it can decide whether we
> correctly support pipelining or not, which is the principle of the Assoc-Req
> header: don't trust what the server is saying, check it. Here, from the
> client's point of view, the server is the intercepting proxy, so this
> works exactly as it would have with the origin server.
> 
> When the proxy is an explicit one, it should be the same principle. Also,
> a proxy (intercepting or explicit) which supports pipelining with the
> server must obviously check the assoc-req response header to know if it
> can still pipeline there.

Generally agreed, but see below.


>> Hmm. Perhaps a *new* hop-by-hop header that allows the client to associate a request with a response would be useful here; the client could generate the identifier, and since it's hop-by-hop, it wouldn't interfere with caching.
>> 
>> E.g.,
>> 
>> GET http://www.foo.com/ HTTP/1.1
>> Host: www.foo.com
>> Hop-ID: a
>> Connection: Hop-ID
>> 
>> HTTP/1.1 200 OK
>> Hop-ID: a
>> Connection: Hop-ID
>> 
>> The problem here is that clients don't have a strong incentive to send this header, unless they're sure that it's going to be useful. They could do so when they're configured to use a proxy, but AIUI the cases you're talking about involve intercepting proxies, so they couldn't be sure. 
> 
> We've talked about that in the past, and to be fair, I think it would
> be the most flexible solution because it could be deployed in steps
> and progressively be used everywhere (just as keep-alive was deployed).
> Also I think it is compatible with gateways that don't remove the headers
> listed in Connection, because that leaves the ability for the browser to
> check if the communication to the next visible hop is OK. But IIRC you
> said that browser vendors really want to limit the amount of uploaded
> bytes and that they'd avoid sending a header that has limited use.
> 
> Can't we simply put a value in the Connection header, BTW?
> E.g., Connection: r=12 ?

Connection only takes tokens, not key=value pairs.
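
Per RFC 2616:

  Connection       = "Connection" ":" 1#(connection-token)
  connection-token = token

and "=" is one of the separators excluded from token, so "r=12" can't be carried as a single connection-token.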


A fair amount of time has passed since the first version (or even the most recent version!) of the draft, and in my conversations with vendors -- especially Mozilla's Patrick McManus -- I've come to realise that the draft is probably too conservative. That is, there's a desire to have pipelining on by default, without any opt-in or special mechanisms from the server, using heuristics to back off if a problem is encountered.

In this approach, Assoc-Req et al. are still useful, but only as hints / aids to the heuristics, not as an opt-in. I need to revise the draft accordingly, I think.

Cheers,

--
Mark Nottingham   http://www.mnot.net/

Received on Tuesday, 26 April 2011 06:56:46 UTC