Re: About draft-nottingham-http-pipeline-01.txt

On 15/03/2011, at 9:43 PM, Willy Tarreau wrote:
>> 
>> Sorry, a transparent/intercepting proxy that's specific to a site? Are you talking about gateways ("reverse proxies") or are you saying that some networks are selectively deploying proxies that are only used for accessing certain sites?
> 
> I'm mostly thinking about load balancers, that are invisible to the client,
> but HTTP optimizers (compressors, content aggregators) are also included.

Right. These aren't transparent or intercepting proxies, because they're deployed by the server. Presumably, when a site tests how well it supports pipelining, it would include these components in that testing. 


>> The draft (clearly, I hope!) conveys a strategy for dealing with interception proxies; do you have any feedback on that? Assoc-Req is not intended to address all of the problems associated with them.
> 
> It's not clear to me when reading it. Maybe I missed an important point,
> but the only relevant item I've found was about the fact that adding
> the header is forbidden for proxies.

See:
  http://tools.ietf.org/html/draft-nottingham-http-pipeline-01#section-5


>>> Now for the server side, we're suggesting adding the header on the
>>> servers themselves in order to validate the whole chain. I see two
>>> difficulties with that :
>>> - large sites take more time to be modified, even in order to
>>>   add just a header ;
>> 
>> Do you have any data to back this up? In my experience, this is not trivial, but it is workable, especially when you dangle a substantial performance improvement as a carrot.
> 
> All the "large" sites I know are behind several layers of reverse-proxies
> which are shared between multiple applications. I would even say application
> components, because nowadays, no application administrator can define where
> "the" application is, rather what server components it's made of. Most of
> these components forward the requests they're not responsible for to a next
> hop, a bit like portals. In these environments, dealing with URLs is generally
> difficult because they're rewritten at many places, sometimes stripping the
> first directory part at each level. For this reason we see hard-coded links
> in pages and hard-coded Location headers for redirects, because noone is
> able to build a correct one. I've header an architect once tell me that the
> application was "relocalisable" in that it never knows its URLs, it's at the
> end of a long chain and processes what it receives...

Yes, I'm familiar with these as well.


> In practice, they often rely on the hosting infrastructure to serve error
> pages because it's easier for them. In such environments, the amount of
> effort needed to get Assoc-Req right on every response is considerable,
> and must be done for all hosted applications. By contrast, doing it at
> the first level of reverse proxy provides it far more cheaply to all
> applications.

Yes, that's what I'd assume they'd do (insert Assoc-Req on the front end and perform due diligence to make sure it isn't mangled further back). Assoc-Req could also be used on those 'back end' hops, of course (with its payload changing each time the URL changes). 
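
To make that concrete, here's a rough sketch (hostnames and paths are invented, and I'm assuming the -01 syntax of method followed by the request's absolute URI). On the client-facing connection, the front-end proxy's response would carry the URL as the client requested it:

HTTP/1.1 200 OK
Assoc-Req: GET http://www.example.com/shop/item/42

while on a back-end hop that strips the first path segment and rewrites the host, the same response would carry the rewritten form:

HTTP/1.1 200 OK
Assoc-Req: GET http://app1.internal.example.com/item/42

The front end would then replace the back-end value with the client-facing one (or simply overwrite whatever it receives).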

I suspect this would also help improve security; IME different products act inconsistently when they rewrite the Host header.


>>> Another point I'm seeing is on the efficiency and deployment speed.
>>> I don't know how many sites there are on the net, but getting all of
>>> the valid ones to emit the header will take ages. We can relate that to
>>> the number of sites which support keep-alive and HTTP compression.
>> 
>> Yet, strangely, many sites do deploy keep-alive and compression, and enjoy the benefits.
> 
> My observations on prod traffic at a few places tend to indicate that
> many sites still using Apache 1.3 as a reverse proxy have to disable
> keep-alive due to the pre-forked model. Also, while many sites do indeed
> deploy compression, they still represent a very low percentage of what
> can be found in large proxies' logs. I'm not dismissing the merits of
> these two mechanisms, I just want to give an example of some improvements
> that are not always deployed by some sites because they don't find an
> immediate advantage for them while some clients would benefit from them
> (eg: mobile users).

Sure. That doesn't mean we should hold back performance for the rest of the Web.


>>> The main reason is that there is little incentive on the server side
>>> to work on this, because the benefits are not directly perceived.
>> 
>> ?!?! I know of many server admins who salivate at the potential performance benefits that this brings. It's a huge incentive. 
> 
> On large sites it can lead to a dramatic reduction of the number of concurrent
> connections, which is a good thing. But on small sites, this advantage is not
> necessarily perceived.

I'd love it on my personal site. *shrug*


>>> That means that we can address most of the pipelining deployment issues
>>> by targeting the client side and providing a real perceived benefit to
>>> those who will deploy the feature, and it should concern more and more
>>> internet users in very little time, because there are people willing to
>>> push that mechanism forwards.
>> 
>> Yes, this is why I've been working with browser vendors, and as you may know, my employer has no small concern in assuring that its considerable array of content is delivered quickly.
> 
> But you'll agree that there are not many content providers as large as your
> employer. If we want browsers to adopt pipelining by default, we should ensure
> that many sites contribute to that effort, not just the 10 biggest ones.

Of course. It would be helpful if you could try to characterise what makes the proposal more suited to large sites than small ones; I can't see anything that precludes small sites or biases against them (assuming that browser and server vendors incorporate appropriate changes where necessary).


>>> From an architectural point of view, I'd say that if we want clients to
>>> make efficient use of pipelining, we should only bother them with the
>>> connections they're manipulating, it should not be end-to-end, because
>>> they don't care what's on the other side of the proxies and they can't
>>> do anything about that.
>> 
>> Pipelining can certainly be hop-by-hop, but head-of-line blocking is most often caused by the origin server. Therefore it's important to give it some control over the use of pipelining. 
> 
> I'm not sure what you mean here. Right now I know of no intercepting proxy
> that is able to forward pipelined requests. They accept pipelined requests,
> but process one at a time. So the first proxy always terminates pipelining.

According to my testing, some proxies do handle pipelining, to various degrees, so "always" seems a bit overstated here.
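
To be concrete about "various degrees", and about why origin control matters (the URLs below are invented): a pipelining client writes its requests back-to-back on one connection and then reads the responses in order, e.g.

GET /report HTTP/1.1
Host: www.example.com

GET /logo.png HTTP/1.1
Host: www.example.com

Some proxies will read both immediately but fetch them from the origin one at a time; a few will forward them upstream as a pipeline. Either way, if /report is slow to generate, the response for /logo.png can't be delivered until /report's has been -- that's the head-of-line blocking I was referring to.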


> I've also encountered a heavily modified version of a well-known proxy
> (sorry, my work is done under NDA, I can't disclose more) which supports
> connection aggregation and optional pipelining when sending multiple
> aggregated requests to the same server. So here again, pipelined client
> requests may be split, then possibly re-aggregated over existing
> connections, and possibly pipelined with other concurrent requests. This
> is an example of when pipelining between the proxy and the server might
> happen regardless of the client's decision to pipeline or not. Where I've
> seen this, the option was not enabled due to the usual issues with pipelining
> on the net.

Sure...


>>> At a minimum, the header should be announced in the Connection header and
>>> be emitted by any intermediary. That could ensure that the intermediary
>>> closest to the client has the final word and that the client reliably
>>> knows what it can do. It would also help a lot with the URL rewriting
>>> issues, because most components involved in rewriting URLs are reverse
>>> proxies. They would delete the header on the server side and rewrite it
>>> on the client side.
>> 
>> This would require that intermediaries be rewritten and redeployed. I think your analysis WRT incentives is flawed; IME the majority of proxy administrators don't care about fine-tuning latency, they care about controlling access and/or reducing bandwidth use. 
> 
> I see your point. Gateways generally don't touch the headers specified in
> Connection, so they should not have to be touched. Client side proxies are
> generally very flexible and making them add a header is more a matter of
> configuration than upgrade.
> 
> However I agree with you that for most client sites, there is little
> incentive to make efforts to enable pipelining.
> 
> So based on this, I think we could summarize some points :
>  - some server sites will have little incentive to add the Assoc-Req
>    header in their servers when those servers have complex URL handling,
>    and they don't always see an immediate benefit ;

I think that's a manageable problem, depending on some changes to the reverse proxy software. I believe that this will come soon after browsers support something, as there will be a strong incentive to do so. See also the request for feedback in the definition of assoc-req in -01, which may mitigate some concerns here.


>  - some client sites will have little incentive to do the job in their
>    proxies (or upgrading them) in order to present this header to their
>    clients for the very same reasons ;

I'd broaden that to say there's little incentive for them to do anything, in many (not all) cases. In the long term, we can improve this by working with the Squid team, the Traffic Server team, commercial products, etc., but we can't assume that currently-deployed products will be upgraded quickly.


>  - some server sites will want to make most clients reliably enable
>    pipelining in order to push the data out as fast as possible ;

Yes.


>  - some client sites will want to make most of their clients reliably enable
>    pipelining for any destination in order to reduce the effect of huge RTTs
>    and large numbers of connections.

Yes -- and this isn't addressed by the draft yet.


> So maybe we could achieve something which is less aggressive than adding a
> Connection header. Basically we could suggest how an intermediary should
> proceed with the header if it wants to offer pipelining to all of its clients
> (remove any Assoc-Req response header it receives from the server, and add one
> by itself).

Assoc-Req is all about making the client more comfortable that the response it's seeing is actually associated with the request it thinks it is. Removing it would be counter-productive; it's not an indicator that pipelining is desirable, etc. 
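
As a hypothetical illustration (URLs invented): if a client pipelines

GET /a HTTP/1.1
Host: www.example.com

GET /b HTTP/1.1
Host: www.example.com

and the first response it reads back says

HTTP/1.1 200 OK
Assoc-Req: GET http://www.example.com/b

then the client knows something in the chain is mis-associating responses (or mangling the header) and can stop pipelining on that connection rather than rendering the wrong content. That's the whole job of the header.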

Having intermediaries add a header indicating that they want pipelining is one way to go about it, but I'm very aware that in the past, people have argued against a "I support pipelining" flag because a) HTTP/1.1 support is already this flag, and b) just as it is now, the first implementation that gets pipelining wrong will make that flag meaningless. 

Hmm. Perhaps a *new* hop-by-hop header that allows the client to associate a request with a response would be useful here; the client could generate the identifier, and since it's hop-by-hop, it wouldn't interfere with caching.

E.g.,

GET http://www.foo.com/ HTTP/1.1
Host: www.foo.com
Hop-ID: a
Connection: Hop-ID

HTTP/1.1 200 OK
Hop-ID: a
Connection: Hop-ID

The problem here is that clients don't have a strong incentive to send this header, unless they're sure that it's going to be useful. They could do so when they're configured to use a proxy, but AIUI the cases you're talking about involve intercepting proxies, so they couldn't be sure. 


> Also I'm thinking that we might want to improve on that by allowing explicit
> proxies to set an "Assoc-Req: *" to indicate to their clients that they can
> pipeline any request that is sent to them, regardless of the destination.

See above.

Cheers,

--
Mark Nottingham   http://www.mnot.net/

Received on Tuesday, 22 March 2011 03:12:41 UTC