Re: About draft-nottingham-http-pipeline-01.txt from Willy Tarreau on 2011-03-15 (ietf-http-wg@w3.org from January to March 2011)

From: Willy Tarreau <w@1wt.eu>
Date: Tue, 15 Mar 2011 11:43:12 +0100
To: Mark Nottingham <mnot@mnot.net>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20110315104312.GA28995@1wt.eu>

Hi Mark,

On Mon, Mar 14, 2011 at 04:48:52PM -0700, Mark Nottingham wrote:
> > When those transparent proxies are specific to the site the client
> > is visiting, it can make sense to rely on Assoc-Req, because after
> > all, it's the site's admin responsibility to ensure that their
> > servers will correctly build the header. In fact there's another
> > issue on this point, let's see later.
> 
> Sorry, a transparent/intercepting proxy that's specific to a site? Are you talking about gateways ("reverse proxies") or are you saying that some networks are selectively deploying proxies that are only used for accessing certain sites?

I'm mostly thinking about load balancers, that are invisible to the client,
but HTTP optimizers (compressors, content aggregators) are also included.

> > But when the transparent proxies are on the client side, the header
> > basically brings no information.
> 
> The draft (clearly, I hope!) conveys a strategy for dealing with interception proxies; do you have any feedback on that? Assoc-Req is not intended to address all of the problems associated with them.

It's not clear to me when reading it. Maybe I missed an important point,
but the only relevant item I've found was about the fact that adding
the header is forbidden for proxies.

> > I see an easy solution to this : transparent proxies on the client
> > side will have to be modified to 1) remove any Assoc-Req header from
> > responses, and 2) forge it themselves to send a valid-looking response
> > to the client. However, this is contrary to what is specified in the
> > draft. Given the small time it takes to upgrade client-side proxies
> > at places such as ISPs, and given the noticeable benefits for every
> > internet site, I'm quite sure that every operator will do it whatever
> > is written in the spec.
> 
> I see your point about mobile networks -- they may want to optimise the connection between the browser and proxy, since they're more latency-sensitive than most other intermediary administrators. However, it seems to me that this is a specific client->proxy optimisation, not a general one that replaces the other mechanisms in the draft. 

I agree that it's specific to client->proxy, and because it deals with
latency, it's one place where pipelining is expected to show the most
important improvements if it can be generalized.

> Let me have a think about it and perhaps we can come up with something that helps pipelining to next-hop proxies without disturbing other use cases.

That would be the general idea. I don't intend to shake up all the thinking
that has led to this draft, but rather to fuel the discussion with new
elements. As you may remember, we've discussed it in the past, and letting
it cool for a few months while taking into account what is encountered in
the field helps to bring out some important new points.

> > Now for the server side, we're suggesting adding the header on the
> > servers themselves in order to validate the whole chain. I see two
> > difficulties with that :
> >  - large sites take more time to be modified, even in order to
> >    add just a header ;
> 
> Do you have any data to back this up? In my experience, this is not trivial, but it is workable, especially when you dangle a substantial performance improvement as a carrot.

All the "large" sites I know are behind several layers of reverse-proxies
which are shared between multiple applications. I would even say application
components, because nowadays, no application administrator can define where
"the" application is, rather what server components it's made of. Most of
these components forward the requests they're not responsible for to a next
hop, a bit like portals. In these environments, dealing with URLs is generally
difficult because they're rewritten at many places, sometimes stripping the
first directory part at each level. For this reason we see hard-coded links
in pages and hard-coded Location headers for redirects, because no one is
able to build a correct one. I once heard an architect tell me that the
application was "relocalisable" in that it never knows its URLs, it's at the
end of a long chain and processes what it receives...

In practice, they often rely on the hosting infrastructure to serve error
pages because it's easier for them. In such environments, the amount of
effort needed to get Assoc-Req right on every response is considerable,
and it must be spent for every hosted application. Conversely, doing it
on the first level of reverse proxy makes it available to all
applications at a much lower cost.
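
To illustrate the rewriting problem described above, here is a minimal
sketch in Python. It assumes each reverse-proxy level strips the first
directory component before forwarding, and that Assoc-Req carries the
method followed by the absolute request URI (my reading of the draft).
The host name, paths and helper names are made up for the example.

```python
def strip_first_segment(path: str) -> str:
    """Drop the first directory component, as a rewriting proxy level might."""
    parts = path.lstrip("/").split("/", 1)
    return "/" + (parts[1] if len(parts) > 1 else "")


def path_seen_by_backend(client_path: str, levels: int) -> str:
    """Apply the stripping once per reverse-proxy level in the chain."""
    path = client_path
    for _ in range(levels):
        path = strip_first_segment(path)
    return path


def assoc_req(method: str, host: str, path: str) -> str:
    """Assumed header value format: method, space, absolute URI."""
    return f"{method} http://{host}{path}"


# Hypothetical request crossing two levels of rewriting proxies.
client_path = "/portal/shop/cart/view"
backend_path = path_seen_by_backend(client_path, levels=2)

# What the client expects versus what the last server is able to build.
expected = assoc_req("GET", "www.example.com", client_path)
built_by_backend = assoc_req("GET", "www.example.com", backend_path)
print(expected)
print(built_by_backend)
print("match:", expected == built_by_backend)
```

Only the first proxy level still sees the URL as the client sent it, which
is why it is the cheapest place to build the header.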

> > Another point I'm seeing is on the efficiency and deployment speed.
> > I don't know how many sites there are on the net, but getting all of
> > the valid ones emit the header will take ages. We can relate that to
> > the number of sites which support keep-alive and HTTP compression.
> 
> Yet, strangely, many sites do deploy keep-alive and compression, and enjoy the benefits.

My observations on prod traffic at a few places tend to indicate that
many sites still using Apache 1.3 as a reverse proxy have to disable
keep-alive due to the pre-forked model. Also, while many sites do indeed
deploy compression, they still represent a very low percentage of what
can be found in large proxies' logs. I'm not dismissing the merits of
these two mechanisms, I just want to give an example of some improvements
that are not always deployed by some sites because they don't find an
immediate advantage for them while some clients would benefit from them
(eg: mobile users).

> > The main reason is that there is little incentive on the server side
> > to work on this, because the benefits are not directly perceived.
> 
> ?!?! I know of many server admins who salivate at the potential performance benefits that this brings. It's a huge incentive. 

On large sites it can lead to a dramatic reduction of the number of concurrent
connections, which is a good thing. But on small sites, this advantage is not
necessarily perceived.

> > That means that we can address most of the pipelining deployment issues
> > by targetting the client side and providing a real perceived benefit to
> > those who will deploy the feature, and it should concern more and more
> > internet users in very little time, because there are people willing to
> > push that mechanism forwards.
> 
> Yes, this is why I've been working with browser vendors, and as you may know, my employer has no small concern in assuring that its considerable array of content is delivered quickly.

But you agree that content providers as large as your employer are not that
many. If we want browsers to adopt pipelining by default, we should ensure
that many sites contribute to that effort, not just the 10 biggest ones.

> > On the architecture point of view, I'd say that if we want clients to
> > make efficient use of pipelining, we should only bother them with the
> > connections they're manipulating, it should not be end-to-end, because
> > they don't care what's on the other side of the proxies and they can't
> > do anything about that.
> 
> Pipelining can certainly be hop-by-hop, but head-of-line blocking is most often caused by the origin server. Therefore it's important to give it some control over the use of pipelining. 

I'm not sure what you mean here. Right now I know of no intercepting proxy
which is able to forward pipelined requests. They accept pipelined requests,
but process one at a time. So the first proxy always terminates pipelining.

I've also encountered a heavily modified version of a well-known proxy
(sorry, my work is done under NDA, I can't disclose more) which supports
connection aggregation and optional pipelining when sending multiple
aggregated requests to the same server. So here again, pipelined client
requests may be split then possibly re-aggregated over existing
connections, and possibly pipelined with other concurrent requests. This
is an example of when pipelining between the proxy and the server might
happen regardless of the client's decision to pipeline or not. Where I've
seen this, the option was not enabled due to the usual issues with pipelining
on the net.

> > At minima, the header should be announced in the Connection header and
> > be emitted by any intermediary. That could ensure that the intermediary
> > closest to the client has the final word and that the client reliably
> > knows what it can do. It would also help a lot with the URL rewriting
> > issues, because most components involved in rewriting URLs are reverse
> > proxies. They would delete the header on the server side and rewrite it
> > on the client side.
> 
> This would require that intermediaries be rewritten and redeployed. I think your analysis WRT incentives is flawed; IME the majority of proxy administrators don't care about fine-tuning latency, they care about controlling access and/or reducing bandwidth use. 

I see your point. Gateways generally don't touch the headers specified in
Connection, so they should not have to be touched. Client side proxies are
generally very flexible and making them add a header is more a matter of
configuration than upgrade.

However I agree with you that for most client sites, there is little
incentive to make efforts to enable pipelining.

So based on this, I think we could summarize some points :
  - some server sites will have little incentive to add the Assoc-Req
    header on their servers when those servers have complex URL handling,
    and they don't always see an immediate benefit ;

  - some client sites will have little incentive to do the job in their
    proxies (or upgrading them) in order to present this header to their
    clients for the very same reasons ;

  - some server sites will want to make most clients reliably enable
    pipelining in order to push the data as fast as possible outside ;

  - some client sites will want to make most of their clients reliably enable
    pipelining for any destination in order to reduce the effect of huge RTTs
    and large numbers of connections.

So maybe we could achieve something which is less aggressive than adding a
Connection header. Basically we could suggest how an intermediary should
proceed with the header if it wants to offer pipelining to all of its clients
(remove any Assoc-Req response header it receives from the server, and add one
by itself).

That would work on the server side with reverse proxies, where it is just a
matter of configuration. It would also work on the client side with proxies.
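
As a rough sketch of that intermediary behaviour (not a definitive
implementation), the Python below drops any Assoc-Req header received
from the server side and adds one built from the request as the
intermediary saw it. It assumes headers are held as a list of
(name, value) pairs and that the header value is the method followed by
the absolute URI; the function name and the example URIs are
hypothetical.

```python
def rewrite_assoc_req(response_headers, method, request_uri):
    """Return response headers with a fresh Assoc-Req for the client side.

    response_headers: list of (name, value) tuples from the server side.
    method, request_uri: the request as received by this intermediary.
    """
    # 1) remove any Assoc-Req header coming from the server side
    kept = [(name, value) for (name, value) in response_headers
            if name.lower() != "assoc-req"]
    # 2) add one built from the request this intermediary received
    kept.append(("Assoc-Req", f"{method} {request_uri}"))
    return kept


# Hypothetical response whose header was built from a rewritten URL.
from_server = [
    ("Content-Type", "text/html"),
    ("assoc-req", "GET http://backend.internal/cart/view"),
]
to_client = rewrite_assoc_req(
    from_server, "GET", "http://www.example.com/portal/shop/cart/view")
for name, value in to_client:
    print(f"{name}: {value}")
```

Header names are compared case-insensitively, so a lowercase header from
a backend is removed as well.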

Also I'm thinking that we might want to improve on that by allowing explicit
proxies to set an "Assoc-Req: *" to indicate to their clients that they can
pipeline any request that is sent to them, regardless of the destination. Of
course, for this one the Connection header will have to be set, because we
only want that header to be delivered to the client when it's the next hop!
But allowing explicit proxies to do that would be of tremendous help and
would allow many clients that use their ISP's proxy to benefit from pipelining
for the entire internet. I'm even realizing that we don't need the "Assoc-Req:*"
in this case, sending "Connection: Assoc-Req" is enough to indicate to the
browser that it's supported and to get rid of any possible response at the
same time.
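
A minimal sketch of how a client could act on this proposal, in Python.
It assumes that "Connection: Assoc-Req" from the next hop means "pipeline
anything through me", and that without it the client falls back to
comparing Assoc-Req with the request it sent (my reading of the draft,
with the value assumed to be the method followed by the absolute URI).
The function names are hypothetical and none of this is specified
behaviour.

```python
def connection_tokens(headers):
    """Collect the lowercased tokens listed in all Connection headers."""
    tokens = set()
    for name, value in headers:
        if name.lower() == "connection":
            tokens.update(t.strip().lower()
                          for t in value.split(",") if t.strip())
    return tokens


def may_pipeline(headers, method, request_uri):
    """Decide whether this response allows pipelining on the connection."""
    # Hop-by-hop signal from the next hop: valid for any destination.
    if "assoc-req" in connection_tokens(headers):
        return True
    # Otherwise the header must match the request the client sent.
    for name, value in headers:
        if name.lower() == "assoc-req":
            return value.strip() == f"{method} {request_uri}"
    return False


uri = "http://www.example.com/index.html"
print(may_pipeline([("Connection", "keep-alive, Assoc-Req")], "GET", uri))
print(may_pipeline([("Assoc-Req", "GET " + uri)], "GET", uri))
print(may_pipeline([("Assoc-Req", "GET http://other.example/")], "GET", uri))
print(may_pipeline([("Content-Type", "text/html")], "GET", uri))
```

Because the header name is listed in Connection, any compliant hop in
between removes it, so the client only ever sees this signal from its
immediate next hop.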

Best regards,
Willy
Received on Tuesday, 15 March 2011 10:43:46 UTC