- From: Willy Tarreau <w@1wt.eu>
- Date: Mon, 14 Mar 2011 20:21:34 +0100
- To: Mark Nottingham <mnot@mnot.net>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
Hello Mark,

First, thanks for this update on your proposal. After having spent some time thinking about the issue, and with a bit of experience at a mobile phone operator where transparent proxies are everywhere on the client side, I really think the current proposal will not significantly improve the situation, despite the good initial idea. Please let me explain.

Pipelining issues are connection-specific: a client may decide to pipeline or not to pipeline over a given connection. The client does not know whether there are transparent proxies in the chain (or "interception proxies"; let's call them all "transparent" for the sake of easier explanation). When those transparent proxies are specific to the site the client is visiting, it can make sense to rely on Assoc-Req, because after all it is the site admin's responsibility to ensure that their servers correctly build the header. (In fact there is another issue on this point, which I will come back to later.)

But when the transparent proxies are on the client side, the header basically brings no information. In that case either all sites work, or all of them fail. If the transparent proxy is unable to cope with pipelining, granted, some sites will be blacklisted from time to time, but it will still not be possible to blacklist all the connections passing through this proxy. Conversely, when the transparent proxy does support pipelining, pipelining will still not be used for the vast majority of sites simply because they do not emit the header, even though the decision has nothing to do with them. And for the few sites that do emit the header and report an issue between a site-side transparent proxy and the site itself, the client will refrain from pipelining even though the problem is not on its side of the path.

I see an easy solution to this: transparent proxies on the client side will have to be modified to 1) remove any Assoc-Req header from responses, and 2) forge it themselves so that the client receives a valid-looking response. However, this is contrary to what is specified in the draft. Given how little time it takes to upgrade client-side proxies at places such as ISPs, and given the noticeable benefit for every internet site, I'm quite sure that every operator will do it, whatever is written in the spec.

Now for the server side, the proposal suggests adding the header on the servers themselves in order to validate the whole chain. I see two difficulties with that:
- large sites take more time to be modified, even just to add a header;
- it is more and more common on the server side to "route" requests through various layers of application proxies and servers, where URLs are rewritten, mapped, prefixed, or have their prefixes stripped. It's enough to see the damage done to the Location response header to understand what kind of mangling requests are subjected to on their way in.

In these environments, the cost of adding an Assoc-Req header will be high compared to the perceived benefit (application authors test on the local network anyway). Thus it will be difficult (read: expensive) to reliably add this header on the server side for little perceived benefit. Also, getting it wrong only once will be enough for the site to be blacklisted by many clients anyway, rendering the effort useless. There will also be issues with duplicate headers, URLs containing commas being handled differently by some clients when presented in the Assoc-Req header, and so on.
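To illustrate how fragile that check is in practice, here is a rough sketch of the kind of comparison a client would have to perform, assuming (as in the draft) that Assoc-Req carries the request method followed by the absolute request URI. The function and its arguments are of course invented for the example, not taken from any real client:

    # Rough sketch of the client-side check, assuming Assoc-Req carries
    # the request method and the absolute request URI as in the draft.
    # Header folding is what makes commas ambiguous: two Assoc-Req
    # headers folded into one line look exactly like a single URI
    # containing a comma.

    def assoc_req_matches(request_method, request_uri, response_headers):
        """Return True only when the response can safely be associated
        with the request; be conservative on anything suspicious.
        response_headers is assumed to be a list of (name, value) pairs."""
        values = [v.strip() for k, v in response_headers
                  if k.lower() == "assoc-req"]
        if len(values) != 1:
            # duplicate or missing header: do not trust the chain
            return False
        value = values[0]
        if "," in value:
            # could be two folded values or a URI containing a comma;
            # impossible to tell apart, so give up on pipelining
            return False
        try:
            method, uri = value.split(None, 1)
        except ValueError:
            return False
        # any URL rewriting by an intermediate layer shows up here
        return method == request_method and uri == request_uri

Any URL rewriting layer on the server side breaks the last comparison, and header folding makes the comma case genuinely ambiguous, so in doubt the client has to refrain from pipelining.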
Another point concerns efficiency and deployment speed. I don't know how many sites there are on the net, but getting all of the valid ones to emit the header will take ages. Compare that with the number of sites which support keep-alive and HTTP compression. The main reason is that there is little incentive on the server side to work on this, because the benefits are not directly perceived. On the other hand, there are far fewer network operators than there are web sites. When one operator gets the header right in its proxies, millions of users instantly get pipelining to the whole internet.

A few months ago I would have said that pipelining was mostly useful end to end between the client and the server. Having seen the huge RTTs that a mobile phone can experience, I now know that the most important place is there, between the mobile phone and the operator's proxy. And operators have a real incentive to reduce their customers' page load time; it's a strong competitive argument. So whatever they can do, they will do. Specifically, they are systematically setting up transparent proxies in order to cut their customers' RTT and optimize their TCP behaviour. From what I observe around me, internet access is growing faster via mobile phones than via DSL. I may be wrong, but I'm pretty sure some statistics would back me up. That means we can address most of the pipelining deployment issues by targeting the client side and providing a real, perceived benefit to those who deploy the feature, and it should concern more and more internet users in very little time, because there are people willing to push that mechanism forward.

From an architectural point of view, I'd say that if we want clients to make efficient use of pipelining, we should only bother them with the connections they are actually manipulating; the mechanism should not be end-to-end, because clients don't care what's on the other side of the proxies and can't do anything about it. At a minimum, the header should be listed in the Connection header and be emitted by any intermediary. That would ensure that the intermediary closest to the client has the final word and that the client reliably knows what it can do. It would also help a lot with the URL rewriting issues, because most components involved in rewriting URLs are reverse proxies: they would delete the header on the server side and re-emit it on the client side. It still fits your goal of spotting the worst intermediaries, because the ones not processing the Connection header will simply pass responses along as they receive them, and if they are the last element in the response path before the client and they break pipelining, they will be detected. I really think that encouraging proxies (explicit and transparent) to add the header themselves would speed up adoption and provide the benefit you're looking for. A small sketch of what such an intermediary would do follows below.

Also, one more point I'm thinking about. I noticed a situation where pipelining brought no advantage because some sites spread their objects over a large number of host names. The reason is that the first request on a connection is not pipelined, so if a client has to fetch 100 objects over 50 connections, it only makes 2 non-pipelined requests on each of them (and I'm not making up these numbers, I have seen this). Ideally we should find a solution so that a proxy (explicit or transparent) can indicate to a client that it supports pipelining for whatever site the client wants to access. That way the client will be able to make effective use of its connections and pipeline all requests, starting from the first one.
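Here is the small sketch mentioned above of what a client-side proxy (transparent or explicit) could do on each response, assuming the hop-by-hop variant where Assoc-Req is listed in Connection. The Request/Response objects are assumed minimal header containers, not any real proxy API:

    # Sketch of the hop-by-hop handling in a client-side proxy: drop
    # whatever Assoc-Req came from upstream and emit our own,
    # reflecting the request we actually received from the client.

    def forward_response(client_request, upstream_response):
        resp = upstream_response

        # 1) Remove any Assoc-Req received from the server side; it only
        #    describes the upstream hop, which the client cannot act upon.
        resp.headers.pop("Assoc-Req", None)

        # 2) Forge our own header for the hop we are responsible for,
        #    using the request line we really saw from the client.
        resp.headers["Assoc-Req"] = "%s %s" % (client_request.method,
                                               client_request.absolute_uri)

        # 3) Mark it hop-by-hop so the next intermediary (if any) strips
        #    and re-emits it in turn; the one closest to the client wins.
        connection = resp.headers.get("Connection", "")
        tokens = [t.strip() for t in connection.split(",") if t.strip()]
        if "Assoc-Req" not in tokens:
            tokens.append("Assoc-Req")
        resp.headers["Connection"] = ", ".join(tokens)

        return resp

The point of the last step is that the header then describes only the client's own connection, whatever happens further upstream; an intermediary which ignores the Connection header passes responses unmodified and, if it is the last hop and breaks pipelining, it is exactly the one that gets detected.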
Letting a proxy indicate that support is easy with an explicit proxy configuration: we could say that if a client is configured to use a proxy and receives an Assoc-Req response header from it, then it knows that all connections to that same proxy can be pipelined. (The transparent case is tougher, because I see no way to tell the client without risking passing such an indication from a site-side proxy to a proxy-less client.)
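As a rough sketch of the client-side bookkeeping for the explicit-proxy case (the names are again invented, and the "one valid response is enough" rule is just the heuristic described above):

    # Per-proxy pipelining decision on the client, for the explicit-proxy
    # case only: one response carrying a correct Assoc-Req through the
    # configured proxy is taken as permission to pipeline on every
    # connection to that proxy, including the very first requests.

    class ProxyPipelineState:
        def __init__(self):
            self._pipeline_ok = {}   # proxy address -> bool

        def on_response(self, proxy_addr, request_method, request_uri,
                        response_headers):
            # assoc_req_matches() is the validation sketch shown earlier
            if assoc_req_matches(request_method, request_uri,
                                 response_headers):
                self._pipeline_ok[proxy_addr] = True

        def may_pipeline(self, proxy_addr):
            # conservative default: no pipelining until the proxy has
            # proven itself once
            return self._pipeline_ok.get(proxy_addr, False)

A blacklisting side (dropping the flag after a failure) would obviously be needed too, but the important part is that the decision is keyed on the proxy, not on the destination site.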
Well, that was a long mail, but I'd really like you to take some time to think about this approach. It remains compatible with your design, while avoiding some of its drawbacks and allowing faster adoption.

Regards,
Willy

Received on Monday, 14 March 2011 19:28:48 UTC