About draft-nottingham-http-pipeline-01.txt

Hello Mark,

First, thanks for this update to your proposal. After having spent
some time thinking about the issue, and after some experience at a
mobile phone operator where transparent proxies are everywhere on the
client side, I really think that the current proposal will not
significantly improve the situation, despite the good initial idea.

Please let me explain.

Pipelining issues are connection-specific: a client may decide to
pipeline, or not to pipeline, over a given connection. The client
does not know whether there are transparent proxies in the chain
(or "interception proxies"; let's call them all "transparent" for
the sake of easier explanation).

When those transparent proxies are specific to the site the client
is visiting, it can make sense to rely on Assoc-Req, because after
all, it's the site admin's responsibility to ensure that their
servers will correctly build the header. In fact there is another
issue on this point, which I'll come back to later.
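
To make the rest of this mail easier to follow, here is a minimal
sketch of the check I understand the draft expects from clients. The
header syntax ("<method> <absolute-URI>") is paraphrased from memory
and the names are mine, so take it as an illustration only:

    # Minimal sketch, assuming Assoc-Req echoes "<method> <absolute-URI>"
    # of the request the response belongs to. Names are hypothetical.
    def assoc_req_matches(method, absolute_uri, response_headers):
        value = response_headers.get("Assoc-Req")
        if value is None:
            return False                 # no header: nothing to validate
        parts = value.split(None, 1)
        if len(parts) != 2:
            return False                 # malformed header: treat as mismatch
        return parts[0] == method and parts[1].strip() == absolute_uri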

But when the transparent proxies are on the client side, the header
basically brings no information. In practice, either all sites work,
or all of them fail. If the transparent proxy is unable to cope with
pipelining, granted, some sites will be blacklisted from time to time,
but it will still not be possible to blacklist all connections passing
through this proxy.

Conversely, when the transparent proxy supports pipelining, pipelining
will not be used for the vast majority of sites, which do not emit the
header, even though the proxy's capabilities are unrelated to them. And
for the few sites whose header reports an issue between the transparent
proxy and the site itself, the client will refrain from pipelining even
though the problem is not on its side of the proxy.

I see an easy solution to this: transparent proxies on the client
side will have to be modified to 1) remove any Assoc-Req header from
responses, and 2) forge it themselves to send a valid-looking response
to the client. However, this is contrary to what is specified in the
draft. Given how little time it takes to upgrade client-side proxies
at places such as ISPs, and given the noticeable benefits for every
internet site, I'm quite sure that every operator will do it regardless
of what is written in the spec.
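
Concretely, here is roughly what I expect such a client-side proxy to
do for every response it forwards (a sketch only, with names of my own
choosing):

    # Hypothetical sketch: drop whatever Assoc-Req came from upstream and
    # forge one matching the request this proxy actually answered, so the
    # client ends up validating only the hop it is connected to.
    def prepare_response_for_client(req_method, req_uri, resp_headers):
        resp_headers.pop("Assoc-Req", None)                          # 1) remove upstream header
        resp_headers["Assoc-Req"] = "%s %s" % (req_method, req_uri)  # 2) forge our own
        return resp_headers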

Now for the server side, we're suggesting adding the header on the
servers themselves in order to validate the whole chain. I see two
difficulties with that:
  - large sites take more time to modify, even just to add a
    header;

  - it's more and more common on the server side to "route" requests
    via various layers of application proxies and servers, where URLs
    are rewritten, mapped, prefixed, or have their prefixes stripped.
    It's enough to see the damage done to the Location response header
    to understand what kind of mangling requests undergo on their way
    to the application. In these environments, the cost of adding an
    Assoc-Req header will be high compared to the perceived benefits
    (application authors test them on the local network anyway), as
    sketched below.
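
Here is the kind of mismatch I have in mind, with made-up host names
and prefix:

    # A reverse proxy strips "/app" before forwarding, so the backend can
    # only build Assoc-Req from the rewritten URL, which no longer matches
    # what the client sent; the client then suspects a pipelining problem.
    client_sent = ("GET", "http://www.example.com/app/img/logo.png")
    backend_saw = ("GET", "http://backend.internal/img/logo.png")
    assoc_req = "Assoc-Req: %s %s" % backend_saw
    print(assoc_req == "Assoc-Req: %s %s" % client_sent)  # False -> blacklisted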

Thus it will be difficult (read: expensive) to reliably add this header
on the server side for little perceived benefit. Also, it will be
enough to get it wrong only once for the site to be blacklisted by
many clients anyway, rendering the effort useless. There will also
be issues with duplicate headers, URLs containing commas being handled
differently by some clients when presented in the Assoc-Req header,
etc.

Another point concerns efficiency and deployment speed. I don't know
how many sites there are on the net, but getting all of the valid ones
to emit the header will take ages. We can compare this to the number
of sites which support keep-alive and HTTP compression. The main
reason is that there is little incentive on the server side to work
on this, because the benefits are not directly perceived.

On the other hand, there are far fewer network operators than there
are web sites. When one operator gets the header right in its proxies,
millions of users instantly get pipelining on the whole internet.

A few months ago I would have said that pipelining was mostly useful
between the client side and the server side. Having seen the huge RTTs
that can be experienced on a mobile phone, I now know that the most
important location is there, between the mobile phone and the operator's
network. And operators have a real incentive to reduce their customers'
page load time; it's a strong competitive argument. So whatever they can
do, they will do. Specifically, they're systematically setting up
transparent proxies in order to cut their customers' RTT and optimize
their TCP behaviour.

From what I observe around me, internet access is growing faster via
mobile phones than via DSL. I may be wrong, but I'm pretty sure that
some statistics would back me up.

That means that we can address most of the pipelining deployment issues
by targeting the client side and providing a real perceived benefit to
those who deploy the feature, and it should concern more and more
internet users in very little time, because there are people willing to
push that mechanism forward.

From an architectural point of view, I'd say that if we want clients to
make efficient use of pipelining, we should only bother them with the
connections they're manipulating; it should not be end-to-end, because
they don't care what's on the other side of the proxies and they can't
do anything about it.

At a minimum, the header should be announced in the Connection header
and be emitted by any intermediary. That would ensure that the
intermediary closest to the client has the final word and that the
client reliably knows what it can do. It would also help a lot with the
URL rewriting issues, because most components involved in rewriting URLs
are reverse proxies: they would delete the header on the server side and
rewrite it on the client side.
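
Here is a sketch of what an intermediary would do in this hop-by-hop
variant (this is my proposal, not what the current draft says, and the
names are mine):

    # Remove hop-by-hop fields named in Connection (as HTTP/1.1 already
    # requires), then re-emit Assoc-Req for the connection this
    # intermediary manages towards the client, announced in Connection.
    def forward_response_downstream(req_method, req_uri, resp_headers):
        for name in resp_headers.get("Connection", "").split(","):
            resp_headers.pop(name.strip(), None)
        resp_headers["Assoc-Req"] = "%s %s" % (req_method, req_uri)
        resp_headers["Connection"] = "Assoc-Req"
        return resp_headers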

It still complies with your goal of spotting the worst intermediaries,
because the ones not processing the Connection header will simply pass
the responses along as they get them, and if they're the last element in
the response path before the client and they fail at pipelining, they
will be detected.

I really think that encouraging proxies (explicit and transparent) to
add the header themselves would speed up adoption and bring the benefit
you're looking for.

Also, one more point I'm thinking about. I noticed a situation where
pipelining did not bring any advantage, because some sites spread their
objects over a large number of host names. The reason is that the first
request to a server is not pipelined, so if a client has to fetch 100
objects over 50 connections, it ends up sending only 2 non-pipelined
requests over each (and I'm not making up these numbers, I've seen
this).

Ideally we should find a solution so that a proxy (explicit or
transparent) can indicate to a client that it supports pipelining for
whatever site the client wants to access. That way the client will be
able to make effective use of its connections and pipeline all requests,
starting with the first ones. I think that doing so with an explicit
proxy configuration is easy: we could say that if a client is configured
to use a proxy and it gets an Assoc-Req response header, then it knows
that all connections to the same proxy can be pipelined. (For the
transparent case it's tougher, because I see no way to tell the client
without risking passing such information from a site's proxy to a
proxy-less client.)
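
For the explicit-proxy case, the client-side logic could be as simple
as the following sketch (the data structures are of my own making):

    # Once any response received through the configured proxy carries
    # Assoc-Req, remember that this proxy handles pipelining, and pipeline
    # from the very first request on every new connection to it.
    pipelining_proxies = set()

    def note_response(proxy_addr, resp_headers):
        if "Assoc-Req" in resp_headers:
            pipelining_proxies.add(proxy_addr)

    def may_pipeline_first_request(proxy_addr):
        return proxy_addr in pipelining_proxies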

Well, that was a long mail, but I'd really like you to take some time
to think about this approach. It remains compatible with your design,
without some of its drawbacks, and with faster adoption.

Regards,
Willy
