Re: Pipeline hinting revisited from Darin Fisher on 2011-08-12 (ietf-http-wg@w3.org from July to September 2011)

From: Darin Fisher <darin@chromium.org>
Date: Thu, 11 Aug 2011 23:19:29 -0700
To: Willy Tarreau <w@1wt.eu>
Cc: Brian Pane <brianp@brianp.net>, ietf-http-wg@w3.org
Message-ID: <CAP0-Qpt1X8ashm0hS9mH8yfhT_CSvs6SerjyW_t52cJXw81tgw@mail.gmail.com>
On Thu, Aug 11, 2011 at 10:53 PM, Willy Tarreau <w@1wt.eu> wrote:

> On Thu, Aug 11, 2011 at 10:43:36PM -0700, Darin Fisher wrote:
> > On Thu, Aug 11, 2011 at 10:31 PM, Willy Tarreau <w@1wt.eu> wrote:
> >
> > > Hi Brian,
> > >
> > > On Thu, Aug 11, 2011 at 03:12:31PM -0700, Brian Pane wrote:
> > > > I've been thinking some more about request pipelining recently,
> > > > triggered by several observations:
> > > >
> > > > - A significant number of real-world websites could be made faster
> > > > via widespread adoption of request pipelining (based on my study of
> > > > ~15,000 sites in the httparchive.org corpus).
> > > > - A nontrivial fraction of mobile browsers are using pipelining
> > > > already, albeit not as aggressively as they could (based on Blaze's
> > > > study: http://www.blaze.io/mobile/http-pipelining-big-in-mobile/ )
> > > > - Client implementations that currently pipeline their requests are
> > > > using heuristics of varying complexity to try to decide when
> > > > pipelining is safe.  The list of conditions documented here is at the
> > > > complex end of the spectrum, and it's perhaps still incomplete:
> > > > https://bugzilla.mozilla.org/show_bug.cgi?id=599164
> > > >
> > > > The key question, I think, is whether heuristics implemented on the
> > > > client side will end up being sufficient to detect safe opportunities
> > > > for pipelining.  If not, a server-driven hinting mechanism of the
> > > > sort proposed in Mark's "making pipelining usable" draft (
> > > > http://tools.ietf.org/html/draft-nottingham-http-pipeline-01 ) seems
> > > > necessary.
> > > >
> > > > Anybody have additional experimental data on pipelining (including
> > > > the effectiveness of heuristics for turning pipelining on or off) that
> > > > they can share?
> > >
> > > We've been conducting some tests for a customer working with mobile
> > > terminals. I was very frustrated to see that pipelining did not bring
> > > any gain there due to the first non-pipelined request to each host.
> > > What happens is that there are many objects on a page, spread over
> > > many hosts. The terminal opens many parallel connections to these
> > > hosts, and as a result, there are 4-5 objects max to fetch over each
> > > connection. All connections have a first object fetched alone, and only
> > > once a response is received, a batch of 4 requests is sent. It is this
> > > pause between the first and the next request over a connection which
> > > voids the gain. It was always faster to open more parallel connections,
> > > despite the extra bandwidth, than to use pipeline, precisely due to
> > > this point.
> > >
> > > This is why I think we need to find a solution so that pipelining could
> > > be more aggressive on riskless requests, and possibly use the server
> > > side's hinting to safely fall back to non-pipelining ASAP if needed.
> > > I'm well aware that the biggest issue seems to be with broken servers
> > > getting stuck between requests. I don't know if there are many of those
> > > or not, but maybe at one point it will become those sites' problem and
> > > not the browsers'.
> > >
> >
> > Often it is a bad intermediary (transparent proxy).  The origin server
> > may be just as helpless as the client :-(
>
> I agree, and my point is that it's the first intermediary between the
> client and the origin server that counts for the client, and most often
> this intermediary will support pipelining and it's too bad not to use
> it with the whole world. Anyway in both cases (good or bad intermediary),
> both the server and the client will have little clue and be of little
> help. Ideally we should have request numbers for the connection, but as
> Mark pointed out a few months ago, those might be cached and make the
> issue worse. That said, are we sure they might be cached even if advertised
> in the Connection header? I don't really think so. If we had
> intermediaries or servers respond with "Connection: pipeline; req=#num",
> or even "Connection: req=#num", it should make it clear where it's
> explicitly supported, without waiting for the whole world to adopt it on
> every server. The advantage with Connection is that intermediaries will
> get rid of it.
>
>
I'm sure it is very true for a lot of mobile clients that the local
intermediary would do the right thing.

Hmm, I'd still be worried about dumb, "transparent" intermediaries that do
not consider themselves as a hop though.  Edge servers / virtual hosts are
sometimes to blame.

We have to consider what constraints are put on intermediaries today that
require them to behave a certain way.  If they are not required to treat the
Connection header as hop-by-hop, then they won't.  In some sense, it doesn't
matter what the spec says.  They will do whatever doesn't punish them.
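
To make the hop-by-hop point concrete, here is a minimal Python sketch of
what a well-behaved intermediary is supposed to do under RFC 2616: drop the
Connection header itself, every header named by one of its tokens, and the
standard hop-by-hop headers.  The function name strip_hop_by_hop and the
Pipeline-Req header are hypothetical, made up for illustration only; they
are not taken from any draft or implementation discussed in this thread.

```python
# Hop-by-hop headers listed in RFC 2616, section 13.5.1 (lowercased).
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}


def strip_hop_by_hop(headers):
    """Return the (name, value) pairs a compliant intermediary would forward."""
    # Each token in a Connection header names something that must also be
    # treated as hop-by-hop on this connection.
    listed = set()
    for name, value in headers:
        if name.lower() == "connection":
            for token in value.split(","):
                token = token.strip().lower()
                if token:
                    listed.add(token)
    drop = HOP_BY_HOP | listed
    return [(n, v) for n, v in headers if n.lower() not in drop]


request_headers = [
    ("Host", "example.com"),
    ("Connection", "Pipeline-Req"),  # hypothetical per-hop pipelining hint
    ("Pipeline-Req", "3"),           # hypothetical header, assumption only
]
forwarded = strip_hop_by_hop(request_headers)
```

An intermediary that skips this step forwards the hint unchanged, so the
next hop sees a signal that was never meant for it.  That is the failure
case I am worried about.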

Apologies, I'm a pipelining pessimist :-/  I think perhaps it can be viable
in a mobile (small screen) world where users already have fairly low
expectations of their browsers.  And, if we get enough mobile clients using
it, then perhaps that'll be enough to get servers and intermediaries to play
nicely.

<aside>

I'll admit that my data is dated, but when I tried to enable pipelining in
Mozilla (around 2003), I ran into an incredibly bizarre range of
failures.  Sure, old versions of popular servers were busted.  Apache 1.3's
CGI module did not support pipelining.  Similarly, old versions of IIS were
busted.

Apache 2.0 and the latest IIS seemed to work great, except sometimes.
Sometimes there was a mysterious failure when communicating with one of the
"good" servers.  It seemed that some intermediary must be to blame.  The
failure modes were fascinating too.  Sometimes the response would be
nothing, sometimes it would be garbage bytes, and other times the server
would simply never reply (timeout).

Around 2005, I remember people complaining that Google Maps was broken in
Firefox.  Turns out these users had enabled pipelining.  Map tiles were just
mysteriously failing to load.  Apparently, it is really easy to code a
server that doesn't handle pipelining properly, which kinda makes sense.
Google fixed it pretty quickly, but I think that is the exception.

I guess my point is that this seems like a really tough uphill battle.  It
is a chicken-n-egg problem :-/

</aside>

-Darin
Received on Friday, 12 August 2011 06:20:08 UTC