Re: Pipeline hinting revisited

On 12/08/11 17:53, Willy Tarreau wrote:
> On Thu, Aug 11, 2011 at 10:43:36PM -0700, Darin Fisher wrote:
>>> On Thu, Aug 11, 2011 at 10:31 PM, Willy Tarreau <w@1wt.eu> wrote:
>>
>>> Hi Brian,
>>>
>>> On Thu, Aug 11, 2011 at 03:12:31PM -0700, Brian Pane wrote:
>>>> I've been thinking some more about request pipelining recently,
>>>> triggered by several observations:
>>>>
>>>> - A significant number of real-world websites could be made faster via
>>>> widespread adoption of request pipelining (based on my study of
>>>> ~15,000 sites in the httparchive.org corpus).
>>>> - A nontrivial fraction of mobile browsers are using pipelining
>>>> already, albeit not as aggressively as they could (based on Blaze's
>>>> study: http://www.blaze.io/mobile/http-pipelining-big-in-mobile/ )
>>>> - Client implementations that currently pipeline their requests are
>>>> using heuristics of varying complexity to try to decide when
>>>> pipelining is safe.  The list of conditions documented here is at the
>>>> complex end of the spectrum, and it's perhaps still incomplete:
>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=599164
>>>>
>>>> The key question, I think, is whether heuristics implemented on the
>>>> client side will end up being sufficient to detect safe opportunities
>>>> for pipelining.  If not, a server-driven hinting mechanism of the sort
>>>> proposed in Mark's "making pipelining usable" draft (
>>>> http://tools.ietf.org/html/draft-nottingham-http-pipeline-01 ) seems
>>>> necessary.
>>>>
>>>> Anybody have additional experimental data on pipelining (including the
>>>> effectiveness of heuristics for turning pipelining on or off) that
>>>> they can share?
>>>
>>> We've been conducting some tests for a customer working with mobile
>>> terminals. I was very frustrated to see that pipelining did not bring
>>> any gain there due to the first non-pipelined request to each host.
>>> What happens is that there are many objects on a page, spread over
>>> many hosts. The terminal opens many parallel connections to these
>>> hosts, and as a result, there are 4-5 objects max to fetch over each
>>> connection. Every connection has its first object fetched alone, and only
>>> once that response is received is a batch of 4 requests sent. It is this
>>> pause between the first and the next request on a connection that
>>> voids the gain. It was always faster to open more parallel connections,
>>> despite the extra bandwidth, than to use pipelining, precisely because of
>>> this.
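
(To make the timing concrete for anyone skimming: a rough Python sketch of
the pattern described above, with a made-up host and object list. The first
request goes out alone, and only once its response is back does the client
pipeline the remaining batch over the same connection.)

    import socket

    HOST = "www.example.com"                       # hypothetical origin
    PATHS = ["/", "/a.css", "/b.js", "/c.png", "/d.png"]

    def get(path):
        return ("GET %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (path, HOST)).encode()

    sock = socket.create_connection((HOST, 80))

    # The first request is sent alone; the client waits for its response
    # before risking anything else on this connection.
    sock.sendall(get(PATHS[0]))
    first_response = sock.recv(65536)   # simplified; real code parses framing

    # Only now is the remaining batch pipelined back-to-back.  That pause
    # is exactly what eats the benefit on short connections.
    sock.sendall(b"".join(get(p) for p in PATHS[1:]))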
>>>
>>> This is why I think we need to find a solution so that pipelining could
>>> be more aggressive on riskless requests, and possibly use the server
>>> side's hinting to safely fall back to non-pipelining ASAP if needed.
>>> I'm well aware that the biggest issue seems to be with broken servers
>>> getting stuck between requests. I don't know if there are many of those
>>> or not, but maybe at some point it will become those sites' problem and
>>> not the browsers'.
>>>
>>
>> Often it is a bad intermediary (transparent proxy).  The origin server may
>> be just as helpless as the client :-(
>
> I agree, and my point is that it's the first intermediary between the
> client and the origin server that counts for the client, and most often
> this intermediary will support pipelining, and it's a shame not to use
> it with the whole world. Anyway, in both cases (good or bad intermediary),
> both the server and the client will have little clue and be of little
> help. Ideally we should have request numbers for the connection, but as
> Mark pointed out a few months ago, those might be cached and make the
> issue worse. That said, are we sure they could be cached even if advertised
> in the Connection header? I don't really think so. If we had intermediaries
> or servers respond with "Connection: pipeline; req=#num", or even
> "Connection: req=#num", it would make it clear where it's explicitly
> supported, without waiting for the whole world to adopt it on every
> server. The advantage with Connection is that intermediaries will get
> rid of it.
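
(As a concrete sketch of how a client might consume such an echo: nothing
here is standardised, the header syntax is only the proposal above, and the
function name is mine. If the response to the Nth request on a connection
carries a matching "req=" token in Connection, the next hop has explicitly
declared pipelining support; otherwise the client stays on its heuristics.)

    def hop_supports_pipelining(response_headers, expected_req_num):
        """response_headers: dict of lower-cased header name -> value."""
        connection = response_headers.get("connection", "")
        for token in connection.replace(";", ",").split(","):
            token = token.strip()
            if token.startswith("req="):
                try:
                    return int(token[len("req="):]) == expected_req_num
                except ValueError:
                    return False
        return False    # no explicit signal: keep the conservative default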

I was just thinking that the 100-continue infrastructure, built up with 
some success over the last few years, could be leveraged here: an "Expect: 
pipeline" token and a 1xx status to indicate explicit support.

That appears to nicely solve the case where pipelining is off by default 
and gets enabled whenever possible.
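
Roughly what I have in mind, as a sketch only (neither the "Expect: 
pipeline" token nor the 1xx code exists yet, and the helper names are mine):

    def first_request(path, host):
        # The first request on a fresh connection carries the proposed
        # expectation; nothing else is pipelined behind it yet.
        return ("GET %s HTTP/1.1\r\n"
                "Host: %s\r\n"
                "Expect: pipeline\r\n"        # proposed, not registered
                "\r\n" % (path, host)).encode()

    def acknowledges_pipelining(status_line):
        # status_line is the raw bytes, e.g. b"HTTP/1.1 100 Continue".
        # Any interim (1xx) response other than 100 Continue is taken as
        # the hypothetical acknowledgement that this connection may be
        # pipelined freely.
        parts = status_line.split()
        if len(parts) < 2:
            return False
        code = parts[1]
        return code.startswith(b"1") and code != b"100"

Only once that interim response is seen would the client start pipelining 
the rest of the batch; a final response arriving with no such interim 
status means it keeps behaving as it does today.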

It will not solve the problem of agents pipelining aggressively before 
seeing such a 1xx status. The 430 status proposed by Mark covers that case 
and can be emitted under the same requirements as 417 (Expectation Failed).
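
A client-side handler for that could be as small as this (again only a 
sketch; 430 is only proposed in Mark's draft, and the treatment mirrors 
417: the refusal applies to the pipelining, not to the requests themselves):

    def requests_to_replay(status_code, connection_state, unanswered):
        # On 430 the connection is marked as not pipeline-safe and every
        # still-unanswered request is handed back to be re-sent, one at a
        # time, on fresh connections.
        if status_code != 430:
            return []
        connection_state["pipeline_allowed"] = False
        return list(unanswered)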

IMO we should add "Expect: pipeline", the 1xx status, and 430. 
http://tools.ietf.org/html/draft-nottingham-http-pipeline-01 looks like 
the right vehicle to extend and get this implemented.


AYJ
