Re: Review: http://www.ietf.org/id/draft-mbelshe-httpbis-spdy-00.txt from Willy Tarreau on 2012-02-29 (ietf-http-wg@w3.org from January to March 2012)

From: Willy Tarreau <w@1wt.eu>
Date: Wed, 29 Feb 2012 19:03:29 +0100
To: Patrick McManus <pmcmanus@mozilla.com>
Cc: Amos Jeffries <squid3@treenet.co.nz>, ietf-http-wg@w3.org
Message-ID: <20120229180329.GA3575@1wt.eu>

Hi Patrick,

On Wed, Feb 29, 2012 at 12:15:32PM -0500, Patrick McManus wrote:
> I'm going to start by saying that while computational scalability of all
> parts of the ecosystem (server, intermediary, browsers, embedded clients
> of other sorts, etc..) is important and must be kept within reasonable
> limits, it is not the top priority for me in doing transport design for
> the web.
> 
> The most important thing is enabling a better user experience (and
> opening up new ones) over a network where bandwidth, cpu, memory, etc
> all keep scaling up but latency doesn't operate on the same scale. our
> current strategies butt their head into these things all the time
> whether it is just delay of a handshake or delay in the ability to
> respond/sense congestion.

I agree that focusing on limiting latency effects is quite important.

(...)
> To bring this back to compression - I just took a set of 100 compressed
> real headers, and passed them through a decompress/recompress filter
> 1000 times in 350 milliseconds on one core of a rather unimpressive i5.
> Spdy would do it faster because it tends to window things smaller than
> the default gzip. So that's a cpu overhead of .35ms per set of 100. The
> headers were reduced from 44KB (for the set of 100) to about ~4KB.
> That's probably a reduction from 31 packets to 3. IW=4 means that's a
> difference of 3 rtt's of delay to send 31 packets uncompressed vs 0
> delay to send 3 compressed. 

That's precisely what worries me a lot. You were able to compress "only"
3000 requests per second on an i5, which means only 1500 request+response
per second for a proxy or gateway. I'm processing 100 times this without
compression on the same hardware, so in terms of scalability, we first
have to divide CPU efficiency by 100, which is not really a good starting
point in my opinion. Also, adding .35ms per request (and same for the
response) means that .7 ms of real latency will be added for every proxy
layer a request has to pass through. IPS, load balancers, caches, proxies
etc... All of them will add their own delay. At some of my customers, a
request can be forwarded through up to 12 or 13 layers last time I counted,
which means approx 10 ms of cumulative processing time. I know some sites
running at less than 100 microseconds of average response time; their
response time would be increased by an order of magnitude due to this.
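
A rough sketch of the arithmetic in the paragraph above, in Python. It
assumes, as that paragraph does, that one compression pass costs .35 ms
per request and that a proxy layer pays it twice (request and response).
The function names are only illustrative.

```python
# Assumption: 0.35 ms of CPU per compression pass, charged per request,
# which is the reading used in the paragraph above.
PASS_MS = 0.35


def max_rate_per_core(pass_ms, passes_per_transaction=1):
    """Transactions per second one core sustains at this per-pass cost."""
    return 1000.0 / (pass_ms * passes_per_transaction)


def added_latency_ms(pass_ms, layers, passes_per_layer=2):
    """Cumulative processing delay across a chain of intermediaries.

    Each layer is assumed to handle the request and the response,
    hence two passes per layer by default.
    """
    return pass_ms * passes_per_layer * layers


if __name__ == "__main__":
    # Roughly 2857 requests/s, rounded up to "3000" in the text.
    print(round(max_rate_per_core(PASS_MS)))
    # Roughly half of that once the response is processed too.
    print(round(max_rate_per_core(PASS_MS, passes_per_transaction=2)))
    # 12 or 13 layers give somewhat under 10 ms in total.
    for layers in (1, 12, 13):
        print(layers, round(added_latency_ms(PASS_MS, layers), 2))
```

Under these assumptions, a site answering in 100 microseconds would
see its response time dominated by the per-layer processing cost.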

Also, there is another thing to consider. The only place where compression
helps is between browsers and servers. Most of the HTTP requests are
between server-side infrastructure components since for a single browser
request, a fair number of components are solicited. I think it would be
a major design error to force all gateways, servers, etc... to communicate
with each other using compressed data over low-latency links where CPU
matters the most. Think about web services too.

> rtt varies a lot, but let's call that 300 ms of latency saved at the
> cost of .35ms of cpu. Its a trade off to be sure, but imo the right one
> for the net.

It's not really .35ms of CPU, it's .35 ms of CPU *per browser*. When you
have 10k browsers connected to a site, you're at 3.5s of CPU for one single
click from each of them. That's huge. As Amos said, the cumulative latency
caused by CPU overhead can quickly overwhelm the network latency. If my
gateway has to deal with 50000 requests per second, it needs 17 seconds
of CPU for each second of wall-clock time. Variations in CPU load will
quickly add CPU-induced latency on top of that.
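
The aggregate figures in the paragraph above can be reproduced with a
short Python sketch. It again assumes a flat cost of .35 ms of CPU per
request; the helper names are hypothetical and the core count ignores
everything except compression.

```python
import math

# Assumption: flat 0.35 ms of CPU per request, as in the paragraph above.
COST_MS = 0.35


def cpu_seconds(events, cost_ms=COST_MS):
    """Total CPU time, in seconds, spent on `events` requests."""
    return events * cost_ms / 1000.0


def cores_needed(requests_per_second, cost_ms=COST_MS):
    """Cores kept fully busy by compression alone at this request rate."""
    return math.ceil(cpu_seconds(requests_per_second, cost_ms))


if __name__ == "__main__":
    # One click from each of 10k browsers: about 3.5 s of CPU.
    print(cpu_seconds(10_000))
    # 50000 requests/s: about 17.5 s of CPU per second of wall-clock time.
    print(cpu_seconds(50_000))
    # That load needs this many cores just to keep up.
    print(cores_needed(50_000))
```

Any value above 1.0 CPU-second per wall-clock second means a single
core cannot keep up, so requests queue and latency grows with load.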

> Other schemes are plausible (e.g. per session templates that can be
> referenced per transaction) and I'm very open minded to them - but I
> wanted to be clear that I haven't seen any problems with this one
> accomplishing its objectives. I think its biggest weakness (though
> tolerable) is that it creates a state management issue which causes some
> classes of spec violations to require connection termination instead of
> being localized to the transaction.

Patrick, don't get me wrong, I'm certain that what has been achieved is
really nice, and it proves that reducing the size of a request has a major
impact on user experience. But you're mostly dealing with code running at
a rate of 1 user. Some of us are dealing with code having to support tens
to hundreds of thousands of users on similarly sized hardware. We have reasons
to be quite concerned with these CPU impacts and with the increased DoS
risks.

I'll try to find time to put a few ideas on the table so that we have
something to start from, otherwise we'll all spend our time saying that
we agree but have nothing else :-)

Regards,
Willy
Received on Wednesday, 29 February 2012 18:04:18 UTC