- From: Willy Tarreau <w@1wt.eu>
- Date: Wed, 29 Feb 2012 19:03:29 +0100
- To: Patrick McManus <pmcmanus@mozilla.com>
- Cc: Amos Jeffries <squid3@treenet.co.nz>, ietf-http-wg@w3.org
Hi Patrick,

On Wed, Feb 29, 2012 at 12:15:32PM -0500, Patrick McManus wrote:
> I'm going to start by saying that while computational scalability of all
> parts of the ecosystem (server, intermediary, browsers, embedded clients
> of other sorts, etc..) is important and must be kept within reasonable
> limits, it is not the top priority for me in doing transport design for
> the web.
>
> The most important thing is enabling a better user experience (and
> opening up new ones) over a network where bandwidth, cpu, memory, etc
> all keep scaling up but latency doesn't operate on the same scale. our
> current strategies butt their head into these things all the time
> whether it is just delay of a handshake or delay in the ability to
> respond/sense congestion.

I agree that focusing on limiting latency effects is quite important.

(...)

> To bring this back to compression - I just took a set of 100 compressed
> real headers, and passed them through a decompress/recompress filter
> 1000 times in 350 milliseconds on one core of a rather unimpressive i5.
> Spdy would do it faster because it tends to window things smaller than
> the default gzip. So that's a cpu overhead of .35ms per set of 100. The
> headers were reduced from 44KB (for the set of 100) to about ~4KB.
> That's probably a reduction from 31 packets to 3. IW=4 means that's a
> difference of 3 rtt's of delay to send 31 packets uncompressed vs 0
> delay to send 3 compressed.

That's precisely what worries me a lot. You were able to compress "only"
3000 requests per second on an i5, which means only 1500 request+response
per second for a proxy or gateway. I'm processing 100 times this without
compression on the same hardware, so for the sake of scalability, we
first have to divide CPU efficiency by 100, which is not a really good
starting point in my opinion.

Also, adding .35ms per request (and the same for the response) means that
.7 ms of real latency will be added for every proxy layer a request has
to pass through.
IPS, load balancers, caches, proxies etc... All of them will add
their own delay. At some of my customers, a request can be forwarded
through up to 12 or 13 layers last time I counted, which means approx
10 ms of cumulative processing time. I know some sites running at less
than 100 microseconds of average response time; their response time would
be increased by an order of magnitude due to this.

Also, there is another thing to consider. The only place where
compression helps is between browsers and servers. Most of the HTTP
requests are between server-side infrastructure components since for a
single browser request, a fair number of components are solicited. I
think it would be a major design error to force all gateways, servers,
etc... to communicate together with compressed data over low-latency
links where CPU matters the most. Think about web services too.

> rtt varies a lot, but let's call that 300 ms of latency saved at the
> cost of .35ms of cpu. Its a trade off to be sure, but imo the right one
> for the net.

It's not really .35ms of CPU, it's .35 ms of CPU *per browser*. When you
have 10k browsers connected to a site, you're at 3.5s of CPU for one
single click from each of them. That's huge. As Amos said, the cumulative
latency caused by CPU overhead can quickly overwhelm the network latency.
If my gateway has to deal with 50000 requests per second, it needs 17
seconds of CPU for each second of wall-clock time. Variations in CPU load
will quickly add up to CPU-based latency.

> Other schemes are plausible (e.g. per session templates that can be
> referenced per transaction) and I'm very open minded to them - but I
> wanted to be clear that I haven't seen any problems with this one
> accomplishing its objectives. I think its biggest weakness (though
> tolerable) is that it creates a state management issue which causes some
> classes of spec violations to require connection termination instead of
> being localized to the transaction.

Patrick, don't get me wrong, I'm certain that what has been achieved is
really nice, and it proves that reducing the size of a request has a
major impact on user experience. But you're mostly dealing with code
running at a rate of 1 user. Some of us are dealing with code having to
support 10s to 100s of thousands of users on similarly sized hardware.
We have reasons to be quite concerned with these CPU impacts and with
the increased DoS risks.

I'll try to find time to put a few ideas on the table to start from
something, otherwise we'll all spend our time saying that we agree but
have nothing else :-)

Regards,
Willy
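
For reference, the back-of-the-envelope figures traded in this message can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: a 1460-byte TCP payload per packet and 1 KB = 1024 bytes (neither is given in the thread), a congestion window that doubles every round trip with no loss, and .35 ms taken as the cost of one compressed message, which is how the reply above reads the benchmark. All function and constant names are hypothetical.

```python
import math

MSS_BYTES = 1460      # assumed TCP payload per packet; not stated in the thread
COST_MS = 0.35        # CPU cost per compressed message, as read in the reply


def packets_needed(size_bytes, mss=MSS_BYTES):
    """Number of full-or-partial packets needed to carry size_bytes."""
    return math.ceil(size_bytes / mss)


def extra_rtts(packets, initial_window=4):
    """Round trips spent waiting after the first flight.

    Assumes idealized slow start: the window doubles each RTT, no loss.
    """
    window, sent, rtts = initial_window, 0, 0
    while sent + window < packets:
        sent += window
        window *= 2
        rtts += 1
    return rtts


def messages_per_second(cost_ms=COST_MS):
    """Messages one core can compress per second at cost_ms each."""
    return 1000.0 / cost_ms


def chain_latency_ms(layers, cost_ms=COST_MS):
    """Added latency when every layer handles a request and its response."""
    return layers * 2 * cost_ms


def cpu_seconds(messages, cost_ms=COST_MS):
    """CPU time, in seconds, consumed by compressing this many messages."""
    return messages * cost_ms / 1000.0


if __name__ == "__main__":
    for label, size in (("uncompressed", 44 * 1024), ("compressed", 4 * 1024)):
        n = packets_needed(size)
        print(label, n, "packets,", extra_rtts(n), "extra RTTs")
    print("messages/s on one core:", round(messages_per_second()))
    print("request+response pairs/s:", round(messages_per_second() / 2))
    print("13 proxy layers add (ms):", round(chain_latency_ms(13), 1))
    print("CPU s for 10k clicks:", round(cpu_seconds(10_000), 2))
    print("CPU s per second at 50k req/s:", round(cpu_seconds(50_000), 2))
```

Under these assumptions the sketch gives 31 versus 3 packets and 3 versus 0 extra round trips, and its CPU figures land close to the rounded ones quoted above (about 2900 messages per second, 9.1 ms across 13 layers, 17.5 CPU-seconds per second at 50000 requests per second). Changing `COST_MS` shows how sensitive the conclusions are to the per-message cost.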
Received on Wednesday, 29 February 2012 18:04:18 UTC