Re: Review: http://www.ietf.org/id/draft-mbelshe-httpbis-spdy-00.txt from Adrien de Croy on 2012-02-29 (ietf-http-wg@w3.org from January to March 2012)

From: Adrien de Croy <adrien@qbik.com>
Date: Thu, 01 Mar 2012 09:25:25 +1300
To: Willy Tarreau <w@1wt.eu>
CC: Patrick McManus <pmcmanus@mozilla.com>, Amos Jeffries <squid3@treenet.co.nz>, ietf-http-wg@w3.org
Message-ID: <4F4E89B5.1000606@qbik.com>

I think in the end, an "optimized" protocol is still not going to 
achieve the compression ratio of something like gzip, simply because it 
will have difficulty compressing the URI and header values.

However, as a proxy vendor (and I think you'll notice most people with 
the biggest concerns fall into that camp), we have to look at impact on 
proxies.

The current spec basically bypasses proxies by using TLS.

Encryption / decryption will probably add as much CPU and latency as 
compression / decompression.

In the deployment scenarios Willy is talking about, with front-end load 
balancers and reverse proxies etc, it would affect scalability to force 
adoption of encryption and compression inside a trusted, high bandwidth 
low-latency environment.

So what would we do?  We can either

a) specify a different protocol for this environment
b) make the encryption and compression be optional and then just use the 
same protocol.

(a) makes no sense.  (b) gives rise to a new class of product, such as 
a front-end SSL and gzip layer.  It then becomes an administrative 
decision whether to use encryption and compression on a connection.

It also means that clients can talk uncompressed and unencrypted to 
local proxies, which can then decide whether or not to make encrypted 
and compressed connections upstream.

The problem with making SSL/TLS optional is that currently that's the 
mechanism used to negotiate use of SPDY in the first place.  Without 
that, you'd need to tell the client (e.g. in the hyperlink URIs) what 
protocol to use / is in use at that site, which would mean hyperlinks 
wouldn't be http://www.example.com any more, but something else like 
spdy://www.example.com.  What port it was on would depend on deployment, 
since you couldn't easily share http and spdy native on the same port.  
Unless of course something was built in to allow both types of server to 
discriminate... without redeployment of existing http servers.... so 
actually I think you'd end up having to use a new port number.
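
To make "something built in to discriminate" concrete, here is a minimal
sketch of what a listener sharing one cleartext port might do. It rests
on two assumptions that are not part of the current spec's deployment
model: that SPDY is spoken without TLS, and that the client's first
frame is a control frame, whose first byte has the high bit set, whereas
an HTTP/1.x request starts with an ASCII method token. The function name
is hypothetical.

```python
def guess_protocol(first_bytes: bytes) -> str:
    """Guess the protocol from the first bytes a client sends.

    Assumptions (illustrative only):
      - a cleartext SPDY client opens with a control frame, so the
        high bit of the first byte is set (e.g. 0x80 0x02 for version 2)
      - an HTTP/1.x client opens with an uppercase ASCII method token
    """
    if not first_bytes:
        return "unknown"
    first = first_bytes[0]
    if first & 0x80:
        return "spdy"
    if 0x41 <= first <= 0x5A:  # 'A'..'Z', as in GET, POST, HEAD
        return "http"
    return "unknown"
```

An existing HTTP server knows nothing about this check and would simply
reject the binary frame, which is why a separate port still looks like
the likely outcome.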

But I generally prefer explicit to implied information, so I don't know 
if I would have a problem with this.

Adrien

On 1/03/2012 7:03 a.m., Willy Tarreau wrote:
> Hi Patrick,
>
> On Wed, Feb 29, 2012 at 12:15:32PM -0500, Patrick McManus wrote:
>> I'm going to start by saying that while computational scalability of all
>> parts of the ecosystem (server, intermediary, browsers, embedded clients
>> of other sorts, etc..) is important and must be kept within reasonable
>> limits, it is not the top priority for me in doing transport design for
>> the web.
>>
>> The most important thing is enabling a better user experience (and
>> opening up new ones) over a network where bandwidth, cpu, memory, etc
>> all keep scaling up but latency doesn't operate on the same scale. Our
>> current strategies butt their head into these things all the time
>> whether it is just delay of a handshake or delay in the ability to
>> respond/sense congestion.
> I agree that focusing on limiting latency effects is quite important.
>
> (...)
>> To bring this back to compression - I just took a set of 100 compressed
>> real headers, and passed them through a decompress/recompress filter
>> 1000 times in 350 milliseconds on one core of a rather unimpressive i5.
>> Spdy would do it faster because it tends to window things smaller than
>> the default gzip. So that's a cpu overhead of .35ms per set of 100. The
>> headers were reduced from 44KB (for the set of 100) to about ~4KB.
>> That's probably a reduction from 31 packets to 3. IW=4 means that's a
>> difference of 3 rtt's of delay to send 31 packets uncompressed vs 0
>> delay to send 3 compressed.
> That's precisely what worries me a lot. You were able to compress "only"
> 3000 requests per second on an i5, which means only 1500 request+response
> per second for a proxy or gateway. I'm processing 100 times this without
> compression on the same hardware, so for the sake of scalability, we first
> have to divide CPU efficiency by 100, which is not a really good starting
> point in my opinion. Also, adding .35ms per request (and same for the
> response) means that .7 ms of real latency will be added for every proxy
> layer a request has to pass through. IPS, load balancers, caches, proxies
> etc... All of them will add their own delay. At some of my customers, a
> request can be forwarded through up to 12 or 13 layers last time I counted,
> which means approx 10 ms of cumulative processing time. I know some sites
> running at less than 100 microseconds of average response time, their
> response time would be increased by an order of magnitude due to this.
>
> Also, there is another thing to consider. The only place where compression
> helps is between browsers and servers. Most of the HTTP requests are
> between server-side infrastructure components since for a single browser
> request, a fair number of components are solicited. I think it would be
> a major design error to force all gateways, servers, etc... to communicate
> together with compressed data over low-latency links where CPU matters the
> most. Think about web services too.
>
>> rtt varies a lot, but let's call that 300 ms of latency saved at the
>> cost of .35ms of cpu. It's a trade-off to be sure, but imo the right one
>> for the net.
> It's not really .35ms of CPU, it's .35 ms of CPU *per browser*. When you
> have 10k browsers connected on a site, you're at 3.5s of CPU for one single
> click for each of them. That's huge. As Amos said, the cumulative latency
> caused by CPU overhead can quickly overwhelm the network latency. If my
> gateway has to deal with 50000 requests per second, it needs 17 seconds
> of CPU for each second of wall-clock. Variations of CPU load will quickly
> add up CPU-based latency.
>
>> Other schemes are plausible (e.g. per session templates that can be
>> referenced per transaction) and I'm very open minded to them - but I
>> wanted to be clear that I haven't seen any problems with this one
>> accomplishing its objectives. I think its biggest weakness (though
>> tolerable) is that it creates a state management issue which causes some
>> classes of spec violations to require connection termination instead of
>> being localized to the transaction.
> Patrick, don't get me wrong, I'm certain that what has been achieved is
> really nice, and it proves that reducing the size of a request has a major
> impact on user experience. But you're mostly dealing with code running at
> a rate of 1 user. Some of us are dealing with code having to support 10s
> to 100s of thousands of users on similarly sized hardware. We have reasons
> to be quite concerned with these CPU impacts and with the increased DoS
> risks.
>
> I'll try to find time to put a few ideas on the table to start from
> something, otherwise we'll all spend our time saying that we agree but
> have nothing else :-)
>
> Regards,
> Willy
>
>
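
The packet and round-trip figures quoted above (31 packets vs 3, and 3
RTTs vs 0 with IW=4) can be reproduced with a short sketch. It assumes a
1460-byte TCP payload per packet and an idealised slow start in which
the congestion window doubles every round trip with no loss; both the
constant and the function names are illustrative, not taken from the
thread.

```python
import math

MSS = 1460  # assumed TCP payload bytes per packet (typical for Ethernet)


def packets(nbytes: int, mss: int = MSS) -> int:
    """Number of full-or-partial packets needed to carry nbytes."""
    return math.ceil(nbytes / mss)


def extra_rtts(npackets: int, iw: int = 4) -> int:
    """Round trips spent waiting for ACKs before the last packet is sent.

    Idealised slow start: the window starts at iw packets and doubles
    after every round trip, with no loss and no receiver-window limit.
    """
    cwnd, sent, rtts = iw, 0, 0
    while sent + cwnd < npackets:
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts


uncompressed = packets(44 * 1024)  # 44KB of headers for the set of 100
compressed = packets(4 * 1024)     # about 4KB after compression
print(uncompressed, extra_rtts(uncompressed))
print(compressed, extra_rtts(compressed))
```

Under these assumptions the 44KB case needs flights of 4, 8 and 16
packets before the final 3 can go out, which is where the 3 round trips
come from.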

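The CPU figures in the quoted exchange depend on what the 0.35 ms is
charged to. The benchmark as described reports 0.35 ms per pass over a
set of 100 headers, while the reply's numbers (about 3000 requests per
second, 17 seconds of CPU at 50000 requests per second) follow from
charging 0.35 ms to each request. The sketch below takes the per-request
cost as a parameter so either reading can be plugged in; the function
names are hypothetical.

```python
def cpu_seconds_per_second(requests_per_second: float,
                           cpu_ms_per_request: float) -> float:
    """CPU time consumed per wall-clock second at a given request rate."""
    return requests_per_second * cpu_ms_per_request / 1000.0


def added_latency_ms(layers: int, cpu_ms_per_direction: float) -> float:
    """Latency added when every layer recompresses request and response."""
    return layers * 2 * cpu_ms_per_direction


# Benchmark as quoted: 1000 passes over 100 header sets in 350 ms.
ms_per_set_of_100 = 350 / 1000           # 0.35 ms per pass
ms_per_header = ms_per_set_of_100 / 100  # 0.0035 ms per header set

for label, cost_ms in (("0.35 ms per request", ms_per_set_of_100),
                       ("0.35 ms per 100 requests", ms_per_header)):
    load = cpu_seconds_per_second(50_000, cost_ms)
    delay = added_latency_ms(12, cost_ms)
    print(f"{label}: {load:.3f} CPU-s/s, {delay:.3f} ms over 12 layers")
```

Neither reading accounts for TLS, memory for per-connection compression
state, or contention under load, so these are lower bounds at best.
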
-- 
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com
Received on Wednesday, 29 February 2012 20:25:51 UTC