Re: Review: http://www.ietf.org/id/draft-mbelshe-httpbis-spdy-00.txt from Mike Belshe on 2012-02-29 (ietf-http-wg@w3.org from January to March 2012)

From: Mike Belshe <mike@belshe.com>
Date: Wed, 29 Feb 2012 15:14:24 -0800
To: saravanakumar Annamalaisami <saravanakumar.a@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>, Patrick McManus <pmcmanus@mozilla.com>, Amos Jeffries <squid3@treenet.co.nz>, ietf-http-wg@w3.org
Message-ID: <CABaLYCuh7RBDVNC8TK7EjWa+jypG8L7zjF4KWTmO0DzFVz96fg@mail.gmail.com>
On Wed, Feb 29, 2012 at 11:09 AM, saravanakumar Annamalaisami <saravanakumar.a@gmail.com> wrote:

> Hello Willy,
>
> >>
> Also, adding .35ms per request (and same for the
> response) means that .7 ms of real latency will be added for every proxy
> layer a request has to pass through. IPS, load balancers, caches, proxies
> etc...
> >>
>
> This is a very valid and critical point. It would get even worse if any of
> these devices has a limited number of threads/cores serving requests. The
> per-header latency added by header compression would be significant in the
> case of small request/response headers.
>
> It is not only CPU-intensive but also requires a significant amount of
> memory if it mandates stateful zlib compression. This could create
> many scalability issues for the intermediate devices that you have listed,
> and for the server as well.
>
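To make "stateful" concrete, here is a minimal Python sketch, assuming one zlib compression context is kept per connection and each header block is flushed with `Z_SYNC_FLUSH` so it can be sent immediately. The `HeaderCompressor` class and the sample headers are hypothetical illustrations, not SPDY's actual framing or dictionary. The two memory helpers encode the per-stream formulas documented in zlib's `zconf.h` (deflate: `(1 << (windowBits+2)) + (1 << (memLevel+9))`; inflate: `1 << windowBits` plus about 7 KB).

```python
import zlib


class HeaderCompressor:
    """Hypothetical per-connection header compressor (illustration only).

    One zlib stream lives for the whole connection, so later header blocks
    can back-reference bytes from earlier ones. That shared history is what
    makes repeated headers cheap, and also what costs memory per connection.
    """

    def __init__(self, wbits: int = 15, mem_level: int = 8) -> None:
        self._c = zlib.compressobj(6, zlib.DEFLATED, wbits, mem_level)

    def compress_block(self, headers: bytes) -> bytes:
        # Z_SYNC_FLUSH emits all pending output, byte-aligned,
        # without resetting the compression history.
        return self._c.compress(headers) + self._c.flush(zlib.Z_SYNC_FLUSH)


def deflate_memory_bytes(wbits: int = 15, mem_level: int = 8) -> int:
    # zconf.h: (1 << (windowBits+2)) + (1 << (memLevel+9)), plus a few KB.
    return (1 << (wbits + 2)) + (1 << (mem_level + 9))


def inflate_memory_bytes(wbits: int = 15) -> int:
    # zconf.h: 1 << windowBits, plus about 7 KB for small objects.
    return (1 << wbits) + 7 * 1024


if __name__ == "__main__":
    request = (
        b"GET /index.html HTTP/1.1\r\n"
        b"Host: www.example.com\r\n"
        b"User-Agent: Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0\r\n"
        b"Accept: text/html,application/xhtml+xml\r\n"
        b"Cookie: session=0123456789abcdef; prefs=compact\r\n\r\n"
    )
    comp = HeaderCompressor()
    first = comp.compress_block(request)
    second = comp.compress_block(request)  # same headers, same connection
    print(len(request), len(first), len(second))

    # One deflate + one inflate context per connection, at zlib defaults.
    per_conn = deflate_memory_bytes() + inflate_memory_bytes()
    print(per_conn, per_conn * 10_000)  # one connection vs. 10k connections
```

The second block shrinks to a few bytes because it mostly points back into the first; the price is roughly 300 KB of zlib state per connection at default settings, which is the memory concern raised here. Smaller `wbits`/`mem_level` values reduce it.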
> If we are really looking at reducing the header bytes, it is cheaper and
> more scalable to address them at the HTTP/application protocol level
> itself than through generic stateful zlib compression.
>
> For example, in the case of requests, Referer, User-Agent, Cookie, and
> Accept are a few of the headers that account for most of the bytes. There
> could be other such headers depending on the application. In most cases,
> these headers do not change during a user session.
>
> We could define a protocol/mechanism to cache/store this information on
> the server side so that the client could send a short form of these header
> values after the first request.
>
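The server-side caching idea above could look roughly like the following Python sketch. Everything here is hypothetical: the `CACHEABLE` set, the `(name, None)` "reuse the cached value" wire form, and both class names are illustrative assumptions, not part of any SPDY or HTTP specification.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical set of headers worth caching per session; Referer,
# User-Agent, Cookie and Accept are the usual heavy hitters.
CACHEABLE = {"referer", "user-agent", "cookie", "accept"}

Header = Tuple[str, str]
# On the wire, a value of None means "same value as last time in this session".
WireHeader = Tuple[str, Optional[str]]


class SessionHeaderEncoder:
    """Client side: remembers which cacheable values it already sent."""

    def __init__(self) -> None:
        self._sent: Dict[str, str] = {}

    def encode(self, headers: List[Header]) -> List[WireHeader]:
        wire: List[WireHeader] = []
        for name, value in headers:
            key = name.lower()
            if key in CACHEABLE and self._sent.get(key) == value:
                wire.append((name, None))  # short form: reference cached value
            else:
                wire.append((name, value))  # full value
                if key in CACHEABLE:
                    self._sent[key] = value
        return wire


class SessionHeaderDecoder:
    """Server side: stores cacheable values per session, expands references."""

    def __init__(self) -> None:
        self._cache: Dict[str, str] = {}

    def decode(self, wire: List[WireHeader]) -> List[Header]:
        headers: List[Header] = []
        for name, value in wire:
            key = name.lower()
            if value is None:
                if key not in self._cache:
                    # Client and server state diverged; the session must be reset.
                    raise ValueError(f"no cached value for {name!r}")
                value = self._cache[key]
            elif key in CACHEABLE:
                self._cache[key] = value
            headers.append((name, value))
        return headers


if __name__ == "__main__":
    enc, dec = SessionHeaderEncoder(), SessionHeaderDecoder()
    first = [("Host", "www.example.com"), ("User-Agent", "ExampleBrowser/1.0"), ("Cookie", "id=42")]
    second = [("Host", "www.example.com"), ("User-Agent", "ExampleBrowser/1.0"), ("Cookie", "id=43")]
    wire1, wire2 = enc.encode(first), enc.encode(second)
    print(wire2)  # [('Host', 'www.example.com'), ('User-Agent', None), ('Cookie', 'id=43')]
    assert dec.decode(wire1) == first and dec.decode(wire2) == second
```

Unlike a compressor, this only needs a small dictionary per session and no per-byte work, but both ends still have to keep their tables in sync.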
> If we have an alternative way to get part of the benefit of stateful
> zlib HTTP header compression, it is worth it.
>


If we're going to talk about server scalability, we should talk about the
entire protocol, not just small parts of it.  The bigger win in server
scalability is likely the reduction in connections-per-site with SPDY.  A
typical web page can drop the number of connections it uses by a factor of
at least 6, sometimes more.  SSL is another factor, of course.  And both of
these are far more invasive than the compressor.

I'd really rather hear from servers implementing SPDY that can demonstrate
a real scalability problem rather than just theory.

Mike





>
> -thanks, Saravana.
>
> On Wed, Feb 29, 2012 at 11:33 PM, Willy Tarreau <w@1wt.eu> wrote:
>
>> Hi Patrick,
>>
>> On Wed, Feb 29, 2012 at 12:15:32PM -0500, Patrick McManus wrote:
>> > I'm going to start by saying that while computational scalability of all
>> > parts of the ecosystem (server, intermediary, browsers, embedded clients
>> > of other sorts, etc.) is important and must be kept within reasonable
>> > limits, it is not the top priority for me in doing transport design for
>> > the web.
>> >
>> > The most important thing is enabling a better user experience (and
>> > opening up new ones) over a network where bandwidth, cpu, memory, etc
>> > all keep scaling up but latency doesn't operate on the same scale. Our
>> > current strategies butt heads with these limits all the time,
>> > whether it is just the delay of a handshake or the delay in the ability
>> > to respond to or sense congestion.
>>
>> I agree that focusing on limiting latency effects is quite important.
>>
>> (...)
>> > To bring this back to compression - I just took a set of 100 compressed
>> > real headers, and passed them through a decompress/recompress filter
>> > 1000 times in 350 milliseconds on one core of a rather unimpressive i5.
>> > SPDY would do it faster because it tends to use smaller windows than
>> > the default gzip. So that's a CPU overhead of .35ms per set of 100. The
>> > headers were reduced from 44KB (for the set of 100) to about 4KB.
>> > That's probably a reduction from 31 packets to 3. IW=4 means that's a
>> > difference of 3 RTTs of delay to send 31 packets uncompressed vs. 0
>> > delay to send 3 compressed.
>>
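The packet and RTT figures above can be reproduced with a small Python sketch, assuming a 1460-byte MSS, no packet loss, and classic slow start in which the congestion window starts at IW=4 segments and doubles every round trip. The 100 ms RTT is an assumption chosen to match the "300 ms" estimate that follows in the thread.

```python
import math

MSS = 1460  # assumed TCP payload bytes per packet (Ethernet-sized segments)


def packets_needed(payload_bytes: int, mss: int = MSS) -> int:
    """Number of full-size segments needed to carry the payload."""
    return math.ceil(payload_bytes / mss)


def slow_start_rounds(packets: int, initial_window: int = 4) -> int:
    """Round trips to deliver `packets` segments under loss-free slow start:
    the congestion window starts at `initial_window` and doubles each RTT."""
    rounds, cwnd, sent = 0, initial_window, 0
    while sent < packets:
        sent += cwnd
        cwnd *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    cpu_ms_per_set = 350 / 1000  # 1000 passes over 100 headers in 350 ms
    rtt_ms = 100                 # assumed round-trip time
    for label, size in (("uncompressed", 44_000), ("compressed", 4_000)):
        pkts = packets_needed(size)
        extra_rtts = slow_start_rounds(pkts) - 1  # RTTs beyond the first flight
        print(label, pkts, extra_rtts, extra_rtts * rtt_ms)
    # prints: "uncompressed 31 3 300" then "compressed 3 0 0"
    print("CPU per set of 100 headers (ms):", cpu_ms_per_set)
```

With IW=4 the window grows 4, 8, 16, 32, so 31 packets need four flights (three extra RTTs), while 3 packets fit in the first one.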
>> That's precisely what worries me a lot. You were able to compress "only"
>> 3000 requests per second on an i5, which means only 1500 request+response
>> pairs per second for a proxy or gateway. I'm processing 100 times this
>> without compression on the same hardware, so in terms of scalability, we
>> first have to divide CPU efficiency by 100, which is not a very good
>> starting point in my opinion. Also, adding .35ms per request (and the same
>> for the response) means that .7 ms of real latency will be added for every
>> proxy layer a request has to pass through: IPS, load balancers, caches,
>> proxies, etc. All of them will add their own delay. At some of my
>> customers, a request can be forwarded through up to 12 or 13 layers (last
>> time I counted), which means approximately 10 ms of cumulative processing
>> time. I know some sites running at less than 100 microseconds of average
>> response time; their response time would be increased by an order of
>> magnitude due to this.
>>
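The arithmetic in this paragraph can be checked with a short Python sketch. It takes .35 ms of CPU per header block, as Willy does here, and assumes every proxy layer pays that cost once on the request and once on the response; the 100 microsecond baseline is the figure quoted above.

```python
def ops_per_second(cost_ms: float) -> float:
    """Operations one core can complete per second at `cost_ms` CPU each."""
    return 1000.0 / cost_ms


def added_latency_ms(cost_ms: float, layers: int) -> float:
    """Latency added when every layer pays `cost_ms` on the request and again
    on the response (hence the factor of 2)."""
    return 2 * cost_ms * layers


if __name__ == "__main__":
    cost_ms = 0.35  # CPU cost per header block, as read in this paragraph
    print(round(ops_per_second(cost_ms)))      # 2857, i.e. "only" ~3000/s
    print(round(ops_per_second(cost_ms) / 2))  # 1429 request+response pairs/s
    for layers in (12, 13):
        print(layers, round(added_latency_ms(cost_ms, layers), 2))  # 8.4, 9.1 ms
    baseline_ms = 0.1  # a site answering in ~100 microseconds
    print((baseline_ms + added_latency_ms(cost_ms, 13)) / baseline_ms)
```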
>> Also, there is another thing to consider. The only place where compression
>> helps is between browsers and servers. Most HTTP requests are exchanged
>> between server-side infrastructure components, since a single browser
>> request solicits a fair number of components. I think it would be
>> a major design error to force all gateways, servers, etc. to communicate
>> with each other using compressed data over low-latency links where CPU
>> matters the most. Think about web services too.
>>
>> > RTT varies a lot, but let's call that 300 ms of latency saved at the
>> > cost of .35ms of CPU. It's a trade-off to be sure, but IMO the right one
>> > for the net.
>>
>> It's not really .35ms of CPU, it's .35 ms of CPU *per browser*. When you
>> have 10k browsers connected on a site, you're at 3.5s of CPU for a single
>> click from each of them. That's huge. As Amos said, the cumulative
>> latency caused by CPU overhead can quickly overwhelm the network latency.
>> If my gateway has to deal with 50000 requests per second, it needs 17
>> seconds of CPU for each second of wall-clock time. Variations in CPU load
>> will quickly add CPU-induced latency.
>>
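A Python sketch of the aggregate-load figures above, plus one hedged way to see why load variations turn into latency. The queueing part swaps in the textbook M/M/1 model (Poisson arrivals, exponential service times, one core), which is an assumption for illustration and not something measured in this thread: under it, mean time in system is service time divided by (1 - utilization).

```python
def cpu_seconds(requests: float, cost_ms: float) -> float:
    """CPU-seconds of header (de)compression work for `requests` operations."""
    return requests * cost_ms / 1000.0


def mm1_response_time_ms(service_ms: float, utilization: float) -> float:
    """Mean time in an M/M/1 queue: service_ms / (1 - utilization)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


if __name__ == "__main__":
    print(cpu_seconds(10_000, 0.35))  # one click from each of 10k browsers
    print(cpu_seconds(50_000, 0.35))  # one wall-clock second at 50k req/s
    for rho in (0.5, 0.8, 0.9):
        # a .35 ms job grows as 1 / (1 - rho) when the core gets busier
        print(rho, round(mm1_response_time_ms(0.35, rho), 2))
```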
>> > Other schemes are plausible (e.g. per session templates that can be
>> > referenced per transaction) and I'm very open-minded about them - but I
>> > wanted to be clear that I haven't seen any problems with this one
>> > accomplishing its objectives. I think its biggest weakness (though
>> > tolerable) is that it creates a state management issue which causes some
>> > classes of spec violations to require connection termination instead of
>> > being localized to the transaction.
>>
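A short Python sketch of the state-management issue described above, assuming raw deflate streams (`wbits=-15`) and a receiver that has lost or rejected the first header block on a connection. With one shared context, the next block back-references history the receiver never decoded, so it cannot be decoded on its own; blocks compressed independently can. Neither mode is meant to be SPDY's exact wire format; `compress_blocks` and `decodable_alone` are hypothetical helpers.

```python
import zlib
from typing import List


def compress_blocks(blocks: List[bytes], shared_context: bool) -> List[bytes]:
    """Raw-deflate each header block, flushing after each one.

    shared_context=True keeps one compressor for the whole connection
    (stateful); False starts a fresh compressor per block (stateless).
    """
    shared = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15: raw deflate, 32 KB window
    out = []
    for block in blocks:
        comp = shared if shared_context else zlib.compressobj(6, zlib.DEFLATED, -15)
        out.append(comp.compress(block) + comp.flush(zlib.Z_SYNC_FLUSH))
    return out


def decodable_alone(wire: bytes, expected: bytes) -> bool:
    """Can this block be decoded by a fresh decompressor, i.e. after the
    receiver lost or rejected every earlier block on the connection?"""
    try:
        return zlib.decompressobj(-15).decompress(wire) == expected
    except zlib.error:  # e.g. a back-reference into history we never saw
        return False


if __name__ == "__main__":
    block = b"Host: www.example.com\r\nUser-Agent: ExampleBrowser/1.0\r\n\r\n"
    for shared in (True, False):
        wire = compress_blocks([block, block], shared_context=shared)
        print("shared" if shared else "per-block", decodable_alone(wire[1], block))
```

The shared context is what buys the compression ratio, and it is also why an error in one block leaves the connection unusable rather than just failing that transaction.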
>> Patrick, don't get me wrong, I'm certain that what has been achieved is
>> really nice, and it proves that reducing the size of a request has a major
>> impact on user experience. But you're mostly dealing with code serving
>> a single user. Some of us are dealing with code that has to support 10s
>> to 100s of thousands of users on similarly sized hardware. We have reasons
>> to be quite concerned with these CPU impacts and with the increased DoS
>> risks.
>>
>> I'll try to find time to put a few ideas on the table to start from
>> something, otherwise we'll all spend our time saying that we agree but
>> have nothing else :-)
>>
>> Regards,
>> Willy
>>
>>
>>
>
Received on Wednesday, 29 February 2012 23:14:53 UTC