Re: Review: http://www.ietf.org/id/draft-mbelshe-httpbis-spdy-00.txt

On 29/02/2012 3:04 p.m., patrick mcmanus wrote:
> The  spdy compression scheme has proven itself very valuable in 
> firefox testing.
>
> Just like Mike has seen - we see 90% header reduction sizes, including 
> on cookies because they are so repetitive between requests even if the 
> individual cookies don't compress well. Exclusively prescribing maps 
> for well known values doesn't scale well to the value side of the n/v 
> pairs and it biases strongly in favor of the status quo making it too 
> difficult to deploy things like DNT in the future. I'm totally open to 
> other formats as well - but this has proven itself to me with 
> experience.. and right now I trust experience first, out of context 
> data second, and good ideas third.

I keep hearing these reduction rates expressed in bytes, but no mention 
of request-per-second throughput, which is equally important and 
directly affected by the compression.
I also have to assume, since there is no mention of middleware 
measurements and there is still a general lack of SPDY proxy 
implementations, that these are all single-hop measurements from browser 
to server?

Challenge: Implement a SPDY proxy which simply decompresses and then 
recompresses. Pass each request through 1-3 layers of this proxy. 
Measure the gains and the CPU load.

I think you will start to see the different cost model which middleware 
has to operate under. Browsers also face these same costs, but 
middleware is often an order of magnitude or more up the traffic 
scale, with less idle CPU time to do the work in. Sometimes the middleware 
has to handle orders of magnitude more traffic than even the server end.
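
To put rough numbers on that scale difference, here is a tiny illustration; 
every figure in it (the request rates, the per-block CPU cost) is an 
assumption for the sake of the example, not a measurement:

  # Illustrative only: all of these numbers are assumptions.
  per_block_cpu_secs    = 50e-6    # assumed inflate+deflate cost per header block
  browser_req_per_sec   = 10       # a busy page load, briefly
  middlebox_req_per_sec = 50_000   # an aggregating proxy at peak

  for name, rate in (("browser", browser_req_per_sec),
                     ("middlebox", middlebox_req_per_sec)):
      cores = rate * per_block_cpu_secs
      print(f"{name}: {cores * 100:.2f}% of one core on header (de)compression")

The same per-block cost that is lost in the noise on a browser becomes 
multiple dedicated cores on an aggregating middlebox.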

>
> the dos implications don't bother me. http headers always have an 
> implementation size limit anyhow - the fact that you can detect that 
> limit has been hit with fewer bytes on the wire is kind of a mixed 
> result (i.e. it isn't all bad or good).


Squid can provide a real-world example here of the impact of compression 
on HTTP. It has the behaviour of consuming 100% of a CPU before it slows 
down, with a noticeable turning point, which is very useful for 
identifying the capacity-overload point.
   Some months ago a gzip body-compression module was published and 
several networks took it up. What they reported was that one-way 
compression cuts the peak req/sec rate by up to 20%. That is just for 
response bodies. Now increase that a few percent to include headers and 
all the body-less requests SPDY is asking for compression on as well. 
Then double it for the decompression work on arrival. Suddenly a 50% 
reduction in traffic speed looks possible. 
True, sending less over the wire helps the other software sharing that 
wire and its parallel connections, which is particularly useful if one is 
paying by the byte. But it does not help the poor end users stuck with a 
visible reduction in download speed, or the middleware maxing out its 
CPUs, or the networks having to deploy up to twice as much hardware.
  And I am not even considering the RAM DoS issues Willy brought up.

This is somewhat of a worst-case scenario given the efficiencies of the 
software involved, but IMHO it is realistic enough that it should be 
considered a good reason to get better measurements before diving in 
and mandating compression at the transport level.
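
Spelling that back-of-envelope arithmetic out, with the 20% being the 
figure reported by the Squid operators and the rest being the assumptions 
made above:

  body_gzip_hit    = 0.20   # reported peak req/sec loss, response bodies only
  header_extra_hit = 0.05   # assumed: headers plus body-less requests
  compress_hit     = body_gzip_hit + header_extra_hit

  # Assume decompression on arrival costs about as much again.
  total_hit = 2 * compress_hit
  print(f"estimated peak req/sec reduction: {total_hit:.0%}")   # -> 50%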


>
> For anyone that hasn't read some of the other posts on this topic, the 
> compression here is a huge win on the request side because of CWND 
> interaction. spdy multiplexes requests and can therefore, especially 
> when a browser needs to get all of the subresources identified on the 
> page, create a little bit of a flood of simultaneous requests. If each 
> request is 500 bytes (i.e. no cookies) and IW=4, that's just  12 
> requests that can  really be sent simultaneously. 90% compression 
> means you can send 100+, which is a lot more realistic for being able 
> to capture a page full of resources.. but honestly that's not far past 
> current usage patterns and cookies of course put pressure on here too. 
> So it's impt to count bytes and maximize savings. (pipelines have this 
> challenge too and its the primary reason a pipeline of double digit 
> depth doesn't really add much incremental value.)
>

So you just sent 100+ requests, each of which needs to be individually 
decompressed, parsed, processed, re-multiplexed, re-compressed and either 
delivered or relayed onward, where the same operations happen all over 
again. How long is that going to take overall? Did you actually save any 
download waiting time by sending them in one bunch? Or will the 
network-transfer saving be eaten up by the time spent in compression at 
each hop?
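
For reference, the 12-versus-100+ figures in the quoted text fall out of 
the initial-window arithmetic below, and the same arithmetic shows how much 
inflate/deflate work one such burst creates downstream (the MSS and the 
number of receiving hops are my assumptions):

  # The quoted CWND arithmetic made explicit; MSS and hop count are assumptions.
  mss            = 1460                       # assumed TCP segment payload
  initial_window = 4 * mss                    # IW=4, as in the quoted post
  uncompressed   = 500                        # bytes per request header block
  compressed     = int(uncompressed * 0.10)   # ~90% compression

  print(initial_window // uncompressed)       # ~11 requests fit uncompressed
  print(initial_window // compressed)         # ~116 requests fit compressed

  receivers = 3   # assumed: two proxy layers plus the origin server
  # Each receiver must inflate every block; the proxies must re-deflate too.
  print((initial_window // compressed) * receivers)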


Let's have the bandwidth-savings cake, but let's not sour it by adding 
time and power costs.

AYJ

> On 2/28/2012 5:52 PM, Mike Belshe wrote:
>> Hi, Willy -
>>
>> Thanks for the insightful comments about header compression.  I'm out 
>> of the country for a few days, so I am slow to reply fully.
>>
>> We did consider this, but ultimately decided it was mostly a non 
>> issue, as the problem already exists.   Specifically - the same 
>> amplification attacks exist in the data stream with data gzip 
>> encoding.  You could make an argument that origin servers and proxy 
>> servers are different, I suppose; but many proxy servers are doing 
>> virus scanning and other content checks anyway, and already decoding 
>> that stream.  But if you're still not convinced, the problem also 
>> exists at the SSL layer.  (SSL will happily negotiate compression of 
>> the entire stream - headers & all - long before it gets to the app 
>> layer).  So overall, I don't think this is a new attack vector for HTTP.
>>
>> We did consider some header compression schemes like what you 
>> proposed - they are quite viable.  They definitely provide less 
>> compression, and are less resilient to learning over time like zlib 
>> is.  They also generally fail to help with repeated cookies, which is 
>> really where some of the biggest gains are.  But I'm open to other 
>> compression schemes.
>>
>> Note that SPDY's compressor achieves about 85-90% compression for 
>> typical web pages.
>>
>> Mike
>>
>>
>> On Sat, Feb 25, 2012 at 4:13 AM, Willy Tarreau <w@1wt.eu> wrote:
>>
>>     Hi Mike,
>>
>>     On Thu, Feb 23, 2012 at 01:21:38PM -0800, Mike Belshe wrote:
>>     > I posted this draft for comment from all.
>>     >
>>     > http://www.ietf.org/id/draft-mbelshe-httpbis-spdy-00.txt
>>
>>     First, thank you for posting this draft, it's full of very nice
>>     ideas.
>>
>>     I have a really major concern with header compression. I
>>     understand that
>>     the initial reason why it is needed is that HTTP header names are
>>     quite
>>     long, but is an issue that we should address in HTTP/2.0 instead
>>     of any
>>     attempt to work around it by compressing them. Some header values are
>>     quite long too. User-agent could be reduced to 16bit vendor-id +
>>     8-bit
>>     product-id + 8-bit product version. Cookies can be quite long but
>>     will
>>     generally not compress much since long ones are supposed to be
>>     unpredictable randoms.
>>
>>     The concern I'm having with header compression is that it makes
>>     DoS much
>>     more trivial using low-bandwidth attacks. I've performed a test
>>     using the
>>     dictionary provided in the draft. Using it, I can reduce 4233425
>>     bytes of
>>     crap headers to 7 bytes ("78 f9 e3 c6 a7 c2 ec"). Considered like
>>     this, it
>>     looks efficient. Now see it differently : by sending a frame of
>>     18 + 7 =
>>     25 bytes of data, I can make the server write 4233425 bytes to
>>     memory then
>>     parse those bytes as HTTP headers.  This is an amplification ratio of
>>     4233425/25 = 169337. In other words, it means that with only the
>>     256 kbps
>>     of upstream bandwidth I have on my entry-level ADSL access, I can
>>     make an
>>     intermediary or server have to process 43 Gbps of HTTP headers !
>>     And even
>>     if we did not use an initial dictionary, it's easy to achieve
>>     compression
>>     ratios above 200 with zlib, which is still very dangerous in the
>>     case of
>>     DDoS attacks.
>>
>>     Basically, it means that any low-end internet access is enough to
>>     knock
>>     down even large sites, even those who make use of
>>     hardware-accelerated
>>     zlib decompression. This is a major issue, especially if we
>>     consider the
>>     rise of DDoS attacks these days.
>>
>>     In my opinion, in the sake of robustness, it is much better to be
>>     able
>>     to parse the stream and stop as soon as something's wrong. This means
>>     that the headers should be sent in the most easily parsable form. For
>>     efficiency, we should ditch current header names and use 1-byte
>>     for the
>>     most common ones. Possibly we could make use of a variable length
>>     encoding
>>     for the size just as we used in WebSocket. This way, most headers
>>     could be
>>     encoded as one byte for the length, one for the header name, and
>>     a few
>>     bytes for the value.  Some headers could be typed as enums to
>>     save space.
>>
>>     For instance, if we apply this principle to the example in
>>     SPDY-Fan.pdf
>>     (274 bytes) :
>>
>>      GET /rsrc.php/v1/y0/r/Se4RZ9IhLuW.css HTTP/1.1
>>      Host: static.ak.fbcdn.net
>>      User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.8)
>>      Accept: text/css,*/*;q=0.1
>>      Accept-Language: en-us,en;q=0.5
>>      Accept-Encoding: gzip,deflate
>>      Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
>>
>>     We could get down to something more or less like this with
>>     minimal effort :
>>
>>       1 byte for method=GET + version=1.1
>>       1 byte for the Host header value length
>>      19 bytes for "static.ak.fbcdn.net"
>>      33 bytes for "/rsrc.php/v1/y0/r/Se4RZ9IhLuW.css"
>>       1 byte for User-Agent header name
>>       4 bytes for Mozilla + browser type + major version number
>>       1 byte for Accept
>>       1 byte for the number of content-types bytes
>>       1 byte for text/css
>>       1 byte for */*;q=
>>       1 byte for 0.1
>>       1 byte for Accept-Language
>>       1 byte for the number of accept-language bytes
>>       1 byte for en-us
>>       1 byte for en;q=
>>       1 byte for 0.5
>>       1 byte for Accept-Encoding
>>       1 byte for the number of bytes in the values
>>       1 byte for gzip
>>       1 byte for deflate        (note: both could be merged into one
>>     single)
>>       1 byte for Accept-Charset
>>       1 byte for the value length
>>       1 byte ISO-8859-1
>>       1 byte for utf-8;q=
>>       1 byte for 0.7
>>       1 byte for *;q=
>>       1 byte for 0.7
>>
>>     We're at a total of 80 bytes or 29% of the original size for a
>>     *small*
>>     request, which was already dominated by the Host, URI and UA lengths.
>>     And such an encoding would be as cheap to parse as it is to produce.
>>
>>     We also have to take intermediaries into account. Most of them
>>     touch a
>>     few headers (eg: add/remove a cookie, etc). An encoding such as above
>>     is much cheaper to produce than it is to decompress+recompress
>>     just to
>>     add a cookie.
>>
>>     Best regards,
>>     Willy
>>
>>
>
