Re: Review: http://www.ietf.org/id/draft-mbelshe-httpbis-spdy-00.txt from patrick mcmanus on 2012-02-29 (ietf-http-wg@w3.org from January to March 2012)

From: patrick mcmanus <pmcmanus@mozilla.com>
Date: Tue, 28 Feb 2012 21:04:35 -0500
To: ietf-http-wg@w3.org
Message-ID: <4F4D87B3.20801@mozilla.com>
The SPDY compression scheme has proven itself very valuable in Firefox
testing.

Just as Mike has seen, we see header sizes reduced by about 90%, including
on cookies, because they are so repetitive between requests even if the
individual cookies don't compress well. Exclusively prescribing maps
for well-known values doesn't scale well to the value side of the n/v
pairs, and it biases strongly in favor of the status quo, making it too
difficult to deploy things like DNT in the future. I'm totally open to
other formats as well, but this one has proven itself to me with
experience, and right now I trust experience first, out-of-context data
second, and good ideas third.

The DoS implications don't bother me. HTTP headers always have an
implementation size limit anyhow; the fact that you can detect that the
limit has been hit with fewer bytes on the wire is kind of a mixed
result (i.e. it isn't all bad or all good).
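
The size-limit point can be made concrete with a small sketch. It assumes Python's standard zlib module and a hypothetical 16 KiB cap (`HEADER_LIMIT` and `inflate_headers` are illustrative names, not anything from the SPDY draft), and it leaves out the SPDY preset dictionary and the per-session compression context. The receiver inflates at most one byte past the cap, so a tiny block that expands enormously is rejected without being fully written to memory:

```python
import zlib

# Hypothetical per-request header cap; real implementations pick their own.
HEADER_LIMIT = 16 * 1024


def inflate_headers(compressed: bytes, limit: int = HEADER_LIMIT) -> bytes:
    """Inflate a header block, refusing to produce more than `limit` bytes."""
    decompressor = zlib.decompressobj()
    # max_length bounds the output; asking for limit + 1 bytes is enough
    # to tell "exactly at the limit" apart from "over the limit".
    out = decompressor.decompress(compressed, limit + 1)
    if len(out) > limit:
        raise ValueError("header block exceeds implementation limit")
    return out


# A small legitimate block round-trips unchanged.
small = zlib.compress(b"host: example.com\r\n")
headers = inflate_headers(small)

# A few kilobytes on the wire that would expand to 4 MB are rejected early.
bomb = zlib.compress(b"a" * 4_000_000)
try:
    inflate_headers(bomb)
    rejected = False
except ValueError:
    rejected = True
```

The check costs at most `limit + 1` bytes of output per block, whatever the compression ratio, which is why hitting the limit with fewer wire bytes is not purely a loss for the receiver.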

For anyone who hasn't read some of the other posts on this topic, the
compression here is a huge win on the request side because of CWND
interaction. SPDY multiplexes requests and can therefore, especially
when a browser needs to get all of the subresources identified on the
page, create a little bit of a flood of simultaneous requests. If each
request is 500 bytes (i.e. no cookies) and IW=4, only about 12
requests can really be sent simultaneously. 90% compression means
you can send 100+, which is a lot more realistic for capturing a page
full of resources, but honestly that's not far past current usage
patterns, and cookies of course put pressure on here too. So it's
important to count bytes and maximize savings. (Pipelines have this
challenge too, and it's the primary reason a pipeline of double-digit
depth doesn't really add much incremental value.)
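
The initial-window arithmetic above can be sketched as follows. The 1460-byte MSS is an assumption (the message only gives IW=4 and 500-byte requests), and `requests_in_initial_window` is an illustrative name:

```python
def requests_in_initial_window(
    request_bytes: int,
    compression_pct: int = 0,
    iw_segments: int = 4,
    mss: int = 1460,  # assumed segment payload size, not stated in the message
) -> int:
    """Count whole requests that fit in the first flight of a TCP connection."""
    window_bytes = iw_segments * mss
    # Integer arithmetic avoids floating-point surprises for round percentages.
    on_wire = request_bytes * (100 - compression_pct) // 100
    return window_bytes // on_wire


uncompressed = requests_in_initial_window(500)
compressed = requests_in_initial_window(500, compression_pct=90)
```

Under these assumptions the first flight holds 11 uncompressed requests (close to the 12 quoted above) and 116 at 90% compression, which is the "100+" figure.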

On 2/28/2012 5:52 PM, Mike Belshe wrote:
> Hi, Willy -
>
> Thanks for the insightful comments about header compression.  I'm out 
> of the country for a few days, so I am slow to reply fully.
>
> We did consider this, but ultimately decided it was mostly a non 
> issue, as the problem already exists.   Specifically - the same 
> amplification attacks exist in the data stream with data gzip 
> encoding.  You could make an argument that origin servers and proxy 
> servers are different, I suppose; but many proxy servers are doing 
> virus scanning and other content checks anyway, and already decoding 
> that stream.  But if you're still not convinced, the problem also 
> exists at the SSL layer.  (SSL will happily negotiate compression of 
> the entire stream - headers & all - long before it gets to the app 
> layer).  So overall, I don't think this is a new attack vector for HTTP.
>
> We did consider some header compression schemes like what you proposed 
> - they are quite viable.  They definitely provide less compression, 
> and are less resilient to learning over time like zlib is.  They also 
> generally fail to help with repeated cookies, which is really where 
> some of the biggest gains are.  But I'm open to other compression 
> schemes.
>
> Note that SPDY's compressor achieves about 85-90% compression for 
> typical web pages.
>
> Mike
>
>
> On Sat, Feb 25, 2012 at 4:13 AM, Willy Tarreau <w@1wt.eu> wrote:
>
>     Hi Mike,
>
>     On Thu, Feb 23, 2012 at 01:21:38PM -0800, Mike Belshe wrote:
>     > I posted this draft for comment from all.
>     >
>     > http://www.ietf.org/id/draft-mbelshe-httpbis-spdy-00.txt
>
>     First, thank you for posting this draft, it's full of very nice ideas.
>
>     I have a really major concern with header compression. I understand that
>     the initial reason why it is needed is that HTTP header names are quite
>     long, but that is an issue that we should address in HTTP/2.0 instead of
>     any attempt to work around it by compressing them. Some header values are
>     quite long too. User-agent could be reduced to 16-bit vendor-id + 8-bit
>     product-id + 8-bit product version. Cookies can be quite long but will
>     generally not compress much since long ones are supposed to be
>     unpredictable randoms.
>
>     The concern I'm having with header compression is that it makes DoS much
>     more trivial using low-bandwidth attacks. I've performed a test using the
>     dictionary provided in the draft. Using it, I can reduce 4233425 bytes of
>     crap headers to 7 bytes ("78 f9 e3 c6 a7 c2 ec"). Considered like this, it
>     looks efficient. Now see it differently: by sending a frame of 18 + 7 =
>     25 bytes of data, I can make the server write 4233425 bytes to memory then
>     parse those bytes as HTTP headers.  This is an amplification ratio of
>     4233425/25 = 169337. In other words, it means that with only the 256 kbps
>     of upstream bandwidth I have on my entry-level ADSL access, I can make an
>     intermediary or server have to process 43 Gbps of HTTP headers! And even
>     if we did not use an initial dictionary, it's easy to achieve compression
>     ratios above 200 with zlib, which is still very dangerous in the case of
>     DDoS attacks.
>
>     Basically, it means that any low-end internet access is enough to knock
>     down even large sites, even those who make use of hardware-accelerated
>     zlib decompression. This is a major issue, especially if we consider the
>     rise of DDoS attacks these days.
>
>     In my opinion, for the sake of robustness, it is much better to be able
>     to parse the stream and stop as soon as something's wrong. This means
>     that the headers should be sent in the most easily parsable form. For
>     efficiency, we should ditch current header names and use 1 byte for the
>     most common ones. Possibly we could make use of a variable length encoding
>     for the size just as we used in WebSocket. This way, most headers could be
>     encoded as one byte for the length, one for the header name, and a few
>     bytes for the value.  Some headers could be typed as enums to save space.
>
>     For instance, if we apply this principle to the example in SPDY-Fan.pdf
>     (274 bytes):
>
>      GET /rsrc.php/v1/y0/r/Se4RZ9IhLuW.css HTTP/1.1
>      Host: static.ak.fbcdn.net
>      User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.8)
>      Accept: text/css,*/*;q=0.1
>      Accept-Language: en-us,en;q=0.5
>      Accept-Encoding: gzip,deflate
>      Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
>
>     We could get down to something more or less like this with minimal effort:
>
>       1 byte for method=GET + version=1.1
>       1 byte for the Host header value length
>      19 bytes for "static.ak.fbcdn.net"
>      33 bytes for "/rsrc.php/v1/y0/r/Se4RZ9IhLuW.css"
>       1 byte for User-Agent header name
>       4 bytes for Mozilla + browser type + major version number
>       1 byte for Accept
>       1 byte for the number of content-types bytes
>       1 byte for text/css
>       1 byte for */*;q=
>       1 byte for 0.1
>       1 byte for Accept-Language
>       1 byte for the number of accept-language bytes
>       1 byte for en-us
>       1 byte for en;q=
>       1 byte for 0.5
>       1 byte for Accept-Encoding
>       1 byte for the number of bytes in the values
>       1 byte for gzip
>       1 byte for deflate        (note: both could be merged into a single byte)
>       1 byte for Accept-Charset
>       1 byte for the value length
>       1 byte for ISO-8859-1
>       1 byte for utf-8;q=
>       1 byte for 0.7
>       1 byte for *;q=
>       1 byte for 0.7
>
>     We're at a total of 80 bytes or 29% of the original size for a *small*
>     request, which was already dominated by the Host, URI and UA lengths.
>     And such an encoding would be as cheap to parse as it is to produce.
>
>     We also have to take intermediaries into account. Most of them touch a
>     few headers (eg: add/remove a cookie, etc). An encoding such as above
>     is much cheaper to produce than it is to decompress+recompress just to
>     add a cookie.
>
>     Best regards,
>     Willy
>
>
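
For reference, the figures in Willy's quoted message can be rechecked with a few lines of Python. This is only an arithmetic sketch: it assumes 1 kbps = 1000 bit/s, takes the byte counts exactly as listed in the quoted breakdown, and uses illustrative function names.

```python
def amplification(expanded_bytes: int, wire_bytes: int) -> float:
    """Bytes the receiver must process per byte the attacker sends."""
    return expanded_bytes / wire_bytes


def amplified_gbps(upstream_kbps: int, ratio: float) -> float:
    """Header volume the receiver sees, in Gbps (assuming 1 kbps = 1000 bit/s)."""
    return upstream_kbps * 1_000 * ratio / 1e9


# 4233425 bytes of headers produced from an 18-byte frame plus a 7-byte payload.
ratio = amplification(4_233_425, 18 + 7)
gbps = amplified_gbps(256, ratio)

# Per-field sizes from the quoted breakdown: method+version, Host length,
# Host value, URI, User-Agent name, User-Agent value, then 21 one-byte fields.
FIELD_BYTES = [1, 1, 19, 33, 1, 4] + [1] * 21
encoded_total = sum(FIELD_BYTES)
share_of_original = encoded_total / 274  # 274 bytes is the size quoted for the request
```

The ratio comes out at 169337, the bandwidth at a little over 43 Gbps, and the encoded request at 80 bytes, or about 29% of the 274 bytes quoted, all in line with the message.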
Received on Wednesday, 29 February 2012 02:49:21 UTC