- From: Willy Tarreau <w@1wt.eu>
- Date: Sat, 25 Feb 2012 13:13:36 +0100
- To: Mike Belshe <mike@belshe.com>
- Cc: httpbis mailing list <ietf-http-wg@w3.org>
Hi Mike,

On Thu, Feb 23, 2012 at 01:21:38PM -0800, Mike Belshe wrote:
> I posted this draft for comment from all.
>
> http://www.ietf.org/id/draft-mbelshe-httpbis-spdy-00.txt

First, thank you for posting this draft, it's full of very nice ideas.

I have a really major concern with header compression. I understand that the initial reason why it is needed is that HTTP header names are quite long, but that is an issue we should address in HTTP/2.0 itself instead of working around it by compressing them. Some header values are quite long too. User-Agent could be reduced to a 16-bit vendor-id + 8-bit product-id + 8-bit product version. Cookies can be quite long but will generally not compress much, since long ones are supposed to be unpredictable random values.

The concern I have with header compression is that it makes DoS much more trivial using low-bandwidth attacks. I've performed a test using the dictionary provided in the draft. Using it, I can reduce 4233425 bytes of crap headers to 7 bytes ("78 f9 e3 c6 a7 c2 ec"). Considered like this, it looks efficient.

Now look at it differently: by sending a frame of 18 + 7 = 25 bytes of data, I can make the server write 4233425 bytes to memory and then parse those bytes as HTTP headers. This is an amplification ratio of 4233425/25 = 169337. In other words, with only the 256 kbps of upstream bandwidth I have on my entry-level ADSL access, I can make an intermediary or server process 43 Gbps of HTTP headers!

And even if we did not use an initial dictionary, it's easy to achieve compression ratios above 200 with zlib, which is still very dangerous in the case of DDoS attacks.

Basically, it means that any low-end internet access is enough to knock down even large sites, even those which make use of hardware-accelerated zlib decompression. This is a major issue, especially if we consider the rise of DDoS attacks these days.
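
To make the arithmetic above easy to check, here is a minimal Python sketch. It recomputes the amplification ratio and induced load from the figures quoted in the message, and measures what plain zlib (no preset dictionary) achieves on a repetitive header block. The function names and the junk header contents are assumptions made for illustration only; the sketch does not reproduce the 7-byte result, which depends on the draft's dictionary.

```python
import zlib


def amplification_ratio(decompressed_len: int, wire_len: int) -> float:
    """Bytes the receiver must process per byte the sender transmits."""
    return decompressed_len / wire_len


def induced_load_gbps(upstream_kbps: float, ratio: float) -> float:
    """Header throughput forced on the receiver, in Gbit/s."""
    return upstream_kbps * 1_000 * ratio / 1e9


# Figures quoted in the message: 4233425 bytes of headers from an 18 + 7 byte frame.
quoted_ratio = amplification_ratio(4233425, 18 + 7)
quoted_load = induced_load_gbps(256, quoted_ratio)

# Plain zlib without a preset dictionary, on a hypothetical repetitive header block.
junk_headers = b"X-Junk: aaaaaaaaaaaaaaaaaaaaaaaa\r\n" * 100_000
compressed = zlib.compress(junk_headers, 9)
plain_ratio = amplification_ratio(len(junk_headers), len(compressed))

print(f"quoted ratio: {quoted_ratio:.0f}, induced load: {quoted_load:.1f} Gbps")
print(f"plain zlib ratio on repetitive headers: {plain_ratio:.0f}")
```

The exact plain-zlib ratio depends on the input, so no particular value is claimed here beyond it being well above 200 for data this repetitive.
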

In my opinion, for the sake of robustness, it is much better to be able to parse the stream and stop as soon as something's wrong. This means that the headers should be sent in the most easily parsable form. For efficiency, we should ditch the current header names and use 1 byte for the most common ones. Possibly we could make use of a variable-length encoding for the size, just as we did in WebSocket.

This way, most headers could be encoded as one byte for the length, one for the header name, and a few bytes for the value. Some headers could be typed as enums to save space.

For instance, if we apply this principle to the example in SPDY-Fan.pdf (274 bytes):

    GET /rsrc.php/v1/y0/r/Se4RZ9IhLuW.css HTTP/1.1
    Host: static.ak.fbcdn.net
    User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.8)
    Accept: text/css,*/*;q=0.1
    Accept-Language: en-us,en;q=0.5
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7

We could get down to something more or less like this with minimal effort:

     1 byte  for method=GET + version=1.1
     1 byte  for the Host header value length
    19 bytes for "static.ak.fbcdn.net"
    33 bytes for "/rsrc.php/v1/y0/r/Se4RZ9IhLuW.css"
     1 byte  for the User-Agent header name
     4 bytes for Mozilla + browser type + major version number
     1 byte  for Accept
     1 byte  for the number of content-type bytes
     1 byte  for text/css
     1 byte  for */*;q=
     1 byte  for 0.1
     1 byte  for Accept-Language
     1 byte  for the number of Accept-Language bytes
     1 byte  for en-us
     1 byte  for en;q=
     1 byte  for 0.5
     1 byte  for Accept-Encoding
     1 byte  for the number of bytes in the values
     1 byte  for gzip
     1 byte  for deflate (note: both could be merged into a single byte)
     1 byte  for Accept-Charset
     1 byte  for the value length
     1 byte  for ISO-8859-1
     1 byte  for utf-8;q=
     1 byte  for 0.7
     1 byte  for *;q=
     1 byte  for 0.7

We're at a total of 80 bytes, or 29% of the original size, for a *small* request which was already dominated by the Host, URI and UA lengths.
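
The byte count above can be sketched as a small Python encoder. Every code point in the tables below is hypothetical and picked only for illustration; nothing here comes from the draft or defines an actual format. The User-Agent packing follows the 16-bit vendor-id + 8-bit product-id + 8-bit version idea, and the URI is written without a length byte because the list above counts none (a real format would need a length or a terminator there).

```python
import struct

# All code points below are hypothetical, chosen only for illustration.
METHOD_VERSION = {("GET", "1.1"): 0x01}
HEADER_NAME = {
    "User-Agent": 0x10, "Accept": 0x11, "Accept-Language": 0x12,
    "Accept-Encoding": 0x13, "Accept-Charset": 0x14,
}
VALUE_TOKEN = {
    "text/css": 0x01, "*/*;q=": 0x02, "0.1": 0x03, "en-us": 0x04,
    "en;q=": 0x05, "0.5": 0x06, "gzip": 0x07, "deflate": 0x08,
    "ISO-8859-1": 0x09, "utf-8;q=": 0x0A, "0.7": 0x0B, "*;q=": 0x0C,
}


def encode_user_agent(vendor_id: int, product_id: int, major_version: int) -> bytes:
    """Pack the User-Agent as 16-bit vendor + 8-bit product + 8-bit version."""
    return struct.pack(">HBB", vendor_id, product_id, major_version)


def encode_request(method, version, host, uri, user_agent, headers) -> bytes:
    out = bytearray([METHOD_VERSION[(method, version)]])
    out.append(len(host))                  # 1-byte Host value length
    out += host.encode("ascii")
    out += uri.encode("ascii")             # no length byte, as in the list above
    out.append(HEADER_NAME["User-Agent"])
    out += user_agent                      # 4 packed bytes
    for name, tokens in headers:
        out.append(HEADER_NAME[name])      # 1-byte header name
        out.append(len(tokens))            # 1-byte value length
        out += bytes(VALUE_TOKEN[t] for t in tokens)  # 1 byte per enum token
    return bytes(out)


encoded = encode_request(
    "GET", "1.1", "static.ak.fbcdn.net", "/rsrc.php/v1/y0/r/Se4RZ9IhLuW.css",
    encode_user_agent(vendor_id=1, product_id=1, major_version=5),
    [
        ("Accept", ["text/css", "*/*;q=", "0.1"]),
        ("Accept-Language", ["en-us", "en;q=", "0.5"]),
        ("Accept-Encoding", ["gzip", "deflate"]),
        ("Accept-Charset", ["ISO-8859-1", "utf-8;q=", "0.7", "*;q=", "0.7"]),
    ],
)
print(len(encoded), f"{len(encoded) / 274:.0%}")  # 80 29%
```

Producing the request is a handful of table lookups and appends, with no compression state to maintain.
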

And such an encoding would be as cheap to parse as it is to produce.

We also have to take intermediaries into account. Most of them touch a few headers (e.g. add or remove a cookie). Producing an encoding such as the one above is much cheaper than decompressing and recompressing the headers just to add a cookie.

Best regards,
Willy
Received on Saturday, 25 February 2012 12:14:03 UTC