Re: Review: http://www.ietf.org/id/draft-mbelshe-httpbis-spdy-00.txt from Willy Tarreau on 2012-02-25 (ietf-http-wg@w3.org from January to March 2012)

From: Willy Tarreau <w@1wt.eu>
Date: Sat, 25 Feb 2012 13:13:36 +0100
To: Mike Belshe <mike@belshe.com>
Cc: httpbis mailing list <ietf-http-wg@w3.org>
Message-ID: <20120225121336.GC8633@1wt.eu>

Hi Mike,

On Thu, Feb 23, 2012 at 01:21:38PM -0800, Mike Belshe wrote:
> I posted this draft for comment from all.
> 
>    http://www.ietf.org/id/draft-mbelshe-httpbis-spdy-00.txt

First, thank you for posting this draft, it's full of very nice ideas.

I have a really major concern with header compression. I understand that
the initial reason why it is needed is that HTTP header names are quite
long, but that is an issue we should address in HTTP/2.0 itself instead of
trying to work around it by compressing them. Some header values are
quite long too. User-agent could be reduced to a 16-bit vendor-id + 8-bit
product-id + 8-bit product version. Cookies can be quite long but will
generally not compress much since long ones are supposed to be
unpredictable random values.

The concern I'm having with header compression is that it makes DoS much
more trivial using low-bandwidth attacks. I've performed a test using the
dictionary provided in the draft. Using it, I can reduce 4233425 bytes of
crap headers to 7 bytes ("78 f9 e3 c6 a7 c2 ec"). Considered like this, it
looks efficient. Now look at it differently: by sending a frame of 18 + 7 =
25 bytes of data, I can make the server write 4233425 bytes to memory then
parse those bytes as HTTP headers.  This is an amplification ratio of
4233425/25 = 169337. In other words, it means that with only the 256 kbps
of upstream bandwidth I have on my entry-level ADSL access, I can make an
intermediary or server have to process 43 Gbps of HTTP headers! And even
if we did not use an initial dictionary, it's easy to achieve compression
ratios above 200 with zlib, which is still very dangerous in the case of
DDoS attacks.
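
The arithmetic above, and the claim about plain zlib, can be checked with a
short Python sketch. The repeated header line and its count are arbitrary
choices for illustration, not the input used for the figures above, and no
preset dictionary is used in the zlib part.

```python
import zlib

# Figures quoted above: 18 bytes of frame overhead + 7 compressed bytes on
# the wire, expanding to 4233425 bytes of headers on the receiving side.
wire_bytes = 18 + 7
inflated_bytes = 4233425
amplification = inflated_bytes // wire_bytes

uplink_kbps = 256                                   # entry-level ADSL upstream
receiver_gbps = uplink_kbps * amplification / 1e6   # kbps -> Gbps

# Plain zlib without a preset dictionary, fed a highly repetitive block.
# The header line below is hypothetical, chosen only for illustration.
junk = b"X-Junk: aaaaaaaaaaaaaaaaaaaaaaaa\r\n" * 100_000
packed = zlib.compress(junk, 9)
ratio = len(junk) / len(packed)

print(f"amplification: {amplification}")
print(f"load on the receiver: {receiver_gbps:.1f} Gbps")
print(f"zlib ratio without dictionary: {ratio:.0f}")
```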

Basically, it means that any low-end internet access is enough to knock
down even large sites, even those that make use of hardware-accelerated
zlib decompression. This is a major issue, especially if we consider the
rise of DDoS attacks these days.

In my opinion, for the sake of robustness, it is much better to be able
to parse the stream and stop as soon as something's wrong. This means
that the headers should be sent in the most easily parsable form. For
efficiency, we should ditch current header names and use 1 byte for the
most common ones. Possibly we could make use of a variable length encoding
for the size just as we used in WebSocket. This way, most headers could be
encoded as one byte for the length, one for the header name, and a few
bytes for the value.  Some headers could be typed as enums to save space.
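
As a rough sketch of what such an encoding could look like, here is a small
Python encoder and parser. Everything concrete in it is an assumption made
for illustration: the header codes, the 4096-byte limit, and the choice to
borrow only the 126 escape (16-bit extended length) from WebSocket. The
point is that the parser checks each field before reading it and stops at
the first malformed record.

```python
import struct

# Hypothetical 1-byte header codes; the message does not assign any values.
HOST, USER_AGENT, ACCEPT = 0x01, 0x02, 0x03
KNOWN = {HOST, USER_AGENT, ACCEPT}
MAX_VALUE_LEN = 4096  # assumed per-header limit, checked before reading the value


def encode_header(code: int, value: bytes) -> bytes:
    """Emit length (1 byte, or 126 + 16-bit length), header code, value."""
    if len(value) < 126:
        size = bytes([len(value)])
    elif len(value) <= 0xFFFF:
        size = bytes([126]) + struct.pack("!H", len(value))
    else:
        raise ValueError("value too long for this sketch")
    return size + bytes([code]) + value


def parse_headers(buf: bytes) -> list[tuple[int, bytes]]:
    """Parse records in order and raise at the first malformed one."""
    headers, pos = [], 0
    while pos < len(buf):
        size = buf[pos]
        pos += 1
        if size == 126:
            if pos + 2 > len(buf):
                raise ValueError("truncated extended length")
            (size,) = struct.unpack_from("!H", buf, pos)
            pos += 2
        elif size > 126:
            raise ValueError("reserved length byte")
        if size > MAX_VALUE_LEN:
            raise ValueError("header value too long")
        if pos >= len(buf):
            raise ValueError("truncated record")
        code = buf[pos]
        pos += 1
        if code not in KNOWN:
            raise ValueError(f"unknown header code {code:#x}")
        if pos + size > len(buf):
            raise ValueError("truncated value")
        headers.append((code, buf[pos:pos + size]))
        pos += size
    return headers


# Usage: two short headers cost 2 bytes of framing each.
block = encode_header(HOST, b"static.ak.fbcdn.net") + encode_header(ACCEPT, b"text/css")
print(parse_headers(block))
```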

For instance, if we apply this principle to the example in SPDY-Fan.pdf
(274 bytes):

  GET /rsrc.php/v1/y0/r/Se4RZ9IhLuW.css HTTP/1.1
  Host: static.ak.fbcdn.net
  User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.8)
  Accept: text/css,*/*;q=0.1
  Accept-Language: en-us,en;q=0.5
  Accept-Encoding: gzip,deflate
  Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7

We could get down to something more or less like this with minimal effort:

   1 byte for method=GET + version=1.1
   1 byte for the Host header value length
  19 bytes for "static.ak.fbcdn.net"
  33 bytes for "/rsrc.php/v1/y0/r/Se4RZ9IhLuW.css"
   1 byte for User-Agent header name
   4 bytes for Mozilla + browser type + major version number
   1 byte for Accept
   1 byte for the number of content-types bytes
   1 byte for text/css
   1 byte for */*;q=
   1 byte for 0.1
   1 byte for Accept-Language
   1 byte for the number of accept-language bytes
   1 byte for en-us
   1 byte for en;q=
   1 byte for 0.5
   1 byte for Accept-Encoding
   1 byte for the number of bytes in the values
   1 byte for gzip
   1 byte for deflate        (note: both could be merged into a single byte)
   1 byte for Accept-Charset
   1 byte for the value length
   1 byte for ISO-8859-1
   1 byte for utf-8;q=
   1 byte for 0.7
   1 byte for *;q=
   1 byte for 0.7

We're at a total of 80 bytes or 29% of the original size for a *small*
request, which was already dominated by the Host, URI and UA lengths.
And such an encoding would be as cheap to parse as it is to produce.
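
The total can be re-derived from the list above with a few lines of Python.
The 274-byte figure is taken as stated in this message, not recomputed, and
the grouping comments are only a reading aid.

```python
ORIGINAL_SIZE = 274  # size of the example request, as quoted above

sizes = [
    1,                                              # method=GET + version=1.1
    1,                                              # Host header value length
    len(b"static.ak.fbcdn.net"),                    # Host value
    len(b"/rsrc.php/v1/y0/r/Se4RZ9IhLuW.css"),      # URI
    1, 4,                   # User-Agent: name, compact value
    1, 1, 1, 1, 1,          # Accept: name, length, text/css, */*;q=, 0.1
    1, 1, 1, 1, 1,          # Accept-Language: name, length, en-us, en;q=, 0.5
    1, 1, 1, 1,             # Accept-Encoding: name, length, gzip, deflate
    1, 1, 1, 1, 1, 1, 1,    # Accept-Charset: name, length, ISO-8859-1,
                            #   utf-8;q=, 0.7, *;q=, 0.7
]

total = sum(sizes)
percent = round(100 * total / ORIGINAL_SIZE)
print(total, percent)
```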

We also have to take intermediaries into account. Most of them touch a
few headers (e.g. add or remove a cookie). An encoding such as above
is much cheaper to produce than it is to decompress+recompress just to
add a cookie.
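
To illustrate the difference for an intermediary, here is a rough Python
sketch. The record layout (length byte, code byte, value), the header codes
and the cookie value are all hypothetical. The zlib path is simplified to a
one-shot decompress and compress, which is not how a real SPDY connection
would keep its compression state, so it only shows the shape of the work.

```python
import zlib

HOST = 0x01    # hypothetical 1-byte code for the Host header
COOKIE = 0x10  # hypothetical 1-byte code for the Cookie header


def record(code: int, value: bytes) -> bytes:
    """Hypothetical record: 1-byte value length, 1-byte header code, value."""
    if len(value) >= 126:
        raise ValueError("long values are out of scope for this sketch")
    return bytes([len(value), code]) + value


def add_cookie_plain(block: bytes, cookie: bytes) -> bytes:
    # Existing records are forwarded untouched; the new one is appended.
    return block + record(COOKIE, cookie)


def add_cookie_zlib(packed: bytes, cookie: bytes) -> bytes:
    # The whole block is inflated and deflated again to add one header.
    text = zlib.decompress(packed)
    return zlib.compress(text + b"Cookie: " + cookie + b"\r\n")


plain = record(HOST, b"static.ak.fbcdn.net")
forwarded = add_cookie_plain(plain, b"sid=abc")

packed = zlib.compress(b"Host: static.ak.fbcdn.net\r\n")
repacked = add_cookie_zlib(packed, b"sid=abc")
```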

Best regards,
Willy
Received on Saturday, 25 February 2012 12:14:03 UTC