Re: Review: http://www.ietf.org/id/draft-mbelshe-httpbis-spdy-00.txt from Mike Belshe on 2012-02-28 (ietf-http-wg@w3.org from January to March 2012)

From: Mike Belshe <mike@belshe.com>
Date: Tue, 28 Feb 2012 14:52:53 -0800
To: Willy Tarreau <w@1wt.eu>
Cc: httpbis mailing list <ietf-http-wg@w3.org>
Message-ID: <CABaLYCs_RKc-MUD4Phy5jvv=z8GE6OOHXNb5OVw_7u5qPc8zOA@mail.gmail.com>
Hi, Willy -

Thanks for the insightful comments about header compression.  I'm out of
the country for a few days, so I am slow to reply fully.

We did consider this, but ultimately decided it was mostly a non-issue, as
the problem already exists.  Specifically, the same amplification attacks
exist in the data stream with gzip content encoding.  You could make an
argument that origin servers and proxy servers are different, I suppose;
but many proxy servers are doing virus scanning and other content checks
anyway, and already decoding that stream.  But if you're still not
convinced, the problem also exists at the SSL layer.  (SSL will happily
negotiate compression of the entire stream - headers & all - long before it
gets to the app layer).  So overall, I don't think this is a new attack
vector for HTTP.
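
As a rough illustration of the amplification point above, here is a minimal
Python sketch using the standard zlib module.  It builds a highly
compressible payload and shows one way a receiver might cap the inflated
output.  The helper name inflate_bounded, the 4,000,000-byte payload and the
64 KiB cap are assumptions chosen for illustration; none of them comes from
the SPDY draft.

```python
import zlib


def inflate_bounded(data: bytes, limit: int) -> bytes:
    """Inflate a zlib stream, refusing to produce more than `limit` bytes."""
    d = zlib.decompressobj()
    # Ask for at most limit + 1 bytes, so an oversized stream is detected
    # without ever materialising the whole output in memory.
    out = d.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed size exceeds limit")
    return out


# A long run of identical bytes compresses extremely well.
payload_size = 4_000_000
bomb = zlib.compress(b"\0" * payload_size, 9)
ratio = payload_size / len(bomb)
print(f"{len(bomb)} compressed bytes inflate to {payload_size} ({ratio:.0f}x)")

try:
    inflate_bounded(bomb, 64 * 1024)
except ValueError as exc:
    print("rejected:", exc)
```

The same kind of cap could in principle be applied wherever a stream is
inflated (body, headers or transport); where to put it is a design choice
this sketch does not try to settle.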

We did consider some header compression schemes like the one you proposed;
they are quite viable.  They definitely provide less compression, and they
do not learn from earlier traffic over time the way zlib does.  They also
generally fail to help with repeated cookies, which is really where some of
the biggest gains are.  But I'm open to other compression schemes.
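
To make the "learning over time" point concrete, here is a small Python
sketch, assuming one zlib compression context is kept for the lifetime of a
connection and flushed after each header block.  The Host and User-Agent
lines are taken from the quoted example below; the cookie value and the
function name compressed_sizes are made up for illustration, and no preset
dictionary is used.

```python
import zlib

headers = (
    b"Host: static.ak.fbcdn.net\r\n"
    b"User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.8)\r\n"
    # Hypothetical cookie, present only to show repeated values.
    b"Cookie: session=8f3a1c0e5d7b4a29; prefs=abcdef0123456789\r\n"
)


def compressed_sizes(blocks):
    """Compress each block with one shared context; return per-block sizes."""
    c = zlib.compressobj()
    sizes = []
    for block in blocks:
        # Z_SYNC_FLUSH emits everything pending for this block while keeping
        # the compression window, so later blocks can refer back to it.
        chunk = c.compress(block) + c.flush(zlib.Z_SYNC_FLUSH)
        sizes.append(len(chunk))
    return sizes


first, second = compressed_sizes([headers, headers])
print(f"raw {len(headers)} bytes, first {first} bytes, repeat {second} bytes")
```

The repeated block, cookie included, shrinks to a handful of bytes because
it is encoded as back-references into the first one.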

Note that SPDY's header compressor achieves about 85-90% compression on the
headers of typical web pages.

Mike


On Sat, Feb 25, 2012 at 4:13 AM, Willy Tarreau <w@1wt.eu> wrote:

> Hi Mike,
>
> On Thu, Feb 23, 2012 at 01:21:38PM -0800, Mike Belshe wrote:
> > I posted this draft for comment from all.
> >
> >    http://www.ietf.org/id/draft-mbelshe-httpbis-spdy-00.txt
>
> First, thank you for posting this draft, it's full of very nice ideas.
>
> I have a really major concern with header compression. I understand that
> the initial reason why it is needed is that HTTP header names are quite
> long, but that is an issue we should address in HTTP/2.0 itself instead of
> attempting to work around it by compressing them. Some header values are
> quite long too. User-Agent could be reduced to a 16-bit vendor-id + 8-bit
> product-id + 8-bit product version. Cookies can be quite long but will
> generally not compress much since long ones are supposed to be
> unpredictable random values.
>
> The concern I'm having with header compression is that it makes DoS much
> more trivial using low-bandwidth attacks. I've performed a test using the
> dictionary provided in the draft. Using it, I can reduce 4233425 bytes of
> crap headers to 7 bytes ("78 f9 e3 c6 a7 c2 ec"). Considered like this, it
> looks efficient. Now see it differently : by sending a frame of 18 + 7 =
> 25 bytes of data, I can make the server write 4233425 bytes to memory then
> parse those bytes as HTTP headers.  This is an amplification ratio of
> 4233425/25 = 169337. In other words, it means that with only the 256 kbps
> of upstream bandwidth I have on my entry-level ADSL access, I can make an
> intermediary or server have to process 43 Gbps of HTTP headers ! And even
> if we did not use an initial dictionary, it's easy to achieve compression
> ratios above 200 with zlib, which is still very dangerous in the case of
> DDoS attacks.
>
> Basically, it means that any low-end internet access is enough to knock
> down even large sites, even those that make use of hardware-accelerated
> zlib decompression. This is a major issue, especially if we consider the
> rise of DDoS attacks these days.
>
> In my opinion, for the sake of robustness, it is much better to be able
> to parse the stream and stop as soon as something's wrong. This means
> that the headers should be sent in the most easily parsable form. For
> efficiency, we should ditch current header names and use 1 byte for the
> most common ones. Possibly we could make use of a variable length encoding
> for the size just as we used in WebSocket. This way, most headers could be
> encoded as one byte for the length, one for the header name, and a few
> bytes for the value.  Some headers could be typed as enums to save space.
>
> For instance, if we apply this principle to the example in SPDY-Fan.pdf
> (274 bytes) :
>
>  GET /rsrc.php/v1/y0/r/Se4RZ9IhLuW.css HTTP/1.1
>  Host: static.ak.fbcdn.net
>  User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.8)
>  Accept: text/css,*/*;q=0.1
>  Accept-Language: en-us,en;q=0.5
>  Accept-Encoding: gzip,deflate
>  Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
>
> We could get down to something more or less like this with minimal effort :
>
>   1 byte for method=GET + version=1.1
>   1 byte for the Host header value length
>  19 bytes for "static.ak.fbcdn.net"
>  33 bytes for "/rsrc.php/v1/y0/r/Se4RZ9IhLuW.css"
>   1 byte for User-Agent header name
>   4 bytes for Mozilla + browser type + major version number
>   1 byte for Accept
>   1 byte for the number of content-types bytes
>   1 byte for text/css
>   1 byte for */*;q=
>   1 byte for 0.1
>   1 byte for Accept-Language
>   1 byte for the number of accept-language bytes
>   1 byte for en-us
>   1 byte for en;q=
>   1 byte for 0.5
>   1 byte for Accept-Encoding
>   1 byte for the number of bytes in the values
>   1 byte for gzip
>   1 byte for deflate        (note: both could be merged into one single)
>   1 byte for Accept-Charset
>   1 byte for the value length
>   1 byte for ISO-8859-1
>   1 byte for utf-8;q=
>   1 byte for 0.7
>   1 byte for *;q=
>   1 byte for 0.7
>
> We're at a total of 80 bytes or 29% of the original size for a *small*
> request, which was already dominated by the Host, URI and UA lengths.
> And such an encoding would be as cheap to parse as it is to produce.
>
> We also have to take intermediaries into account. Most of them touch a
> few headers (e.g. add/remove a cookie). With an encoding such as the one
> above, adding a cookie is much cheaper than having to decompress and
> recompress the headers.
>
> Best regards,
> Willy
>
>
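
The arithmetic in the quoted message can be re-derived with a short Python
sketch.  All figures are the ones given in the quote (274-byte request,
18 + 7 byte frame, 4233425 inflated bytes, 256 kbps uplink); the variable
names are illustrative only.

```python
# Byte budget of the proposed compact encoding, in the order listed above.
host = "static.ak.fbcdn.net"
uri = "/rsrc.php/v1/y0/r/Se4RZ9IhLuW.css"
budget = [
    1,          # method=GET + version=1.1
    1,          # Host value length
    len(host),  # 19 bytes
    len(uri),   # 33 bytes
    1,          # User-Agent header name
    4,          # Mozilla + browser type + major version number
] + [1] * 21    # the remaining one-byte names, lengths and tokens

total = sum(budget)
share = total / 274
print(f"{total} bytes, {share:.0%} of the original 274")

# Amplification from the 7-byte compressed header block.
inflated_bytes = 4233425
frame_bytes = 18 + 7
ratio = inflated_bytes // frame_bytes
uplink_bps = 256_000
load_gbps = uplink_bps * ratio / 1e9
print(f"ratio {ratio}, about {load_gbps:.1f} Gbps of headers to process")
```
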
Received on Tuesday, 28 February 2012 22:53:22 UTC