- From: Mike Belshe <mike@belshe.com>
- Date: Fri, 2 Mar 2012 13:45:40 -0800
- To: Willy Tarreau <w@1wt.eu>
- Cc: Amos Jeffries <squid3@treenet.co.nz>, ietf-http-wg@w3.org
- Message-ID: <CABaLYCtgQqJ2v8MVt-wWkTQJ2p_bEuHTxnC5Ds8XcA0MB-uDRw@mail.gmail.com>
On Fri, Mar 2, 2012 at 6:11 AM, Willy Tarreau <w@1wt.eu> wrote:

> Hi Mike,
>
> [ sorry to bother you while you're traveling ]
>
> On Fri, Mar 02, 2012 at 04:01:43AM -0800, Mike Belshe wrote:
> > zlib is quite simple. You can scan it. You can sanity check it. You can
> > verify lz77 dictionary entries and ban any sequences that grow too long
> > (100B?). On top of that we're compressing highly structured data, so we
> > know what the decompressed data must look like. So if the decompressed
> > data ever falls out of whack, you can nuke it there too, as you stream it.
>
> I'm not contesting this point.
>
> > Short input sequences will be discardable off the bat - none of this "give
> > me 6 bytes to make me process 5KB" is going to be possible - you'll have to
> > send a sizable request to get us to process much out of it. Then, if
> > individual dictionary entries grow unreasonably large in the first chunk of
> > headers (100B?), we'll drop you. If the expanded name-value pairs grow too
> > large, we'll drop you.
>
> At this point this means we have to set up some complex defensive measures
> in the decoder to avoid some of the risks.
>
> > Or if it looks like you're sending junk headers or unreasonable values,
> > we'll drop you there too.
>
> This is the point where you need to decompress the stream to parse the
> headers. Even if zlib may be used as a stream, I think you'll agree that
> for the sake of efficiency you'll try to decompress as much of it as
> possible at once; you won't call inflate() many times with a short output.
>
> > An attacker might be able to convince a server to decompress 1000B on a
> > 200B input before detecting the fraud, but even that seems unlikely if we
> > verify at all the levels I mentioned already.
> >
> > For argument's sake, let's be generous and pretend it's even a 10x attack
> > on a 100B input. Note well that the attack does not scale - even if you
> > can turn 100 bytes into 1000 bytes processed, it does not mean that you
> > can send us 10K to get 100K processed. Because the length caps are going
> > to be really small, as soon as we see too many bytes, we'll be dropping
> > you, decreasing your amplification.

Thanks for the cool response!  :-)

> This is the point where I disagree, because as long as the requests are
> valid you have no reason to drop anything. For instance, if I send the
> following 72 bytes:
>
> 78 f9 e3 c6 a7 c2 43 24 d3 aa 01 f3 02 5a 26 80
> 95 4d 17 e9 90 be 41 d9 08 96 bd f8 36 d1 c1 c2
> 51 2b 46 ad 18 b5 62 d4 8a 51 2b 46 ad 18 b5 62
> d4 8a 51 2b 46 ad 18 b5 62 d4 8a 51 2b 46 ad 20
> a7 ff 0a 00 4f b2 66 6f
>
> they decompress to a (hopefully) valid 4032-byte request which roughly
> looks like the SPDY encoding of:
>
> get
> /html,image/png,image/jpg,image/gif,application/xml,application/xhtml+xml,text/plain,text/javascript,publicprivatemax-age
> HTTP/1.1
> host: origin100101201202205206300302303304305306307402405406407408409410411412413414415416417502504505203
> cookie: origin... repeated 38 times
>
> Adding the 18-byte header to this, this means that I can produce 4032 bytes
> of request to parse with only 90 bytes of input stream (a ratio of 45x). The
> ratio increases a lot more if larger requests are accepted by the recipient
> (eg: 94x for a common 8kB buffer).

Sure, but this would have easily tripped the fraud checks I enumerated.
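To make the "length cap" idea above concrete, here is a minimal sketch (not from this thread) of how a receiver might bound inflation while streaming, assuming Python's zlib for brevity. The 4 kB cap and the function name are illustrative, and SPDY's preset compression dictionary is omitted.

```python
import zlib

# Illustrative per-request budget; a real codec would pick its own limit.
MAX_HEADER_BLOCK = 4 * 1024

def inflate_capped(compressed: bytes, cap: int = MAX_HEADER_BLOCK) -> bytes:
    """Inflate a header block, refusing to produce more than `cap` bytes."""
    d = zlib.decompressobj()
    # Ask zlib for at most cap+1 bytes so an overrun is detectable without
    # ever buffering a huge expansion in memory.
    out = bytearray(d.decompress(compressed, cap + 1))
    while d.unconsumed_tail and len(out) <= cap:
        out += d.decompress(d.unconsumed_tail, cap + 1 - len(out))
    if len(out) > cap:
        raise ValueError("decompressed headers exceed cap; drop the sender")
    return bytes(out)

# A tiny input that inflates far past the cap is rejected early:
bomb = zlib.compress(b"cookie: " + b"A" * 100000)
try:
    inflate_capped(bomb)
except ValueError as err:
    print(err)
```

Note that Willy's 72-byte example above still decompresses to only 4032 bytes and so would pass a 4 kB cap, which is exactly his point: the cap bounds the worst case per request, not the aggregate work.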
> Since these requests are valid, the connection has no reason for being
> dropped. The attacker will simply emit a stream of many such requests
> and will receive a stream of "404 not found" for instance. At one megabit
> of upstream bandwidth, I can then make an intermediary gateway have to:
>   - decompress 1 Mbps (cheap)
>   - parse and process 94 Mbps of headers (expensive)
>   - recompress 94 Mbps of headers (expensive)
>
> And in fact it's even worse than that, because only the first request takes
> space; the next ones benefit from the updated dictionary and are much
> smaller. I've checked that I can compress 10 8kB requests into 200 bytes,
> meaning that each subsequent request takes 13 bytes (+ 18 of header = 31
> bytes). So after the first request, the http-to-wire ratio is more like 264
> for 8kB requests (8190/31) or 122 for 4kB requests (4032/33).

Not exactly - now you're getting into concurrent stream DoS attacks. Here
you'll run into a new limit, which is the max concurrent streams limit. I
don't believe it is contentious that servers will want to have a cap on the
number of streams they'll simultaneously process, and the protocol even
allows for communicating this limit to the client.

> At 264, you just need 4 ADSL lines to make a SPDY-to-HTTP gateway fill its
> gigabit link to the servers.
>
> Another concern with this high compression ratio coupled with pipelining is
> that it becomes easier to send a huge number of requests to a server in
> just a few bytes. With the example above, we're saying that both the gateway
> and the server may have to process 10000 8kB requests per second from a
> single ADSL line. As of today, doing this for an attacker still requires at
> least 260 Mbps of upstream bandwidth.

Concurrent stream limits.

> > All in all, to get our SPDY server to process 1KB of amplified data on
> > your 100B sent, you had to do a full 3-way handshake, plus a full SSL
> > handshake (with a PK op on your *client*, and we got to pick which type
> > of PK op - 8192-bit RSA anyone?). Mice nuts!
>
> That's assuming that requests are identified as undesirable, and also that
> the site only runs over SSL. Attackers select the easiest way to take a
> site down, not the hardest one :-)
>
> > You'd be better off attacking with legitimate requests - most servers
> > will use far more resources trying to process legit requests than
> > deflating that extraneous 1KB of data.
>
> Yes... that was precisely my point, sorry if that wasn't clear from my
> examples. An attack is always an optimization of the harm-vs-cost ratio.
> Sometimes it can take a long time to establish, but once the weak points
> are found, they're smartly exploited.

As Roy suggests, I think the concern is worthy, and you've made a good note
of it. We're probably going to disagree on the magnitude of this and will
have to sort that part out later.

Mike

> Regards,
> Willy
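As a quick sanity check on the 45x, 264x and 122x figures quoted above, the arithmetic can be reproduced from the byte counts Willy gives (the compressed sizes and the 18 bytes of framing are his numbers, not independently measured); a throwaway sketch in Python:

```python
# http-to-wire amplification = decompressed request size / bytes on the wire,
# where the wire bytes are the compressed block plus 18 bytes of framing.
FRAME_OVERHEAD = 18

def http_to_wire(decompressed: int, compressed: int) -> float:
    return decompressed / (compressed + FRAME_OVERHEAD)

print(round(http_to_wire(4032, 72)))   # ~45:  first 4kB request, 72 compressed bytes
print(round(http_to_wire(8190, 13)))   # ~264: follow-up 8kB requests, 13 compressed bytes
print(round(http_to_wire(4032, 15)))   # ~122: follow-up 4kB requests (15 = 33 - 18, from his 4032/33)
```

At 264x, the four-ADSL-line figure follows directly: 4 lines at roughly 1 Mbps upstream each, amplified 264x, is about 1 Gbps of headers for the gateway to parse and recompress.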
Received on Friday, 2 March 2012 21:46:10 UTC