- From: Mike Belshe <mike@belshe.com>
- Date: Fri, 2 Mar 2012 13:45:40 -0800
- To: Willy Tarreau <w@1wt.eu>
- Cc: Amos Jeffries <squid3@treenet.co.nz>, ietf-http-wg@w3.org
- Message-ID: <CABaLYCtgQqJ2v8MVt-wWkTQJ2p_bEuHTxnC5Ds8XcA0MB-uDRw@mail.gmail.com>
On Fri, Mar 2, 2012 at 6:11 AM, Willy Tarreau <w@1wt.eu> wrote:

> Hi Mike,
>
> [ sorry to bother you while you're traveling ]
>
> On Fri, Mar 02, 2012 at 04:01:43AM -0800, Mike Belshe wrote:
> > zlib is quite simple. You can scan it. You can sanity check it. You can
> > verify lz77 dictionary entries and ban any sequences that grow too long
> > (100B?). On top of that we're compressing highly structured data, so we
> > know what the decompressed data must look like. So if the decompressed
> > data ever falls out of whack, you can nuke it there too, as you stream it.
>
> I'm not contesting this point.
>
> > Short input sequences will be discardable off the bat - none of this "give
> > me 6 bytes to make me process 5KB" is going to be possible - you'll have to
> > send a sizable request to get us to process much out of it. Then, if
> > individual dictionary entries grow unreasonably large in the first chunk of
> > headers (100B?), we'll drop you. If the expanded name-value pairs grow too
> > large, we'll drop you.
>
> At this point this means we have to set up some complex defensive measures
> in the decoder to avoid some of the risks.
>
> > Or if it looks like you're sending junk headers or unreasonable values,
> > we'll drop you there too.
>
> This is the point where you need to decompress the stream to parse the
> headers. Even if zlib may be used as a stream, I think you'll agree that
> for the sake of efficiency you'll try to decompress as much of it as
> possible at once; you won't call inflate() many times with a short output.
>
> > An attacker might be able to convince a server to decompress 1000B on a
> > 200B input before detecting the fraud, but even that seems unlikely if we
> > verify at all the levels I mentioned already.
> >
> > For argument's sake, let's be generous and pretend it's even a 10x attack
> > on a 100B input. Note well that the attack does not scale - even if you
> > can turn 100 bytes into 1000 bytes processed, it does not mean that you
> > can send us 10K to get 100K processed. Because the length caps are going
> > to be really small, as soon as we see too many bytes, we'll be dropping
> > you, decreasing your amplification.

Thanks for the cool response!  :-)

> This is the point where I disagree, because as long as the requests are
> valid you have no reason to drop anything. For instance, if I send the
> following 72 bytes:
>
> 78 f9 e3 c6 a7 c2 43 24 d3 aa 01 f3 02 5a 26 80
> 95 4d 17 e9 90 be 41 d9 08 96 bd f8 36 d1 c1 c2
> 51 2b 46 ad 18 b5 62 d4 8a 51 2b 46 ad 18 b5 62
> d4 8a 51 2b 46 ad 18 b5 62 d4 8a 51 2b 46 ad 20
> a7 ff 0a 00 4f b2 66 6f
>
> they decompress to a (hopefully) valid 4032-byte request which roughly
> looks like the SPDY encoding of:
>
> get
> /html,image/png,image/jpg,image/gif,application/xml,application/xhtml+xml,text/plain,text/javascript,publicprivatemax-age
> HTTP/1.1
> host: origin100101201202205206300302303304305306307402405406407408409410411412413414415416417502504505203
> cookie: origin... repeated 38 times
>
> Adding the 18-byte header to this, this means that I can produce 4032 bytes
> of request to parse with only 90 bytes of input stream (a ratio of 45x). The
> ratio increases a lot more if larger requests are accepted by the recipient
> (eg: 94x for a common 8kB buffer).

Sure, but this would have easily tripped the fraud checks I enumerated.
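To make the "length cap" idea above concrete, here is a minimal sketch (not from this thread) of how a receiver might bound inflation while streaming, assuming Python's zlib for brevity. The 4 kB cap and the function name are illustrative, and SPDY's preset compression dictionary is omitted.

```python
import zlib

# Illustrative per-request budget; a real codec would pick its own limit.
MAX_HEADER_BLOCK = 4 * 1024

def inflate_capped(compressed: bytes, cap: int = MAX_HEADER_BLOCK) -> bytes:
    """Inflate a header block, refusing to produce more than `cap` bytes."""
    d = zlib.decompressobj()
    # Ask zlib for at most cap+1 bytes so an overrun is detectable without
    # ever buffering a huge expansion in memory.
    out = bytearray(d.decompress(compressed, cap + 1))
    while d.unconsumed_tail and len(out) <= cap:
        out += d.decompress(d.unconsumed_tail, cap + 1 - len(out))
    if len(out) > cap:
        raise ValueError("decompressed headers exceed cap; drop the sender")
    return bytes(out)

# A tiny input that inflates far past the cap is rejected early:
bomb = zlib.compress(b"cookie: " + b"A" * 100000)
try:
    inflate_capped(bomb)
except ValueError as err:
    print(err)
```

Note that Willy's 72-byte example above still decompresses to only 4032 bytes and so would pass a 4 kB cap, which is exactly his point: the cap bounds the worst case per request, not the aggregate work.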
> Since these requests are valid, the connection has no reason for being
> dropped. The attacker will simply emit a stream of many such requests
> and will receive a stream of "404 not found" for instance. At one megabit
> of upstream bandwidth, I can then make an intermediary gateway have to:
>   - decompress 1 Mbps (cheap)
>   - parse and process 94 Mbps of headers (expensive)
>   - recompress 94 Mbps of headers (expensive)
>
> And in fact it's even worse than that, because only the first request takes
> space; the next ones benefit from the updated dictionary and are much
> smaller. I've checked that I can compress 10 8kB requests into 200 bytes,
> meaning that each subsequent request takes 13 bytes (+ 18 of header = 31
> bytes). So after the first request, the http-to-wire ratio is more like 264
> for 8kB requests (8190/31) or 122 for 4kB requests (4032/33).

Not exactly - now you're getting into concurrent stream DoS attacks. Here
you'll run into a new limit, which is the max concurrent streams limit. I
don't believe it is contentious that servers will want to have a cap on the
number of streams they'll simultaneously process, and the protocol even
allows for communicating this limit to the client.

> At 264, you just need 4 ADSL lines to make a SPDY-to-HTTP gateway fill its
> gigabit link to the servers.
>
> Another concern with this high compression ratio coupled with pipelining is
> that it becomes easier to send a huge number of requests to a server in
> just a few bytes. With the example above, we're saying that both the gateway
> and the server may have to process 10000 8kB requests per second from a
> single ADSL line. As of today, doing this for an attacker still requires at
> least 260 Mbps of upstream bandwidth.

Concurrent stream limits.

> > All in all, to get our SPDY server to process 1KB of amplified data on
> > your 100B sent, you had to do a full 3-way handshake, plus a full SSL
> > handshake (with a PK op on your *client*, and we got to pick which type
> > of PK op - 8192-bit RSA anyone?). Mice nuts!
>
> That's assuming that requests are identified as undesirable, and also that
> the site only runs over SSL. Attackers select the easiest way to take a
> site down, not the hardest one :-)
>
> > You'd be better off attacking with legitimate requests - most servers
> > will use far more resources trying to process legit requests than
> > deflating that extraneous 1KB of data.
>
> Yes... that was precisely my point, sorry if that wasn't clear from my
> examples. An attack is always an optimization of the harm-vs-cost ratio.
> Sometimes it can take a long time to establish, but once the weak points
> are found, they're smartly exploited.

As Roy suggests, I think the concern is worthy, and you've made a good note
of it. We're probably going to disagree on the magnitude of this and will
have to sort that part out later.

Mike

> Regards,
> Willy
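As a quick sanity check on the 45x, 264x and 122x figures quoted above, the arithmetic can be reproduced from the byte counts Willy gives (the compressed sizes and the 18 bytes of framing are his numbers, not independently measured); a throwaway sketch in Python:

```python
# http-to-wire amplification = decompressed request size / bytes on the wire,
# where the wire bytes are the compressed block plus 18 bytes of framing.
FRAME_OVERHEAD = 18

def http_to_wire(decompressed: int, compressed: int) -> float:
    return decompressed / (compressed + FRAME_OVERHEAD)

print(round(http_to_wire(4032, 72)))   # ~45:  first 4kB request, 72 compressed bytes
print(round(http_to_wire(8190, 13)))   # ~264: follow-up 8kB requests, 13 compressed bytes
print(round(http_to_wire(4032, 15)))   # ~122: follow-up 4kB requests (15 = 33 - 18, from his 4032/33)
```

At 264x, the four-ADSL-line figure follows directly: 4 lines at roughly 1 Mbps upstream each, amplified 264x, is about 1 Gbps of headers for the gateway to parse and recompress.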
Received on Friday, 2 March 2012 21:46:10 UTC