From: Adrien de Croy <adrien@qbik.com>
Date: Wed, 29 Feb 2012 19:31:39 +1300
To: Amos Jeffries <squid3@treenet.co.nz>
CC: ietf-http-wg@w3.org
There's also the burden of TLS on top of that.

With my proxy hat on, if a proxy is to add any value to such a system, it
has to break into the SSL stream, decompress the messages, and demultiplex
the sessions to re-assemble request-response transactions, then add the
value: caching, content analysis, etc. Sounds like a nightmare to me for
proxy vendors. Of all that, the decompression is the simple bit.

Adrien

On 29/02/2012 7:18 p.m., Amos Jeffries wrote:
> On 29/02/2012 3:04 p.m., patrick mcmanus wrote:
>> The spdy compression scheme has proven itself very valuable in firefox
>> testing.
>>
>> Just like Mike has seen - we see 90% header size reductions, including
>> on cookies, because they are so repetitive between requests even if
>> the individual cookies don't compress well. Exclusively prescribing
>> maps for well-known values doesn't scale well to the value side of the
>> n/v pairs, and it biases strongly in favour of the status quo, making
>> it too difficult to deploy things like DNT in the future. I'm totally
>> open to other formats as well - but this has proven itself to me with
>> experience, and right now I trust experience first, out-of-context
>> data second, and good ideas third.
>
> I keep hearing these reduction rates quoted in bytes, but no mention of
> request-per-second throughput, which is equally important and directly
> affected by the compression.
>
> I also have to assume, since there is no mention of middleware
> measurements and still a general lack of SPDY proxy implementations,
> that these are all single-hop measurements from browser to server?
>
> Challenge: implement a SPDY proxy which simply decompresses then
> recompresses. Pass each request through 1-3 layers of this proxy.
> Measure the gains and the CPU loadings.
>
> I think you will start to see the different cost model which middleware
> has to operate with. Browsers also face these same costs, but
> middleware is often an order of magnitude or more up the traffic scale,
> with less idle CPU time to do the work in. Sometimes the middleware has
> to handle orders of magnitude more traffic than even the server end.
>
>> the dos implications don't bother me. http headers always have an
>> implementation size limit anyhow - the fact that you can detect that
>> limit has been hit with fewer bytes on the wire is kind of a mixed
>> result (i.e. it isn't all bad or good).
>
> Squid can provide a real-world example here of the impact of
> compression on HTTP. It has the behaviour of consuming 100% of a CPU
> before it slows down, with a noticeable turning point, which is very
> useful for identifying the capacity-overload point.
>
> Some months ago a gzip body-compression module was published and
> several networks took it up. What they reported was that one-way
> compression cuts the peak req/sec rate by up to 20%. That is just for
> response bodies. Now increase that a few percent to include headers and
> all the body-less requests SPDY is asking for compression on as well,
> then double it for the decompression work on arrival. Suddenly a 50%
> reduction in traffic speed looks possible.
>
> True, sending less over the wire helps the other software sharing that
> wire and parallel connections, which is particularly useful if one is
> paying by the byte. But it does not help the poor end users stuck with
> a visible reduction in download speed, or the middleware maxing out its
> CPUs, or the networks having to deploy up to twice as much hardware.
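A minimal sketch of the measurement Amos is asking for, assuming plain
zlib with one persistent compression context per hop (as SPDY keeps per
connection and direction), and no SPDY framing or initial dictionary; the
header block, request count and hop count below are invented for
illustration:

    import time, zlib

    # Invented sample request headers, roughly 500 bytes with a cookie.
    HEADERS = (b"GET /style.css HTTP/1.1\r\nHost: example.com\r\n"
               b"User-Agent: Mozilla/5.0\r\nAccept: text/css\r\n"
               b"Cookie: " + b"x" * 400 + b"\r\n\r\n")
    REQUESTS = 10000

    def run(hops):
        client = zlib.compressobj()
        # One persistent (inflate, deflate) pair per proxy hop, as a
        # streaming proxy would hold per connection.
        ctx = [(zlib.decompressobj(), zlib.compressobj())
               for _ in range(hops)]
        start = time.time()
        for _ in range(REQUESTS):
            wire = client.compress(HEADERS) + client.flush(zlib.Z_SYNC_FLUSH)
            for inflate, deflate in ctx:
                plain = inflate.decompress(wire)      # hop decompresses
                wire = (deflate.compress(plain)       # then recompresses
                        + deflate.flush(zlib.Z_SYNC_FLUSH))
        return REQUESTS / (time.time() - start)

    print("req/sec, 0 hops: %.0f" % run(0))
    print("req/sec, 3 hops: %.0f" % run(3))

Comparing the two printed rates gives a rough feel for how much of the
req/sec budget the per-hop inflate/deflate work alone consumes; a real
proxy would add parsing, policy and I/O on top of that.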
> And I am not considering the RAM DoS issues Willy brought up.
>
> This is somewhat of a worst-case scenario given the efficiencies of the
> software involved, but IMHO it is realistic enough to be a good reason
> to get better measurements before diving in and mandating compression
> at the transport level.
>
>> For anyone that hasn't read some of the other posts on this topic, the
>> compression here is a huge win on the request side because of CWND
>> interaction. spdy multiplexes requests and can therefore, especially
>> when a browser needs to get all of the subresources identified on the
>> page, create a little bit of a flood of simultaneous requests. If each
>> request is 500 bytes (i.e. no cookies) and IW=4, that's just 12
>> requests that can really be sent simultaneously. 90% compression means
>> you can send 100+, which is a lot more realistic for being able to
>> capture a page full of resources - but honestly that's not far past
>> current usage patterns, and cookies of course put pressure on here
>> too. So it's important to count bytes and maximize savings. (Pipelines
>> have this challenge too, and it's the primary reason a pipeline of
>> double-digit depth doesn't really add much incremental value.)
>
> So you just sent 100+ requests, each of which needs to be individually
> decompressed, parsed, processed, re-multiplexed, re-compressed and
> either delivered or relayed onward, where the same operations happen
> all over again. How long is that going to take overall? Did you
> actually save any download waiting time by sending them in one bunch?
> Or will the network transfer time saving be wasted on compression time
> at each hop?
>
> Let's have the bandwidth-savings cake, but let's not sour it by adding
> time and power costs.
>
> AYJ
>
>> On 2/28/2012 5:52 PM, Mike Belshe wrote:
>>> Hi, Willy -
>>>
>>> Thanks for the insightful comments about header compression. I'm out
>>> of the country for a few days, so I am slow to reply fully.
>>>
>>> We did consider this, but ultimately decided it was mostly a
>>> non-issue, as the problem already exists. Specifically - the same
>>> amplification attacks exist in the data stream with gzip content
>>> encoding. You could make an argument that origin servers and proxy
>>> servers are different, I suppose; but many proxy servers are doing
>>> virus scanning and other content checks anyway, and already decoding
>>> that stream. But if you're still not convinced, the problem also
>>> exists at the SSL layer (SSL will happily negotiate compression of
>>> the entire stream - headers and all - long before it gets to the app
>>> layer). So overall, I don't think this is a new attack vector for
>>> HTTP.
>>>
>>> We did consider some header compression schemes like the one you
>>> proposed - they are quite viable. They definitely provide less
>>> compression, and are less resilient to learning over time the way
>>> zlib is. They also generally fail to help with repeated cookies,
>>> which is really where some of the biggest gains are. But I'm open to
>>> other compression schemes.
>>>
>>> Note that SPDY's compressor achieves about 85-90% compression for
>>> typical web pages.
>>>
>>> Mike
>>>
>>> On Sat, Feb 25, 2012 at 4:13 AM, Willy Tarreau <w@1wt.eu> wrote:
>>>
>>> Hi Mike,
>>>
>>> On Thu, Feb 23, 2012 at 01:21:38PM -0800, Mike Belshe wrote:
>>> > I posted this draft for comment from all.
>>> >
>>> > http://www.ietf.org/id/draft-mbelshe-httpbis-spdy-00.txt
>>>
>>> First, thank you for posting this draft, it's full of very nice
>>> ideas.
>>>
>>> I have a really major concern with header compression. I understand
>>> that the initial reason it is needed is that HTTP header names are
>>> quite long, but that is an issue we should address in HTTP/2.0 itself
>>> rather than work around by compressing them. Some header values are
>>> quite long too. User-Agent could be reduced to a 16-bit vendor-id +
>>> 8-bit product-id + 8-bit product version. Cookies can be quite long
>>> but will generally not compress much, since long ones are supposed to
>>> be unpredictable randoms.
>>>
>>> The concern I have with header compression is that it makes DoS much
>>> more trivial using low-bandwidth attacks. I've performed a test using
>>> the dictionary provided in the draft. Using it, I can reduce 4233425
>>> bytes of crap headers to 7 bytes ("78 f9 e3 c6 a7 c2 ec"). Considered
>>> like this, it looks efficient. Now see it differently: by sending a
>>> frame of 18 + 7 = 25 bytes of data, I can make the server write
>>> 4233425 bytes to memory and then parse those bytes as HTTP headers.
>>> This is an amplification ratio of 4233425/25 = 169337. In other
>>> words, with only the 256 kbps of upstream bandwidth I have on my
>>> entry-level ADSL access, I can make an intermediary or server have to
>>> process 43 Gbps of HTTP headers! And even if we did not use an
>>> initial dictionary, it's easy to achieve compression ratios above 200
>>> with zlib, which is still very dangerous in the case of DDoS attacks.
>>>
>>> Basically, it means that any low-end internet access is enough to
>>> knock down even large sites, even those which make use of
>>> hardware-accelerated zlib decompression. This is a major issue,
>>> especially if we consider the rise of DDoS attacks these days.
>>>
>>> In my opinion, for the sake of robustness, it is much better to be
>>> able to parse the stream and stop as soon as something's wrong. This
>>> means that the headers should be sent in the most easily parsable
>>> form. For efficiency, we should ditch current header names and use
>>> one byte for the most common ones. Possibly we could make use of a
>>> variable-length encoding for the size, just as we used in WebSocket.
>>> This way, most headers could be encoded as one byte for the length,
>>> one for the header name, and a few bytes for the value. Some headers
>>> could be typed as enums to save space.
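Before Willy's worked encoding example below, a small way to see the
amplification he describes, using plain zlib with no initial dictionary
(the draft's dictionary is not reproduced here); the oversized header is
invented for illustration:

    import zlib

    # One absurdly repetitive ~4 MB header: the kind of "crap headers"
    # an attacker would send, since repetition is what deflate rewards.
    crap = b"Cookie: " + b"a" * 4194304 + b"\r\n"
    wire = zlib.compress(crap, 9)

    print("plain bytes      :", len(crap))
    print("compressed bytes :", len(wire))
    print("amplification    : %.0f:1" % (len(crap) / len(wire)))

Even without the initial dictionary this prints a ratio well above the
200:1 Willy mentions for dictionary-less zlib; his 169337:1 figure
additionally uses the draft's dictionary and a worst-case header stream.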
>>> For instance, if we apply this principle to the example in
>>> SPDY-Fan.pdf (274 bytes):
>>>
>>>   GET /rsrc.php/v1/y0/r/Se4RZ9IhLuW.css HTTP/1.1
>>>   Host: static.ak.fbcdn.net
>>>   User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.8)
>>>   Accept: text/css,*/*;q=0.1
>>>   Accept-Language: en-us,en;q=0.5
>>>   Accept-Encoding: gzip,deflate
>>>   Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
>>>
>>> We could get down to something more or less like this with minimal
>>> effort:
>>>
>>>    1 byte for method=GET + version=1.1
>>>    1 byte for the Host header value length
>>>   19 bytes for "static.ak.fbcdn.net"
>>>   33 bytes for "/rsrc.php/v1/y0/r/Se4RZ9IhLuW.css"
>>>    1 byte for the User-Agent header name
>>>    4 bytes for Mozilla + browser type + major version number
>>>    1 byte for Accept
>>>    1 byte for the number of content-type bytes
>>>    1 byte for text/css
>>>    1 byte for */*;q=
>>>    1 byte for 0.1
>>>    1 byte for Accept-Language
>>>    1 byte for the number of accept-language bytes
>>>    1 byte for en-us
>>>    1 byte for en;q=
>>>    1 byte for 0.5
>>>    1 byte for Accept-Encoding
>>>    1 byte for the number of bytes in the values
>>>    1 byte for gzip
>>>    1 byte for deflate (note: both could be merged into a single one)
>>>    1 byte for Accept-Charset
>>>    1 byte for the value length
>>>    1 byte for ISO-8859-1
>>>    1 byte for utf-8;q=
>>>    1 byte for 0.7
>>>    1 byte for *;q=
>>>    1 byte for 0.7
>>>
>>> We're at a total of 80 bytes, or 29% of the original size, for a
>>> *small* request which was already dominated by the Host, URI and
>>> User-Agent lengths. And such an encoding would be as cheap to parse
>>> as it is to produce.
>>>
>>> We also have to take intermediaries into account. Most of them touch
>>> a few headers (e.g. add/remove a cookie). An encoding such as the
>>> above is much cheaper to produce than decompressing and recompressing
>>> just to add a cookie.
>>>
>>> Best regards,
>>> Willy
>>>
>>
>

--
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com
WinGate 7 is released! - http://www.wingate.com/getlatest/
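Purely as an illustration of the shape of encoding Willy sketches above:
one code point per common header name, length-prefixed values, and no
compression context at all. The code-point assignments are hypothetical,
and values are carried verbatim rather than tokenized:

    import struct

    # Invented code points; a real scheme would need a registry, enums
    # for common values, and a variable-length length field as in
    # WebSocket.
    METHOD_VERSION = {("GET", "HTTP/1.1"): 0x01}
    HEADER_CODE = {"host": 0x01, "user-agent": 0x02, "accept": 0x03,
                   "accept-language": 0x04, "accept-encoding": 0x05,
                   "accept-charset": 0x06}

    def encode(method, version, uri, headers):
        out = bytearray([METHOD_VERSION[(method, version)]])
        u = uri.encode()
        out += struct.pack("!H", len(u)) + u       # 2-byte URI length + URI
        for name, value in headers:
            v = value.encode()
            # 1 byte for the header name, 1 byte value length, then value
            out += bytes([HEADER_CODE[name.lower()], len(v)]) + v
        return bytes(out)

    req = encode("GET", "HTTP/1.1", "/rsrc.php/v1/y0/r/Se4RZ9IhLuW.css", [
        ("Host", "static.ak.fbcdn.net"),
        ("User-Agent", "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.8)"),
        ("Accept", "text/css,*/*;q=0.1"),
        ("Accept-Language", "en-us,en;q=0.5"),
        ("Accept-Encoding", "gzip,deflate"),
        ("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7"),
    ])
    print(len(req), "bytes encoded vs 274 in plain HTTP/1.1")

This prints a size well under the original 274 bytes but above Willy's 80,
because his sketch additionally maps well-known values (content types,
charsets, the User-Agent string) onto single bytes; that value-side
registry is exactly the part Patrick argues does not scale.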