- From: 陈智昌 <willchan@chromium.org>
- Date: Mon, 30 Jun 2014 21:53:52 -0700
- To: Peter Lepeska <bizzbyster@gmail.com>
- Cc: Mike Belshe <mike@belshe.com>, HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CAA4WUYgDabTjRm=vNnt6+SqWX2GRbngL2AMFWrf+eoAqzjG9LA@mail.gmail.com>
On Mon, Jun 30, 2014 at 11:58 AM, <bizzbyster@gmail.com> wrote:

> I created this page to simulate the effect of multiple HTTP/2 connections that are never idle. It was just synthesized to prove a point.

And my point is it doesn't mirror real webpages :)

> "poor prioritization, poor header compression, bypassing slow start (sometimes massively so), increased buffer bloat (you're just going to hurt the responsiveness of all the other real-time applications on the shared links), increased server costs, etc etc."
>
> This is a long list but if examined closely I can make a case or at least question the importance of each one.
>
> Poor prioritization: the HOL blocking of a single connection means that the good prioritization decision on the server that is doing the sending is defeated by the poor prioritization in the network, which will always have to deliver pieces of objects in the order they are sent, regardless of the priority of the object.

Defeats is the wrong way to characterize it. HOL blocking doesn't affect the prioritization. The priority ordering remains the same. HOL blocking "just" stalls the entire connection :P And putting more things on a single connection that can get stalled increases fragility due to fate sharing, I'll totally grant you that.

Putting things on separate connections, on the other hand, definitely *does* defeat prioritization. TCP congestion control tries to be "fair" between connections, whereas we want to prioritize instead of being fair. There's definitely a tradeoff here. But saying prioritization is "defeated" is wrong.

And take a look at how huge this prioritization win can be: https://plus.google.com/+ShubhiePanicker/posts/Uw87yxQFCfY. This is something that doesn't get captured in a PLT metric, but in application-specific user experience metrics it can be enormous.

> Poor header compression: the most important aspect of header compression from a PLT perspective as I understand it is the ability to issue a large number of GETs in the first round trip out of a new HTTP/2 connection. But if I open a second connection, I get 2x the initcwnd in the upload direction. So even though my overall compression may suffer, more connections enable me to offset the compression loss with a larger effective initcwnd.

I think this is a fair point. But the bytes saved here are important too. Client upstream links are often more congested, and sending more bytes is suboptimal from a congestion and bufferbloat perspective, not to mention mobile metered data.

> Bypassing slow start: for high latency connections, slow start is too slow for tiny web files. Why else are Google servers caching CWNDs to start at 30+ as you note in your blog? Another way to achieve this is to open multiple connections.

Caching CWNDs is a more dynamic approach that tries to guess at a better starting value for slow start. Having the client simply open up multiple connections in order to hog more of the available bandwidth is less friendly to the ecosystem.

> Buffer bloat: HTTP/2 will cause even performance-optimized browsers to use fewer connections so this issue will be reduced with HTTP/2. I admit I don't understand this issue well enough to comment on it further.
>
> Increased server costs: This is a difficult thing to argue but hardware costs are always decreasing and improved PLT can arguably offset these increased costs. Again, HTTP/2 multiplexing definitely should result in FEWER connections. Just not 1 if we are trying to optimize PLT.
>
> Peter
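A back-of-the-envelope Python sketch of the trade-off above: each extra connection contributes its own upstream initcwnd for requests, but each new connection also has to send one full, less-compressible set of request headers before later requests can reference its compression context. The segment size, initcwnd, and compressed header sizes are illustrative assumptions, not measurements from this thread:

```python
# Back-of-the-envelope model of the first-round-trip request budget.
# All numbers below (MSS, initcwnd, header sizes) are illustrative
# assumptions, not measurements from this thread.

MSS = 1460                 # payload bytes per TCP segment (typical Ethernet path)
INITCWND = 10              # segments; common Linux default since ~2011
FIRST_REQUEST_BYTES = 450  # first request on a connection carries the full header set
REPEAT_REQUEST_BYTES = 60  # later requests mostly reference the compression context

def requests_in_first_rtt(connections: int) -> int:
    """How many GETs fit in the upstream initcwnd(s) before any ACK returns."""
    total = 0
    for _ in range(connections):
        budget = MSS * INITCWND - FIRST_REQUEST_BYTES  # each connection gets its own initcwnd
        total += 1 + max(0, budget) // REPEAT_REQUEST_BYTES
    return total

for n in (1, 2, 6):
    print(f"{n} connection(s): ~{requests_in_first_rtt(n)} requests in the first round trip")
```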
> On Jun 30, 2014, at 2:22 PM, William Chan (陈智昌) <willchan@chromium.org> wrote:
>
> Peter, this test page is too synthetic. It has no scripts or stylesheets. It doesn't have resource discovery chains (e.g. script loading other resources). This test is purely about PLT, which is well documented now to be a suboptimal metric (http://www.stevesouders.com/blog/2013/05/13/moving-beyond-window-onload/). Prioritization is useless if you only choose resources that all receive the same priority.
>
> I think you're right that if all you're measuring is PLT, and you have significant transport bottlenecks, then SPDY & HTTP/2 can be slower. Opening multiple connections has its own set of significant issues though: poor prioritization, poor header compression, bypassing slow start (sometimes massively so), increased buffer bloat (you're just going to hurt the responsiveness of all the other real-time applications on the shared links), increased server costs, etc etc.
>
> Pushing for a single connection is the _right_ thing to do in so many ways. There might be specific exceptional cases where a single TCP connection is still slower. We should fix TCP, and more generally the transport, so these issues mostly go away.
>
> On Mon, Jun 30, 2014 at 10:55 AM, <bizzbyster@gmail.com> wrote:
>
>> Comments inline.
>>
>> On Jun 30, 2014, at 1:00 PM, Mike Belshe <mike@belshe.com> wrote:
>>
>> On Mon, Jun 30, 2014 at 9:04 AM, <bizzbyster@gmail.com> wrote:
>>
>>> All,
>>>
>>> Another huge issue is that for some reason I still see many TCP connections that do not advertise support for window scaling in the SYN packet. I'm really not sure why this is, but for instance WPT test instances are running Windows 7 and yet they do not advertise window scaling, and so TCP connections max out at a send window of 64 KB. I've seen this in tests run out of multiple different WPT test locations.
>>>
>>> The impact of this is that high latency connections max out at very low throughputs. Here's an example (with tcpdump output so you can examine the TCP flow on the wire) where I download data from a SPDY-enabled web server in Virginia from a WPT test instance running in Sydney: http://www.webpagetest.org/result/140629_XG_1JC/1/details/. Average throughput is not even 3 Mbps despite the fact that I chose a 20 Mbps FIOS connection for my test. Note that when I disable SPDY on this web server, I render the page almost twice as fast because I am using multiple connections and therefore overcoming the per-connection throughput limitation: http://www.webpagetest.org/result/140629_YB_1K5/1/details/.
>>>
>>> I don't know the root cause (Windows 7 definitely sends the window scaling option in the SYN in other tests) and have sent a note to the webpagetest.org admin, but in general there are reasons why even Windows 7 machines sometimes appear to not use window scaling, causing single-connection SPDY to perform really badly even beyond the slow start phase.
>>
>> I believe your test is invalid.
>>
>> The time-to-first-byte (single connection) in the first case is 1136ms. The time-to-first-byte in the second case is 768ms. They should have been identical, right? What this really means is that you're testing over the public net, and your variance is somewhere in the 50% range. The key to successful benchmarking is eliminating variance and *lots* of iterations.
>> I've re-run this 5 times with SPDY on and a few times with SPDY off and the result is always about 2x slower with SPDY. Here are the 5 with SPDY on:
>>
>> http://www.webpagetest.org/result/140629_XG_1JC/
>> http://www.webpagetest.org/result/140630_54_T4T/
>> http://www.webpagetest.org/result/140630_B0_T9N/
>> http://www.webpagetest.org/result/140630_97_TAN/
>> http://www.webpagetest.org/result/140630_XM_TCF/
>>
>> The reason is that no window scaling is happening on the TCP connection, and so the SPDY case can only run at 64KB / 230 ms, or about 2.2 Mbps. In the non-SPDY case I download the 6 images over 6 connections, each operating at 2.2 Mbps. In the SPDY-disabled case, max throughput is 6 connections x 2.2 Mbps, which is about 13 Mbps. The webpagetest throughput curves illustrate this difference.
>>
>> I'm not saying you aren't seeing a trend. You might be, but this test doesn't show it. But variance is exacerbating the issue, the benchmark transport is suspect (for reasons you've already identified), and I further suspect that you're using a funky server transport connection (what is your initial CWND? what is your initial server recv buffer?)
>>
>> I also looked at the TCP dumps. Here are some random notes:
>> * Looks like your SSL is using 16KB record sizes. That will cause delays.
>> * I'm not seeing any packet loss (haven't used cloudshark before, so maybe I'm not looking carefully enough). If there is no packet loss, and we're not bandwidth constrained, then this is just test-environment related, right?
>> * Notice that the time-to-first-render (which is the important metric for a page which is all images) makes the case you call the slow case faster (yes, it's 1.383s instead of 1.026s, but remember it had a 368ms handicap, 1136ms vs. 768ms, due to the variance mentioned above).
>> * There is noise in this test. The first case sent 88KB/109pkts to 74.125.237.183 which the second test didn't send. I believe this is the browser doing something behind your back but interfering with your test. Not sure.
>> * What is the cwnd on your server?
>>
>> Overall, I can't explain why you're seeing a different result in this test, except to point out all the variances which make the test look suspect.
>>
>> Thanks for taking a look. In the TCP dumps, look at the Ack packets during the data transfer and you'll see "no window scaling used" and an advertised window of 64 KB. This is the problem.
>>
>>> SPDY/HTTP2 is supposed to be about faster. Let's encourage browser developers to make use of the new protocol to make web browsing as fast as possible -- and not limit them to one connection and therefore essentially ask them to do so with one hand tied behind their backs.
>>
>> Again, after all the testing I did, it was clear that single connection is the best route.
>>
>> I suspect this is because you were not able to keep multiple connections from being idle and they fell back to initcwnd due to slow start when idle. It's the only explanation I can think of.
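The throughput ceiling Peter describes above is the usual window-limited bound (at most one receive window in flight per round trip). A minimal sketch of that arithmetic, using the 64 KB advertised window and roughly 230 ms RTT reported for the Sydney-to-Virginia trace:

```python
# Throughput ceiling for a TCP connection whose receive window cannot
# grow past 64 KB because window scaling was not negotiated. Numbers
# are taken from the Sydney-to-Virginia test discussed above.

WINDOW_BYTES = 64 * 1024   # largest advertised receive window without scaling
RTT_SECONDS = 0.230        # approximate round-trip time in the trace

def window_limited_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """At most one full window can be in flight per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6

per_connection = window_limited_mbps(WINDOW_BYTES, RTT_SECONDS)
print(f"1 connection : {per_connection:.1f} Mbps ceiling")      # ~2.3 Mbps
print(f"6 connections: {6 * per_connection:.1f} Mbps ceiling")  # ~13.7 Mbps aggregate
```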
>> I didn't reply earlier about packet loss, but the packet loss simulators generally issue "random" packet loss. This is different from "correlated" packet loss like a buffer tail drop in a router. With the former, a single-connection protocol gets screwed, because the cwnd gets cut in half on a single connection, while a 2-connection load would only have its cwnd cut by 1/4 with a single loss. But in *correlated* loss modeling, we see *all* connections get packet loss at the same time, because the entire buffer was lost in the router. The net result is that if you simulate tail drops, you tend to see the cwnd collapse due to packet loss be much more similar in single- and multi-connection policies.
>>
>> True. Sometimes packets from all connections are dropped, in which case multiple connections act like a single connection. But otherwise multiple connections are more robust to packet loss, as you point out.
>>
>> Mike
>>
>>> Thanks,
>>>
>>> Peter
>>>
>>> On Jun 25, 2014, at 5:27 PM, bizzbyster@gmail.com wrote:
>>>
>>> Responses inline.
>>>
>>> On Jun 25, 2014, at 11:47 AM, Mike Belshe <mike@belshe.com> wrote:
>>>
>>> On Wed, Jun 25, 2014 at 7:56 AM, <bizzbyster@gmail.com> wrote:
>>>
>>>> Thanks for all the feedback. I'm going to try to reply to Mike, Greg, Willy, and Guille in one post since a few of you made the same or similar points. My apologies in advance for the very long post.
>>>>
>>>> First, you should understand that I am building a browser and web server that use the feedback loop described here (http://caffeinatetheweb.com/baking-acceleration-into-the-web-itself/) to provide the browser with a set of hints inserted into the html that allow it to load the page much faster. I prefer subresource hints to server push because A) it works in coordination with the browser cache state and B) hints can be supplied for resources found on third party domains. But my hints also go beyond just supplying the browser with a list of URLs to fetch: http://lists.w3.org/Archives/Public/public-web-perf/2014Jun/0044.html. They also include an estimate of the size of the objects, for instance.
>>>>
>>>> Okay, so this is all relevant because it means that I often know the large number of objects (sometimes 50+) I need to fetch from a given server up front and therefore have to figure out the optimal way to retrieve these objects. Unlike Mike's tests, my tests have shown that a pool with multiple connections is faster than a single one, perhaps because my server hints allow me to know about a much larger number of URLs up front and because I often have expected object sizes.
>>>
>>> With appropriate hand-crafted and customized server hints, I'm not surprised that you can outpace a single connection in some scenarios.
>>>
>>> But the answer to "which is faster" will not be a boolean yes/no - you have to look at all sorts of network conditions, including link speed, RTT, and packet loss.
>>>
>>> The way I chose how to optimize was based on studying how networks are evolving over time:
>>> a) we know that bandwidth is going up fairly rapidly to end users
>>> b) we know that RTT is not changing much, and on some links going up
>>> c) packet loss is getting better, but is very difficult to pin down & even harder to appropriately model (tail drops vs random, etc)
>>>
>>> So I chose to optimize assuming that (b) will continue to hold true. In your tests, you should try jacking the RTT up to 100ms+. Average RTT to Google is ~100ms (my data is slightly old here). If you're on a super fast link, then sure, initcwnd will be your bottleneck, because RTT is not a factor. What RTTs did you simulate? I'm guessing you were using a high-speed local network?
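A toy model of the random-loss arithmetic Mike describes earlier in this exchange: a single random loss halves the cwnd of the one connection it hits, so the aggregate window of a pool shrinks by a smaller fraction as more connections are used. Equal per-connection cwnd and the simple halve-on-loss rule are simplifying assumptions:

```python
# Toy model of the "random loss" arithmetic: one loss halves the cwnd of
# the connection it hits, so the pool's aggregate window shrinks less as
# the load is spread over more connections. Assumes equal cwnd per
# connection and ignores recovery dynamics; purely illustrative.

def aggregate_after_one_loss(connections: int, cwnd_per_conn: int = 32) -> float:
    """Fraction of the aggregate congestion window left after one random loss."""
    total_before = connections * cwnd_per_conn
    total_after = (connections - 1) * cwnd_per_conn + cwnd_per_conn / 2
    return total_after / total_before

for n in (1, 2, 6):
    print(f"{n} connection(s): aggregate cwnd drops to "
          f"{aggregate_after_one_loss(n):.0%} after one random loss")
```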
>>> We are testing at many different latencies (including satellite) but also the benefit of multiple connections actually increases as RTT increases, because connections are in a slow-start-bottlenecked state for longer on higher latency links. After 4 RTTs a single connection will be able to transfer roughly initcwnd*(2)^4 in the next round trip whereas 3 connections will be able to transfer 3*initcwnd*(2)^4, assuming we can keep all three connections full and we are not bandwidth limited.
>>>
>>> In addition, multiple connections are more robust to packet loss. If we drop a packet after 4 RTTs, the overall throughput for the single connection case drops back down to its initial rate before exponentially growing again, whereas only one connection in the 3-connection case will fall back to initcwnd.
>>>
>>> Overall, the single connection has some drawbacks on some networks, but by and large it works better while also providing real server efficiencies and finally giving the transport the opportunity to do its job better. When we split onto zillions of connections, we basically sidestep all of the transport layer's goodness. (This is a complex topic too, however.)
>>>
>>>> If I need to fetch 6 small objects (each the size of a single full packet) from a server that has an initcwnd of 3,
>>>
>>> Why use a server with a cwnd of 3? Default Linux distros ship with 10 today (and have done so for like 2 years).
>>>
>>> Understood. 3 is the value mentioned in the Ops Guide so I just used that for my example. But the argument applies to 10 as well. The main thing is that if we know the size of the file we can do really clever things to make connection pools download objects really fast.
>>>
>>>> I can request 3 objects on each of two connections and download those objects in a single round trip. This is not a theoretical idea -- I have tested this and I get the expected performance. In general, a pool of cold HTTP/2 connections is much faster than a single cold connection for fetching a large number of small objects, especially when you know the size up front. I will share the data and demo as soon as I'm able to.
>>>
>>> Many have tested this heavily too, so I believe your results. My own test data fed into https://developers.google.com/speed/articles/tcp_initcwnd_paper.pdf
>>>
>>>> Since I know that multiple connections are faster, I can imagine a solution that web performance optimizers will resort to if browsers only support one connection per host: domain sharding! Let's avoid this by removing the SHOULD NOT from the spec.
>>>
>>> I think we need to get into the nitty-gritty benchmarking details if we want to claim that a single connection is faster. I highly doubt this is true for all network types.
>>>
>>> For non-bandwidth-limited links, the improvement for multiple connections should increase with increasing latency and increasing packet loss. So I have trouble understanding how a single connection could ever win. I wonder if in your tests you were not able to keep your multiple connections full, b/c that could result in slow start on idle kicking in for those connections. Is that possible?
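A minimal sketch of the idealized slow-start growth behind the initcwnd*(2)^4 figure above, assuming cwnd doubles every RTT with no loss and no receive-window or bandwidth limit (the initcwnd of 10 is an assumed value; Peter's example uses 3):

```python
# Idealized slow start: cwnd doubles every RTT (no loss, no receive-window
# or bandwidth limit). Compares the per-RTT sending budget of one
# connection against a pool of three, as in the initcwnd*(2)^4 figure above.
# The initcwnd of 10 is an assumption (Peter's example uses 3).

INITCWND = 10  # segments

def cwnd_after(rtts: int, initcwnd: int = INITCWND) -> int:
    """Segments a single connection may send in the round trip after `rtts` RTTs."""
    return initcwnd * 2 ** rtts

for rtts in range(5):
    one = cwnd_after(rtts)
    print(f"after {rtts} RTTs: 1 connection can send {one:>3} segments, "
          f"3 connections can send {3 * one:>3}")

# Peter's small-object case: 6 one-packet objects with an initcwnd of 3.
# One connection needs a second round trip for objects 4-6, while two
# connections (2 x initcwnd 3) can cover all 6 in the first round trip.
```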
>>>> "Servers must keep open idle connections, making load balancing more complex and creating DOS vulnerability." A few of you pointed out that the server can close them. That's true. I should not have said "must". But Mark's Ops Guide suggests that browsers will aggressively keep open idle connections for performance reasons, and that servers should support this by not closing these connections. And also that servers should keep those connections fast by disabling slow start after idle. In my opinion, browsers should keep connections open only as long as they have the expectation of imminent requests to issue on those connections, which is essentially the way that mainstream browsers handle connection lifetimes for HTTP/1.1 connections today. We should not create an incentive for browsers to hold on to connections for longer than this and to encourage servers to support longer-lived idle connections than they already do today.
>>>
>>> What we really need is just a better transport. We should have 'forever connections'. The idea that the endpoints need to maintain state to keep connections open is so silly; session resumption without a round-trip is very doable. I believe QUIC does this :-)
>>>
>>> Agreed -- I can't wait to see QUIC in action.
>>>
>>>> Some of you pointed out that a single connection allows us to get back to fair congestion control. But TCP slow start and congestion control are designed for transferring large objects. They unfairly penalize applications that need to fetch a large number of small objects. Are we overflowing router buffers today b/c we are using 6 connections per host? I agree that reducing that number is a good thing, which HTTP/2 will naturally enable. But I don't see any reason to throttle web browsers down to a single slow-started connection. Also, again, web browser and site developers will work around this artificial limit. In the future we will see 50+ Mbps last-mile networks as the norm. This makes extremely fast page load times possible, if only we can mitigate the impact of latency by increasing the concurrency of object requests. I realize that QUIC may eventually solve this issue but in the meantime we need to be able to use multiple TCP connections to squeeze the most performance out of today's web.
>>>
>>> A tremendous amount of research has gone into this, and you're asking good questions to which nobody knows the exact answers. It's not that we don't know the answers for lack of trying - it's because there are so many combinations of network equipment, speeds, configs, etc., in the real world that all real-world data is a mix of errors. Given the research that has gone into it, I wouldn't expect these answers to come crisply or quickly.
>>>
>>> I agree we'll have 50+ Mbps in the not-distant future. But so far, there is no evidence that we're figuring out how to bring RTTs down. Hence, more bandwidth doesn't matter much: https://docs.google.com/a/chromium.org/viewer?a=v&pid=sites&srcid=Y2hyb21pdW0ub3JnfGRldnxneDoxMzcyOWI1N2I4YzI3NzE2
>>>
>>> If we can increase concurrency, via dynamically generated server hints, then we can start seeing increases in bandwidth show up as improvements in page load time again -- we can make bandwidth matter again. Server hints also allow us to keep our multiple connections full so we can avoid the slow start on idle issue without requiring a config change on servers.
>>>
>>> Mike
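On Linux, the slow-start-after-idle behavior debated above is governed by the net.ipv4.tcp_slow_start_after_idle sysctl, the same *tcp_slow_start_after_idle* setting named in the list of drawbacks quoted further down the thread. A minimal sketch that just reports the current value, assuming a Linux host:

```python
# Reads the Linux knob behind "slow start after idle". With the default
# value of 1, an idle connection's congestion window decays back toward
# the initial window; 0 preserves it across idle periods, which is what
# the Ops Guide advice discussed above relies on.
# Assumes a Linux host with procfs mounted at /proc.

from pathlib import Path

SYSCTL = Path("/proc/sys/net/ipv4/tcp_slow_start_after_idle")

def slow_start_after_idle_enabled() -> bool:
    return SYSCTL.read_text().strip() == "1"

if __name__ == "__main__":
    state = ("enabled: cwnd decays back toward initcwnd after an idle period"
             if slow_start_after_idle_enabled()
             else "disabled: cwnd is preserved across idle periods")
    print(f"tcp_slow_start_after_idle is {state}")
```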
>>>> Thanks for reading through all this,
>>>>
>>>> Peter
>>>>
>>>> On Jun 24, 2014, at 3:55 PM, Mike Belshe <mike@belshe.com> wrote:
>>>>
>>>> On Tue, Jun 24, 2014 at 10:50 AM, <bizzbyster@gmail.com> wrote:
>>>>
>>>>> I've raised this issue before on the list but it's been a while and reading Mark's ops guide doc (https://github.com/http2/http2-spec/wiki/Ops) I'm reminded that requiring the use of a single connection for HTTP/2 ("Clients SHOULD NOT open more than one HTTP/2 connection") still makes no sense to me. Due to multiplexing, HTTP/2 will naturally use FEWER connections than HTTP/1, which is a good thing, but requiring a single connection has the following drawbacks:
>>>>>
>>>>> 1. Servers must keep open idle connections, making load balancing more complex and creating DOS vulnerability.
>>>>
>>>> As others have mentioned, you don't have to do this.
>>>>
>>>>> 2. Servers must turn off *tcp_slow_start_after_idle* in order for browsers to get good performance, again creating DOS vulnerability.
>>>>
>>>> You also don't have to do this; if you leave it on, it will drop back to init cwnd levels, just as though you had opened a fresh connection.
>>>>
>>>>> 3. The number of simultaneous GET requests I'm able to upload in the first round trip is limited to the compressed amount that can fit in a single initcwnd. Yes, compression helps with this, but if I use multiple connections I will get the benefit of compression for the requests on the same connection, in addition to having multiple initcwnds!
>>>>
>>>> It turns out that a larger initcwnd just works better anyway - there was a tremendous amount of evidence supporting going up to 10, and that was accepted at the transport level already.
>>>>
>>>>> 4. The amount of data I'm able to download in the first round trip is limited to the amount that can fit in a single initcwnd.
>>>>
>>>> It turns out the browser doesn't really know how many connections to open until that first resource is downloaded anyway. Many out-of-band tricks exist.
>>>>
>>>>> 5. Head of line blocking is exacerbated by putting all objects on a single connection.
>>>>
>>>> Yeah, this is true. But overall, it's still faster and more efficient.
>>>>
>>>>> Multiple short-lived HTTP/2 connections give us all the performance benefits of multiplexing without any of the operational or performance drawbacks. As a proxy and a browser implementor, I plan to use multiple HTTP/2 connections when talking to HTTP/2 servers because it seems like the right thing to do from a performance, security, and operational perspective.
>>>>
>>>> When I tested the multi-connection scenarios they were all slower for me. In cases of severe packet loss, it was difficult to discern as expected. But overall, the reduced server resource use and the efficiency outweighed the negatives.
>>>>
>>>> Mike
>>>>
>>>>> I know it's very late to ask this but can we remove the "SHOULD NOT" statement from the spec? Or maybe soften it a little for those of us who cannot understand why it's there?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Peter
Received on Tuesday, 1 July 2014 04:54:21 UTC