
Re: why not multiple, short-lived HTTP/2 connections?

From: 陈智昌 <willchan@chromium.org>
Date: Mon, 30 Jun 2014 21:28:00 -0700
Message-ID: <CAA4WUYgbN2Om+n4VCQN6j72h2PiRoE7kMAwSER5_goejsKvhiw@mail.gmail.com>
To: Peter Lepeska <bizzbyster@gmail.com>
Cc: Patrick McManus <mcmanus@ducksong.com>, Mike Belshe <mike@belshe.com>, HTTP Working Group <ietf-http-wg@w3.org>
On Mon, Jun 30, 2014 at 11:45 AM, <bizzbyster@gmail.com> wrote:

> Comments inline.
>
> On Jun 30, 2014, at 2:21 PM, William Chan (陈智昌) <willchan@chromium.org>
> wrote:
>
> On Mon, Jun 30, 2014 at 11:14 AM, <bizzbyster@gmail.com> wrote:
>
>> "But simply opening up more connections is not the right solution. You
>> can easily hit the other problem of too much congestion leading to way
>> worse performance. "
>>
>> That is true. But assuming you are able to detect that a browser is
>> running over a 20 Mbps FIOS link, and that due to wscale stripping in the
>> network it will never exceed a 64 KB window on a given TCP connection, are
>> you saying that as a Chrome developer you would still recommend using just
>> one connection? I have trouble understanding that.
>>
>
> As a browser developer, we'll do so if we feel it's the best option. But I
> think we're still a good ways from that. Thankfully most of the internet
> does not have this wscale stripping problem, and we're discussing ways to
> encourage access network operators in the relevant regions to fix their
> deployments (talk to them, shame them, etc). On almost all OSes, we can't
> actually detect that wscale stripping happened. It's not exposed to the
> application. And by the time we detect this happened, we've already lost
> roundtrips.
>
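[A sketch of why detection is hard: even on Linux, the negotiated window-scale shifts only become visible after the handshake, via TCP_INFO, and the field layout below is an assumption based on Linux's struct tcp_info; most other OSes expose nothing comparable.]

```python
import socket

def negotiated_wscale(sock):
    """Return (snd_wscale, rcv_wscale) for a connected TCP socket.

    Linux-only sketch (assumption): in struct tcp_info the two 4-bit
    wscale fields share the 7th byte, after tcpi_state, tcpi_ca_state,
    tcpi_retransmits, tcpi_probes, tcpi_backoff, and tcpi_options.
    If the network stripped the wscale option, both come back 0.
    """
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    snd_wscale = info[6] & 0x0F          # low nibble: our send-side shift
    rcv_wscale = (info[6] >> 4) & 0x0F   # high nibble: our receive-side shift
    return snd_wscale, rcv_wscale
```

Note that this only answers the question after the SYN/SYN-ACK exchange has happened, which is exactly the point above: by the time the application could react, the round trips are already spent.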
>
> Right. I'd like to see us figure out technologies that allow the Internet
> to self-heal. Servers should learn which access networks have bad
> properties and perform actions automatically to address them. In this case
> it'd be great if the protocol allowed some way for the server to tell the
> browser -- hey, through no fault of your own you are running with one hand
> behind your back, use two! As you know, I'm working on ideas (
> http://caffeinatetheweb.com/baking-acceleration-into-the-web-itself/) that
> will enable such a feedback loop. Shaming network operators as a way to fix
> a problem is something Google has in its toolbox. For the rest of us, we
> need technology to solve these problems.
>
>
>> Instead of mandating one connection because there are cases where
>> multiple connections increase congestion and hurt performance, let's
>> detect those cases and choose a single connection when it gives us the
>> fastest page load time.
>>
>
> It's not mandated. It's a SHOULD, and rightly so.
>
>
>
>>
>> In other words, let's let the browser figure out the optimal number in
>> each case. If we don't, we encourage domain sharding by sites that decide
>> they care more about their high-bandwidth users.
>>
>
> And the optimal number SHOULD be 1. In almost all cases.
>
>
> Over high-bandwidth links where I can keep multiple connections busy,
> because I know a large number of the resources I need up front and so avoid
> slow start on idle connections, one is never the optimal number. That's why
> SHOULD seems wrong to me. I'd like to encourage browsers to dynamically
> find the optimal number, not mandate a number that is so often going to
> lead to a slower user experience and, long term, encourage domain sharding.
>
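[The arithmetic behind Peter's position: with window scaling stripped, each connection's throughput is capped at roughly window/RTT, so filling a fat, long pipe takes several connections. A back-of-envelope sketch; the 230 ms RTT is an assumption, roughly Sydney to Virginia, not a figure from the thread.]

```python
import math

WINDOW_BYTES = 64 * 1024   # ceiling without the TCP wscale option
RTT_S = 0.230              # assumed Sydney <-> Virginia round trip
LINK_BPS = 20e6            # the 20 Mbps FIOS link from the example

def per_conn_ceiling_bps(window_bytes=WINDOW_BYTES, rtt_s=RTT_S):
    # A connection can have at most one window in flight per round trip.
    return window_bytes * 8 / rtt_s

def conns_to_fill(link_bps=LINK_BPS, window_bytes=WINDOW_BYTES, rtt_s=RTT_S):
    # How many capped connections it takes to saturate the access link.
    return math.ceil(link_bps / per_conn_ceiling_bps(window_bytes, rtt_s))
```

Under these assumptions one connection tops out near 2.3 Mbps, and it takes about nine to saturate the 20 Mbps link, which is why a single mandated connection can look so bad on this kind of path.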

I feel like the one thing I've said repeatedly that you have ignored
(probably because you feel it's out of your control) is the proposal that
we *fix the transport*. You keep advocating for application-level
workarounds that you know are suboptimal. You feel that we browser vendors
should be investing in a bunch of detection heuristics for these edge cases
(and fair enough, edge cases might be the *normal* case for certain users).
The problem is that these detection heuristics aren't easy to implement,
they're fragile, they take a while to detect anything, and it's hard to
determine how many connections to open without causing other problems. And
using multiple connections has its own downsides, which I've explained
elsewhere. This is why we'd rather invest our energies in fixing the
transport issues instead. We believe it's the right long-term solution.


>
>
>
>>
>> Thanks,
>>
>> Peter
>>
>> On Jun 30, 2014, at 1:40 PM, William Chan (陈智昌) <willchan@chromium.org>
>> wrote:
>>
>> On Mon, Jun 30, 2014 at 9:58 AM, Patrick McManus <mcmanus@ducksong.com>
>> wrote:
>>
>>>
>>>
>>>
>>> On Mon, Jun 30, 2014 at 12:04 PM, <bizzbyster@gmail.com> wrote:
>>>
>>>> All,
>>>>
>>>> Another huge issue is that for some reason I still see many TCP
>>>> connections that do not advertise support for window scaling in the
>>>> SYN packet. I'm really not sure why, but, for instance, WPT test
>>>> instances run Windows 7 and yet do not advertise window scaling, so
>>>> their TCP connections max out at a send window of 64 KB. I've seen
>>>> this in tests run from multiple different WPT test locations.
>>>>
>>>
>> It's true that TCP window scaling can be a problem. We definitely see
>> this issue in a number of places around the world, most prominently in APAC
>> at certain ISPs (due to network wscale stripping, UGH!). But simply opening
>> up more connections is not the right solution. You can easily hit the other
>> problem of too much congestion leading to way worse performance. I talk
>> about these multiple connection and congestion issues at
>> https://insouciant.org/tech/network-congestion-and-web-browsing/ and
>> provide several example traces of problematic congestion. Fundamentally,
>> this is a transport issue and we should be fixing the transport. Indeed,
>> we're working on this at Google, both with our Make TCP Fast team and our
>> QUIC team.
>>
>>
>>>
>>>> The impact of this is that high latency connections max out at very low
>>>> throughputs. Here's an example (with tcpdump output so you can examine the
>>>> TCP flow on the wire) where I download data from a SPDY-enabled web server
>>>> in Virginia from a WPT test instance running in Sydney:
>>>> http://www.webpagetest.org/result/140629_XG_1JC/1/details/. Average
>>>> throughput is not even 3 Mbps despite the fact that I chose a 20 Mbps FIOS
>>>> connection for my test. Note that when I disable SPDY on this web server, I
>>>> render the page almost twice as fast because I am using multiple
>>>> connections and therefore overcoming the per connection throughput
>>>> limitation: http://www.webpagetest.org/result/140629_YB_1K5/1/details/.
>>>>
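[The measured ceiling above is consistent with a 64 KB send window: a connection can move at most one window per round trip, so throughput tops out near window/RTT no matter how fast the access link is. A quick sanity check; the ~230 ms trans-Pacific RTT is an assumption, not taken from the trace.]

```python
# Sanity-check the measurement: with no window scaling, a single TCP
# connection is limited to one 64 KB window in flight per round trip.
window_bytes = 64 * 1024
rtt_s = 0.230                                   # assumed Sydney <-> Virginia RTT
ceiling_mbps = window_bytes * 8 / rtt_s / 1e6   # roughly 2.3 Mbps

# That sits just under the "not even 3 Mbps" seen in the WPT run,
# even though the access link itself is 20 Mbps.
```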
>>>> I don't know the root cause (Windows 7 definitely sends the window
>>>> scaling option in the SYN in other tests) and have sent a note to the
>>>> webpagetest.org admin, but in general there are reasons why even
>>>> Windows 7 machines sometimes appear not to use window scaling, causing
>>>> single-connection SPDY to perform really badly even beyond the slow
>>>> start phase.
>>>>
>>>>
>>> I think this is a WPT issue you should take up off-list because, IIRC,
>>> the issue would just be in the application. It's not an OS or
>>> infrastructure thing we'll need to cope with.
>>>
>>
>> I agree it's probably specific to WPT. Here's the cloudshark trace for
>> the same WPT run (http://www.webpagetest.org/result/140630_HY_SGK/)
>> using a different Chrome instance (from Dulles, VA):
>> https://www.cloudshark.org/captures/0bd0a7aa3a49?filter=tcp.flags.syn%3D%3D1.
>> As you can see, the window scaling option is on there, and the packet
>> trace is taken at the client. That lends credence to the hypothesis that
>> this problem is local to the Sydney WPT Chrome instance in your test run.
>>
>>
>>>
>>> IIRC, when I last looked at it, if you set an explicit SO_RCVBUF on
>>> your socket before opening it on Win 7, it would pick the smallest
>>> scaling factor able to accommodate your desired window (so if you set
>>> it to 64 KB or less, scaling would be disabled). Of course there is no
>>> way to renegotiate scaling, so that sticks with you for the life of the
>>> connection no matter what you might set RCVBUF to along the way. I
>>> believe the correct fix is "don't do that", and any new protocol
>>> implementation should be able to take that into consideration.
>>>
>>> but maybe my info is dated.
>>>
>>> -P
>>>
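[A sketch of the pitfall Patrick describes, in socket terms. The Windows 7 behavior is as he recalls it, and the buffer sizes here are illustrative assumptions, not recommended values.]

```python
import socket

def connect_with_small_rcvbuf(host, port):
    # The pitfall: asking for <= 64 KB *before* the handshake lets the
    # stack negotiate a window-scale shift of 0, and scaling can never
    # be renegotiated for the life of the connection.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 64 * 1024)
    s.connect((host, port))
    return s

def connect_with_headroom(host, port):
    # The fix ("don't do that"): either leave SO_RCVBUF alone, or ask for
    # a buffer sized toward the path's bandwidth-delay product *before*
    # connect(), so a nonzero scale factor can be negotiated up front.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
    s.connect((host, port))
    return s
```

The ordering is the whole point: the scale factor is fixed in the SYN, so only the buffer size in effect before `connect()` matters.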
>>>
>>>
>>
>>
>
>
Received on Tuesday, 1 July 2014 04:28:29 UTC
