- From: Tom Bergan <tombergan@chromium.org>
- Date: Thu, 1 Sep 2016 09:38:14 -0700
- To: Kazuho Oku <kazuhooku@gmail.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CA+3+x5F8M+xbH2YjD9m87WPOyQPCTVNV8evBJQHn9gicch+1TQ@mail.gmail.com>
Thanks for the feedback and link to that workshop talk! A few comments inline.

On Wed, Aug 31, 2016 at 9:57 PM, Kazuho Oku <kazuhooku@gmail.com> wrote:

> Consider the case where a large HTML that loads a CSS is sent over the wire. In a typical implementation, the server will pass a block of HTML much larger than INITCWND to the TCP stack before recognizing the request for CSS. So the client would need to wait for multiple RTTs before starting to receive the CSS.

Unrelated to your above comment -- I think servers should use a higher initcwnd with H2, and I know that some servers do this. The experiments in our doc used Linux's default initcwnd (10 packets). If you compare that to H1, where browsers use 6 concurrent connections, the effective initcwnd for H1 is 60 packets (well, not exactly, since the browser only makes one request initially, but as soon as the browser starts making additional requests, cwnd effectively grows much faster than it would with a single connection).

> That said, as discussed at the workshop, it is possible to implement a HTTP/2 server that does not get affected by HoB between the different streams (see https://github.com/HTTPWorkshop/workshop2016/blob/master/talks/tcpprog.pdf).
>
> I would suggest that regardless of whether or not push is used, server implementors should consider adopting such approach to minimize the impact of HoB.

This is really interesting. To summarize: the idea is to use getsockopt to compute the number of available bytes in cwnd, so that sizeof(kernel buffer) = cwnd. I rejected this idea without thinking about it much because it seemed like it would increase kernel/user round-trips and perform poorly in bursty conditions. But your idea to restrict this optimization to the cases where it matters most makes sense. Do you have performance measurements of this idea under heavy load? Are you using TCP_NOTSENT_LOWAT for cases where the optimization cannot be used?
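For concreteness, here is a rough sketch of the approach as I understand it: poll tcp_info before each write so the kernel never queues much more than a cwnd's worth of data, and fall back to TCP_NOTSENT_LOWAT when per-write sizing is too expensive. TCP_INFO and TCP_NOTSENT_LOWAT are real Linux socket options; the helper names and the send loop are made up for illustration.

    /* Sketch only: size each write so the kernel never queues much more than
     * cwnd's worth of data behind one stream. TCP_INFO and TCP_NOTSENT_LOWAT
     * are real Linux socket options; the helpers and the send loop below are
     * hypothetical. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Bytes we can hand to the kernel right now without overfilling cwnd.
     * Returns -1 if TCP_INFO is unavailable. */
    static ssize_t tcp_writable_bytes(int fd)
    {
        struct tcp_info ti;
        socklen_t len = sizeof(ti);
        if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) != 0)
            return -1;
        if (ti.tcpi_snd_cwnd <= ti.tcpi_unacked)
            return 0; /* cwnd is full; wait for POLLOUT and try again */
        return (ssize_t)(ti.tcpi_snd_cwnd - ti.tcpi_unacked) * ti.tcpi_snd_mss;
    }

    /* Fallback when per-write sizing costs too much: cap the amount of
     * *unsent* data the kernel will accept before it stops reporting the
     * socket as writable. */
    static int cap_unsent_bytes(int fd, unsigned int bytes)
    {
        return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
                          &bytes, sizeof(bytes));
    }

    /* Hypothetical send loop: only dequeue frames from the currently
     * highest-priority stream once we know the kernel can take them, so a
     * newly requested CSS is not stuck behind megabytes of buffered HTML. */
    static void flush_connection(int fd /*, scheduler state ... */)
    {
        ssize_t budget = tcp_writable_bytes(fd);
        if (budget <= 0)
            return; /* retry on the next POLLOUT */
        /* write(fd, frames_of_highest_priority_stream, (size_t)budget); */
    }

My concern about kernel/user round-trips is basically about how often something like tcp_writable_bytes() would get called under load, hence the question about measurements.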
> It should also be noted that with QUIC such HoB would not be an issue since there would no longer be a send buffer within the kernel.

Yep, this is definitely an advantage of QUIC.

> "Rule 2: Push Resources in the Right Order"
>
> My take is that the issue can / should be solved by clients sending PRIORITY frames for pushed resources when they observe how the resources are used, and that until then servers should schedule the pushed streams separately from the client-driven prioritization tree (built by using the PRIORITY frames).
>
> Please refer to the discussion in the other thread for details:
> https://lists.w3.org/Archives/Public/ietf-http-wg/2016JulSep/0453.html

To make sure I understand the idea: Suppose you send HTML, then push resources X and Y. You will continue pushing X and Y until you get requests from the client, at which point you switch to serving requests made by the client (which may or may not include X and Y, as the client may not know about those resources yet, depending on what you decided to push). These client requests are served via the client-driven priority tree. Is that right?

If so, you've basically implemented rule #1 -- the push lasts while the network is idle, then you switch to serving client requests afterwards. It's nice to see that we came to the same high-level conclusion :-). But I like the way you've phrased the problem: instead of computing a priori how much data you should push, which is what we suggested, you start pushing an arbitrary number of things and then automatically stop pushing as soon as you get the next client request.

One more clarification: what happens when the client loads two pages concurrently and the network is effectively never idle? I assume push won't happen in this case?

Next, I think you're arguing that push order doesn't matter as long as you have a solution for HoB. I don't think this is exactly right. Specifically:

- Head-of-line blocking (HoB) can happen due to network-level bufferbloat. The above solution only applies to kernel-level bufferbloat. You need some kind of bandwidth-based pacing to avoid network-level bufferbloat.

- If you're pushing X and Y, and you know the client will use X before Y, you should push in that order. The opposite order is sub-optimal and can eliminate the benefit of push in some cases, even ignoring HoB.

> As a server implementor, I have always dreamt of cancelling a push after sending a PUSH_PROMISE. In case a resource we want to push exists on a dedicated cache that cannot be reached synchronously from the HTTP/2 server, the server needs to send PUSH_PROMISE without the guarantee that it would be able to push a valid response.
>
> It would be great if we could have an error code that can be sent using RST_STREAM to notify the client that it should discard the PUSH_PROMISE being sent, and issue a request by itself.

Yes, +1. I've wanted this feature. It sucks that the client won't reissue the request if it gets a RST_STREAM. (At least, Chrome won't do this; I don't know about other browsers.)
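To make that concrete, here is roughly what a server can do today, sketched against nghttp2. The two nghttp2_submit_* calls are real nghttp2 APIs as far as I know; the header values and surrounding helpers are made up, and the "discard and re-request" error code is exactly the part that doesn't exist yet.

    /* Sketch only: promise a resource up front, then cancel the push if the
     * dedicated cache later fails to produce a valid response. The two
     * nghttp2_submit_* calls are real nghttp2 APIs; everything around them
     * is hypothetical. */
    #include <nghttp2/nghttp2.h>
    #include <stddef.h>

    #define NV(NAME, VALUE)                                                     \
        { (uint8_t *)NAME, (uint8_t *)VALUE, sizeof(NAME) - 1, sizeof(VALUE) - 1, \
          NGHTTP2_NV_FLAG_NONE }

    /* Send PUSH_PROMISE for a stylesheet on the given request stream.
     * Returns the promised stream ID, or a negative nghttp2 error code. */
    static int32_t promise_css(nghttp2_session *session, int32_t request_stream_id)
    {
        nghttp2_nv nva[] = {
            NV(":method", "GET"),
            NV(":scheme", "https"),
            NV(":authority", "example.com"), /* placeholder value */
            NV(":path", "/style.css"),       /* placeholder value */
        };
        return nghttp2_submit_push_promise(session, NGHTTP2_FLAG_NONE,
                                           request_stream_id, nva,
                                           sizeof(nva) / sizeof(nva[0]), NULL);
    }

    /* Called when the backing cache reports it cannot serve the promised
     * response. All we can do today is reset the promised stream; Chrome
     * will then go without the resource rather than re-request it. What
     * this thread asks for is, in effect, a distinct error code here that
     * means "discard the PUSH_PROMISE and fetch the resource yourself". */
    static void cancel_push(nghttp2_session *session, int32_t promised_stream_id)
    {
        nghttp2_submit_rst_stream(session, NGHTTP2_FLAG_NONE,
                                  promised_stream_id, NGHTTP2_INTERNAL_ERROR);
    }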
Received on Thursday, 1 September 2016 16:38:47 UTC