Re: Experiences with HTTP/2 server push from Bryan McQuade on 2016-12-04 (ietf-http-wg@w3.org from October to December 2016)

From: Bryan McQuade <bmcquade@google.com>
Date: Sun, 04 Dec 2016 12:45:10 +0000
To: Kazuho Oku <kazuhooku@gmail.com>, Tom Bergan <tombergan@chromium.org>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CADLGQyAm5yTr+RFEnZ=RhwRg6S42CY=OooRTw8DA9ta_YDiihA@mail.gmail.com>
Here's a new article on H2 push from today's perf calendar which goes into
a good bit of detail:
http://calendar.perfplanet.com/2016/http2-push-the-details/

On Thu, Sep 1, 2016 at 5:39 PM Kazuho Oku <kazuhooku@gmail.com> wrote:

> Hi,
>
> Thank you for your response.
>
> 2016-09-02 1:38 GMT+09:00 Tom Bergan <tombergan@chromium.org>:
> > Thanks for the feedback and link to that workshop talk! A few comments
> > inline.
> >
> > On Wed, Aug 31, 2016 at 9:57 PM, Kazuho Oku <kazuhooku@gmail.com> wrote:
> >>
> >> Consider the case where a large HTML that loads a CSS is sent over the
> >> wire. In a typical implementation, the server will pass a block of
> >> HTML much larger than INITCWND to the TCP stack before recognizing the
> >> request for CSS. So the client would need to wait for multiple RTTs
> >> before starting to receive the CSS.
> >
> >
> > Unrelated to your above comment -- I think servers should use a higher
> > initcwnd with H2, and I know that some servers do this. The experiments in
> > our doc used Linux's default initcwnd (10 packets). If you compare that to
> > H1, where browsers use 6 concurrent connections, the effective initcwnd for
> > H1 is 60 packets (well, not exactly, since the browser only makes one
> > request initially, but as soon as the browser starts making additional
> > requests, cwnd effectively grows much faster than it would with a single
> > connection).
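
[Illustration, not part of the original message.] The initcwnd comparison above can be made concrete with a rough sketch. It assumes an idealized slow start (the window doubles every round trip, with no loss and no delayed ACKs) and an MSS of 1460 bytes. The function name and the 100 kB response size are hypothetical and chosen only for illustration.

```python
def rtts_to_deliver(size_bytes, initcwnd_packets=10, mss=1460):
    """Round trips needed to send size_bytes under idealized slow start.

    Assumptions: cwnd doubles each round trip, no loss, mss is in bytes.
    """
    if size_bytes <= 0:
        return 0
    cwnd = initcwnd_packets * mss  # bytes that fit in the first flight
    sent = 0
    rtts = 0
    while sent < size_bytes:
        sent += cwnd   # send one full window
        cwnd *= 2      # idealized slow-start growth
        rtts += 1
    return rtts


if __name__ == "__main__":
    # One H2 connection (initcwnd 10) vs. an "effective" initcwnd of 60,
    # i.e. six H1 connections that are all already sending.
    print(rtts_to_deliver(100_000, initcwnd_packets=10))
    print(rtts_to_deliver(100_000, initcwnd_packets=60))
```

Under these assumptions the 100 kB response needs three flights with initcwnd 10 and two with an effective initcwnd of 60. Real connections will differ.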
> >
> >>
> >> That said, as discussed at the workshop, it is possible to implement an
> >> HTTP/2 server that does not get affected by HoB between the different
> >> streams (see
> >> https://github.com/HTTPWorkshop/workshop2016/blob/master/talks/tcpprog.pdf
> >> ).
> >>
> >> I would suggest that regardless of whether or not push is used, server
> >> implementors should consider adopting such an approach to minimize the
> >> impact of HoB.
> >
> >
> > This is really interesting. To summarize: the idea is to use getsockopt to
> > compute the number of available bytes in cwnd so that sizeof(kernel buffer)
> > = cwnd. I rejected this idea without thinking about it much because it
> > seemed like it would increase kernel/user round-trips and perform poorly in
> > bursty conditions. But, your idea to restrict this optimization to cases
> > where it matters most makes sense. Do you have performance measurements of
> > this idea under heavy load?
>
> Unfortunately not.
>
> I agree that it would be interesting to collect metrics based on real
> workload, both on the client side and the server side.
>
> OTOH let me note that since we enable the optimization only for
> connections with RTT substantially higher than the time spent by a
> single iteration of the event loop, we expect that there would be no
> performance penalty when facing a burst. The server would just switch
> to the ordinary way.
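
[Illustration, not part of the original message.] A minimal sketch of the idea discussed above: write only as many bytes as the congestion window can currently absorb, and only on connections whose RTT is much larger than one event-loop iteration. It is not H2O's actual code. The inputs are assumed to come from the Linux TCP_INFO socket option (fields tcpi_snd_cwnd, tcpi_unacked, tcpi_snd_mss), which is not read here. The function names and the ratio of 10 are hypothetical.

```python
def cwnd_available_bytes(snd_cwnd, unacked, mss):
    """Bytes that can be sent now without exceeding the congestion window.

    snd_cwnd and unacked are counted in packets, mss in bytes.
    """
    return max(0, snd_cwnd - unacked) * mss


def use_cwnd_limited_writes(rtt_ms, loop_iteration_ms, ratio=10.0):
    """Enable the optimization only when the RTT is substantially higher
    than one event-loop iteration (ratio is an assumed threshold)."""
    return rtt_ms >= ratio * loop_iteration_ms


def bytes_to_write(pending, snd_cwnd, unacked, mss, rtt_ms, loop_iteration_ms):
    """How many of the pending bytes to hand to the kernel right now."""
    if not use_cwnd_limited_writes(rtt_ms, loop_iteration_ms):
        # Low-RTT or busy case: fall back to the ordinary behavior and let
        # the kernel send buffer absorb everything.
        return pending
    return min(pending, cwnd_available_bytes(snd_cwnd, unacked, mss))


if __name__ == "__main__":
    # 100 ms RTT, 1 ms loop iteration: the write is limited by the window.
    print(bytes_to_write(100_000, snd_cwnd=10, unacked=4, mss=1460,
                         rtt_ms=100, loop_iteration_ms=1))
```

Keeping the unsent data in user space lets the server still choose which stream to send next, which is the point of avoiding HoB inside the kernel buffer.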
>
> > Are you using TCP_NOTSENT_LOWAT for cases where
> > the optimization cannot be used?
>
> No. I'm not sure if restricting the amount of unsent data to a fixed
> value is generally a good thing, or if that causes practical impact on
> performance.
>
> Personally, for connections that left the slow-start phase, I prefer
> the amount calculated proportional to the current CWND value, which
> IIRC is the default behavior of Linux.
>
> >>
> >> It should also be noted that with QUIC such HoB would not be an issue
> >> since there would no longer be a send buffer within the kernel.
> >
> >
> > Yep, this is definitely an advantage of QUIC.
> >
> >> "Rule 2: Push Resources in the Right Order"
> >>
> >> My take is that the issue can / should be solved by clients sending
> >> PRIORITY frames for pushed resources when they observe how the
> >> resources are used, and that until then servers should schedule the
> >> pushed streams separately from the client-driven prioritization tree
> >> (built by using the PRIORITY frames).
> >>
> >> Please refer to the discussion in the other thread for details:
> >> https://lists.w3.org/Archives/Public/ietf-http-wg/2016JulSep/0453.html
> >
> >
> > To make sure I understand the idea: Suppose you send HTML, then push
> > resources X and Y. You will continue pushing X and Y until you get requests
> > from the client, at which point you switch to serving requests made by the
> > client (which may or may not include X and Y, as the client may not know
> > about those resources yet, depending on what you decided to push). These
> > client requests are served via the client-driven priority tree.
> >
> > Is that right? If so, you've basically implemented rule #1
>
> Actually not.
>
> My interpretation of rule #1 (or the solution proposed for rule #1)
> was that it discusses the impact of TCP-level head-of-line blocking,
> whereas rule #2 seemed to discuss the issues caused by pushed streams
> not appropriately prioritized against the pulled streams.
>
> And the solution for rule #2 that I revisited here was for a server to
> prioritize _some_ of the pushed streams outside the client-driven
> priority tree.
>
> I am not copy-pasting the scheme described in
> https://lists.w3.org/Archives/Public/ietf-http-wg/2016JulSep/0453.html
> in fear that doing so might lose context, but as an example, it would
> go like this.
>
> Suppose you are sending HTML (in response to a pull), as well as
> pushing two asset files: one is CSS and one is an image.
>
> Among the two assets, it is fair for a server to anticipate that the
> CSS is likely to block the rendering of the HTML. Therefore, the
> server sends CSS before HTML (but does not send a PRIORITY frame for
> the CSS, since PRIORITY frame is a tool for controlling client-driven
> prioritization). OTOH an image is not likely to block the rendering.
> Therefore, it is scheduled as specified by the HTTP/2 specification
> (so that it would be sent after the HTML).
>
> This out-of-client-driven-prioritization-tree scheduling should be
> performed until a server receives a PRIORITY frame adjusting the
> precedence of a pushed stream. At this point, a server should
> reprioritize the pushed stream (i.e. CSS) if it considers the client's
> knowledge of how the streams should be prioritized superior to what
> the server knows.
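
[Illustration, not part of the original message.] The scheme described above could be sketched as follows. It is a simplified model, not the behavior of any particular server. The class, the function, and the choice of "text/css" as the only render-blocking type are assumptions made for this example.

```python
from dataclasses import dataclass

# Assumption: only CSS is treated as render-blocking, as in the example above.
BLOCKING_TYPES = {"text/css"}


@dataclass
class PushedStream:
    path: str
    content_type: str
    # True once the client has sent a PRIORITY frame for this stream.
    client_prioritized: bool = False


def partition_pushed(streams):
    """Split pushed streams into (outside_tree, in_tree).

    outside_tree: blocking assets the server sends ahead of the HTML,
                  scheduled outside the client-driven priority tree.
    in_tree:      everything else, scheduled as HTTP/2 specifies. This
                  includes blocking assets the client has reprioritized.
    """
    outside_tree, in_tree = [], []
    for s in streams:
        if s.content_type in BLOCKING_TYPES and not s.client_prioritized:
            outside_tree.append(s.path)
        else:
            in_tree.append(s.path)
    return outside_tree, in_tree


if __name__ == "__main__":
    pushed = [PushedStream("/style.css", "text/css"),
              PushedStream("/hero.png", "image/png")]
    print(partition_pushed(pushed))   # CSS ahead of HTML, image in the tree
    pushed[0].client_prioritized = True
    print(partition_pushed(pushed))   # client's PRIORITY now takes over
```

This model always defers to the client once a PRIORITY frame arrives. The message leaves that decision to the server's judgment.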
>
> > -- the push lasts
> > while the network is idle, then you switch to serving client requests
> > afterwards. It's nice to see that we came to the same high-level conclusion
> > :-). But, I like the way you've phrased the problem. Instead of computing a
> > priori how much data you should push, which we suggested, you start
> > pushing an arbitrary number of things, then you'll automatically stop
> > pushing as soon as you get the next client request.
> >
> > One more clarification: what happens when the client loads two pages
> > concurrently and the network is effectively never idle? I assume push won't
> > happen in this case?
> >
> > Next, I think you're arguing that push order doesn't matter as long as you
> > have a solution for HoB. I don't think this is exactly right. Specifically:
> >
> > - Head-of-line blocking (HoB) can happen due to network-level bufferbloat.
> > The above solution only applies to kernel-level bufferbloat. You need some
> > kind of bandwidth-based pacing to avoid network-level bufferbloat.
>
> That's correct.
>
> OTOH I would like to point out that the issue is not specific to push.
>
> A client would issue requests in the order it notices the URLs that it
> should fetch. And it cannot update the priority of the links found in
> LRP headers until it observes how the resource is actually used.
>
> So if preload links included low-priority assets, bufferbloat can (or
> will) cause issues for both pull and push.
>
> > - If you're pushing X and Y, and you know the client will use X before Y,
> > you should push in that order. The opposite order is sub-optimal and can
> > eliminate the benefit of push in some cases, even ignoring HoB.
>
> Agreed.
>
> And my understanding is that both Apache and H2O do this, based on
> the content-type of the pushed response.
>
> Just having two (or three) levels of precedence (send before HTML vs.
> send after HTML vs. send along with HTML) is not as complex as what
> HTTP/2's prioritization tree provides, but I think is sufficient for
> optimizing the time spent until first-render.
>
> The best way to prioritize the blocking assets (i.e. an asset that
> needs to be sent before HTML, e.g. CSS) is where Apache and H2O
> disagree. My proposal (and what H2O does in that respect) is
> that a server should schedule such pushed streams outside the
> prioritization tree (i.e. my response for rule #2).
>
> >> As a server implementor, I have always dreamt of cancelling a push
> >> after sending a PUSH_PROMISE. In case a resource we want to push
> >> exists on a dedicated cache that cannot be reached synchronously from
> >> the HTTP/2 server, the server needs to send PUSH_PROMISE without the
> >> guarantee that it would be able to push a valid response.
> >>
> >> It would be great if we could have an error code that can be sent
> >> using RST_STREAM to notify the client that it should discard the
> >> PUSH_PROMISE being sent, and issue a request by itself.
> >
> >
> > Yes, +1. I've wanted this feature. It sucks that the client won't reissue
> > the requests if they get a RST_STREAM. (At least, Chrome won't do this, I
> > don't know about other browsers.)
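
[Illustration, not part of the original message.] For reference, this is roughly what resetting a promised stream looks like on the wire today. The sketch encodes an RST_STREAM frame following the RFC 7540 layout: a 9-byte frame header followed by a 4-byte error code. The error code requested in this thread ("discard the promise and issue the request yourself") does not exist, so the example uses the existing CANCEL code. The function name is hypothetical.

```python
import struct

# Values from RFC 7540.
FRAME_TYPE_RST_STREAM = 0x3
ERROR_REFUSED_STREAM = 0x7
ERROR_CANCEL = 0x8


def encode_rst_stream(stream_id, error_code):
    """Serialize an RST_STREAM frame for the given stream."""
    if not 0 < stream_id < 2**31:
        raise ValueError("stream_id must be a non-zero 31-bit integer")
    payload = struct.pack(">I", error_code)  # 32-bit error code
    length = len(payload)                    # always 4 for RST_STREAM
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,       # 24-bit payload length
        FRAME_TYPE_RST_STREAM,               # frame type
        0,                                   # flags (none defined)
        stream_id,                           # reserved bit is 0
    )
    return header + payload


if __name__ == "__main__":
    # Server-initiated (pushed) streams use even identifiers, e.g. 2.
    print(encode_rst_stream(2, ERROR_CANCEL).hex())
```

As noted above, a client that receives such a frame for a promised stream is not required to re-request the resource, which is the gap the proposed error code would fill.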
>
>
>
> --
> Kazuho Oku
>
>
Received on Sunday, 4 December 2016 12:46:10 UTC