Re: Results from adopting HTTP/3 priority from Alan Frindell on 2021-04-29 (ietf-http-wg@w3.org from April to June 2021)

From: Alan Frindell <afrind@fb.com>
Date: Thu, 29 Apr 2021 15:50:38 +0000
To: Robin MARX <robin.marx@uhasselt.be>, Lucas Pardue <lucaspardue.24.7@gmail.com>
CC: Yang Chi <yangchi@fb.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <5F974649-991C-4C18-BE61-E25F1446DC05@fb.com>
Hi Robin, thanks for the detailed questions.  Answers inline.

> 1) Can you detail how often you use the "incremental" flag and for which types of resources (and in which urgency "buckets")? And why? 

Prior to adopting the http-priority draft, the mvfst scheduler treated everything as incremental with equivalent urgency.  After adopting the draft, we set the default priority to "u=3, i" to match this behavior.
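
To make the "u=3, i" default concrete, here is a minimal sketch of reading a priority field value into an (urgency, incremental) pair. It is not mvfst or proxygen code: the function name `parse_priority` and its parameters are hypothetical, and it handles only the `u` and `i` parameters rather than the full Structured Fields syntax. It assumes that any parameter missing from the value falls back to the deployment default described above.

```python
def parse_priority(value, default_urgency=3, default_incremental=True):
    """Parse a simplified priority field value such as "u=3, i".

    Hypothetical helper: the defaults mirror the "u=3, i" default
    mentioned in the mail, not the defaults of the specification.
    """
    urgency, incremental = default_urgency, default_incremental
    for item in value.split(","):
        key, _, val = item.strip().partition("=")
        if key == "u":
            # Urgency is a single digit 0..7; anything else is ignored.
            if len(val) == 1 and val in "01234567":
                urgency = int(val)
        elif key == "i":
            # A bare "i" means true; "?1" / "?0" are explicit booleans.
            if val in ("", "?1"):
                incremental = True
            elif val == "?0":
                incremental = False
    return urgency, incremental


print(parse_priority("u=1, i=?0"))
```

Unknown parameters and malformed values are skipped here, so a bad value degrades to the default instead of failing the request.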

In the places where our clients specify priorities, media requests show better results with the incremental flag, and API requests show better results without the incremental flag. We think videos in particular can benefit from the incremental flag when both audio and video requests are over the same connection. It’s not clear why API requests don’t benefit from it, but it’s possible that (at least in the app where we tried priority) the application layer still needs to wait for the entire JSON response to be downloaded before parsing it.

>    a) I also wonder if you have more data/insight on how much QUIC's HOL blocking removal has helped in practice and how that interacts with the incremental logic. Matt has mentioned this before, but didn't share details as of yet. 

We don’t have any additional data demonstrating how removing HoL blocking impacts perf, with or without respect to priority.

>    b) If you do incremental, do you still Round-Robin streams on a per-packet basis or have you switched to larger bursts per stream? Have you experimented with other approaches? 

We still RR on a per-packet basis. 


>    c) How exactly do you deal with mixed incremental/non-incremental resources in the same urgency bucket? 
>       Say a and b are not incremental, then c and d are, then e again is not. Do you do a, b, mixed cd, e? or for example: a, b, e, mixed cd (i.e., non-incremental is a bigger indicator of priority than stream ID/request order)


We do use two queues per priority bucket, so we would send: a, b, e, mixed cd.  We sort the non-incremental streams by stream ID, which is not perfect but does work for the common case.  Notably, this is imperfect because there are 3 ID spaces for the sender (client bidi, server bidi -- unused in vanilla HTTP, and unidirectional).  We prioritize control streams over all other streams, but server pushes could easily jump ahead of earlier requests of equivalent urgency.  I agree the spec is vague here; I’m curious to hear whether anyone else is using a different approach, how costly it is to implement, and how it has performed.
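
The two-queues-per-bucket behavior can be sketched as a toy model. This is not the mvfst scheduler: the class `TwoQueueScheduler` and its methods are hypothetical, a "packet" is just a counter, and control streams, server push and the separate stream ID spaces are all ignored. Within an urgency level it drains non-incremental streams in stream ID order first, then round-robins the incremental streams one packet at a time.

```python
import heapq
from collections import deque


class TwoQueueScheduler:
    """Toy model: one sequential and one incremental queue per urgency."""

    def __init__(self, levels=8):
        self.sequential = [[] for _ in range(levels)]        # min-heaps of stream IDs
        self.incremental = [deque() for _ in range(levels)]  # round-robin order
        self.remaining = {}                                  # stream ID -> packets left

    def add_stream(self, stream_id, urgency, incremental, packets):
        self.remaining[stream_id] = packets
        if incremental:
            self.incremental[urgency].append(stream_id)
        else:
            heapq.heappush(self.sequential[urgency], stream_id)

    def next_packet(self):
        """Return the stream ID that gets the next packet, or None."""
        for seq, inc in zip(self.sequential, self.incremental):
            if seq:
                # Lowest stream ID keeps sending until it is finished.
                stream_id = seq[0]
                self.remaining[stream_id] -= 1
                if self.remaining[stream_id] == 0:
                    heapq.heappop(seq)
                return stream_id
            if inc:
                # One packet, then go to the back of the line.
                stream_id = inc.popleft()
                self.remaining[stream_id] -= 1
                if self.remaining[stream_id] > 0:
                    inc.append(stream_id)
                return stream_id
        return None


# a=0, b=4, e=16 are non-incremental; c=8, d=12 are incremental.
sched = TwoQueueScheduler()
for sid, inc in [(0, False), (4, False), (8, True), (12, True), (16, False)]:
    sched.add_stream(sid, urgency=3, incremental=inc, packets=2)

order = []
while (sid := sched.next_packet()) is not None:
    order.append(sid)
print(order)  # [0, 0, 4, 4, 16, 16, 8, 12, 8, 12]
```

With two packets per stream, the output corresponds to a, b, e, then c and d interleaved, which is the ordering described above.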


> 2) You mention "implementing priorities directly in the QUIC layer": do you have insight in the design of the API between the H3 and the QUIC layer for this? 
>      d) Does the QUIC part internally implement this as a direct analogue to the H3 priorities (say a list of urgency buckets, with some more complexity for incremental vs non-incremental subslices, see c)) or does QUIC offer a simpler/more complex setup onto which you map the H3 signals?


We have a direct analog of the HTTP API implemented in the transport. The API is 

setPriority(streamId, urgency, incrementalFlag)

Our original design (dating to 2017) of the API between mvfst and proxygen was set up to allow for the transport to call back into the application (HTTP) layer to perform prioritization, specifically because we didn’t want the ugly complexity of the HTTP priority tree inside the transport.  This design turned out to be very complicated, and never really worked the way it was intended.  We’re still in the process of ripping it out.

The urgency/incremental priority scheme is simple enough to implement and probably powerful enough to work for future non-HTTP applications.  There are some constructs that application developers would like to express that are harder in the new scheme however -- namely a strict ordering of resources within an urgency level.   The concrete example is a list of video segments: you want the first before the second, second before third, and so on.  If the list of segments is longer than 8, you can’t implement this exactly according to the specification.  We’re able to fake it using the same urgency level, non-incremental and ordering by stream ID, but as noted earlier, the spec is vague about how to order non-incremental resources at the same urgency level.
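
As a rough illustration of the segment-ordering limit, the sketch below assigns priorities to an ordered list of video segments. The function `plan_segments` and the fallback urgency of 3 are hypothetical. It assumes 8 urgency levels (0 through 7) and that requests are issued in segment order, so that ascending stream IDs match the desired order.

```python
URGENCY_LEVELS = 8  # urgency values 0..7


def plan_segments(num_segments, fallback_urgency=3):
    """Return one (urgency, incremental) pair per segment, in order.

    Up to 8 segments can be strictly ordered with distinct urgencies.
    Beyond that, every segment shares one urgency, non-incremental,
    and ordering relies on the sender sorting by stream ID.
    """
    if num_segments <= URGENCY_LEVELS:
        return [(urgency, False) for urgency in range(num_segments)]
    return [(fallback_urgency, False)] * num_segments


print(plan_segments(3))   # [(0, False), (1, False), (2, False)]
print(plan_segments(10))  # ten copies of (3, False)
```

The second case only gives the intended order if the sender breaks ties by stream ID, which is the behavior the spec leaves vague.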

>      e) Is there a coupling between stream-level flow control and stream priority? (e.g., high priority streams get higher allowances from the client than low-priority streams, to make sure they're not FC-blocked?). I assume client-side FC allowance is always high enough, but still :) 


We don’t allocate connection flow control according to priority.  We pretty much have flow control set high enough on the client that it shouldn’t be much of a problem.

>      You mention TCP_NOTSENT_LOWAT, but the concept of deep vs shallow buffers still stands in QUIC as well: 
>     f) Do you prepare buffers with data to be sent or do you (only) generate QUIC packets/DATA frames on the fly? 

We run the priority algorithm to pick a stream when mvfst is ready to send a packet -- there is no buffering after prioritization.  The HTTP/3 DATA frames are pre-generated but the QUIC STREAM frames are not.


>     g) In the first case, how large are the buffers? Do you flush them if a new, higher priority request arrives? 

Not applicable since we don’t do it that way.


> 3) With regards to prioritizing retransmits:
>   Is this deployed on a CDN-style setup, where e.g., the edge has low-priority (static) resources cached and starts sending them before receiving higher priority responses from the "origin"?

Static (cacheable) resources and dynamic resources that always hit origin are fetched over different connections today.  The prioritization within these two subgroups is separate.  That said, it’s possible that within a static connection, low priority content is cached and high priority content arrives at the edge later.


>   I would assume that in this setup, you'd have cases where sending high-priority fresh data before low-priority retransmits does matter a lot, especially for non-incremental resources (though it depends on many factors). Do you have any insight on this scenario? 

I don’t think we have any specific insight into how prioritizing the high priority data over low priority retransmits affects performance in this scenario, other than it seems like the right thing to do.

>   h) if deployed in a CDN-style setup, do you use (persistent) H3/QUIC to the origin as well? I mainly wonder about if (and how) you tweak priority across requests from multiple clients coalesced onto a single "backend connection" (as that's one of the use cases we "lose" by dropping the dependency tree).

We have a mix of persistent HTTP/2 and HTTP/3 connections between our edge nodes and origins.  We currently give all requests on these connections equal priority, with the incremental flag.  To my knowledge, no CDN has ever implemented any kind of priority scheme for multiplexed connections to origin.  At the last HTTP Workshop, it seemed that very few CDNs even used a multiplexed protocol for that hop.


-Alan
Received on Thursday, 29 April 2021 15:50:58 UTC