Re: HTTP/2 flow control <draft-ietf-httpbis-http2-17>

Bob Briscoe wrote:
> Stuart,
> At 00:18 20/03/2015, Stuart Douglas wrote:
>>     ==a) Intermediate buffer control==
>>     For this, sliding window-based flow control would be appropriate,
>>     because the goal is to keep the e2e pipeline full without wasting
>>     buffer.
>>     Let me prove HTTP/2 cannot do window flow control. For window flow
>>     control, the sender needs to be able to advance both the leading
>>     and trailing edges of the window. In the draft:
>>     * WINDOW_UPDATE frames can only advance the leading edge of a
>>     'window' (and they are constrained to positive values).
>>     * To advance the trailing edge, window flow control would need a
>>     continuous stream of acknowledgements back to the sender (like
>>     TCP). The draft does not provide ACKs at the app-layer, and the
>>     app-layer cannot monitor ACKs at the transport layer, so the
>>     sending app-layer cannot advance the trailing edge of a 'window'.
>>     So the protocol can only support credit-based flow control. It is
>>     incapable of supporting window flow control.
>>     Next, I don't understand how a receiver can set the credit in
>>     'WINDOW_UPDATE' to a useful value. If the sender needed the
>>     receiver to answer the question "How much more can I send than I
>>     have seen ACK'd?" that would be easy. But because the protocol is
>>     restricted to credit, the sender needs the receiver to answer the
>>     much harder open-ended question, "How much more can I send?" So
>>     the sender needs the receiver to know how many ACKs the sender has
>>     seen, but neither of them know that.
>>     The receiver can try, by taking a guess at the bandwidth-delay
>>     product, and adjusting the guess up or down, depending on whether
>>     its buffer is growing or shrinking. But this only works if the
>>     unknown bandwidth-delay product stays constant.
>>     However, BDP will usually be highly variable, as other streams
>>     come and go. So, in the time it takes to get a good estimate of
>>     the per-stream BDP, it will probably have changed radically, or
>>     the stream will most likely have finished anyway. This is why TCP
>>     bases flow control on a window, not credit. By complementing
>>     window updates with ACK stream info, a TCP sender has sufficient
>>     info to control the flow.
>>     The draft is indeed correct when it says:
>>     "this can lead to suboptimal use of available
>>     network resources if flow control is enabled without knowledge
>>     of the bandwidth-delay product (see [RFC7323]).
>>     "
>>     Was this meant to be a veiled criticism of the protocol's own
>>     design? A credit-based flow control protocol like that in the
>>     draft does not provide sufficient information for either end to
>>     estimate the bandwidth-delay product, given it will be varying
>>     rapidly.
>> From my point of view as a server/proxy implementor the flow control
>> window mostly represents the amount of data I am prepared to buffer.
>> In the server-as-receiver case we are basically acting as an
>> intermediary between the network and an end user application.
> Yes.
>> In this case the flow control credit basically represents the maximum
>> amount of data that I am prepared to buffer at the HTTP layer. In this
>> case the answer to 'how much more can I send' is simple, it is
>> basically the amount of data I am prepared to buffer. Because I have
>> no idea how quickly (if at all) the user application will consume the
>> data, all I can really do is buffer it and deliver it as the user
>> application requests it.
>> If I set the flow control window to larger than I am prepared to
>> buffer and the application does not consume data quickly enough then I
>> have two options, which are basically stop reading (head of line
>> blocking), or reset the stream, neither of which is particularly good
>> (head of line blocking is particularly problematic if the user
>> application is trying to write a response before reading the request,
>> as window updates will not be processed).
>> If I am only prepared to buffer a small amount of data then my
>> performance is not going to be great no matter what flow control
>> implementation is in use, and I think that this is basically a
>> limitation of a multiplexed protocol (unless you are prepared to
>> accept potential HOL blocking).
>> This mostly only affects server uploads, as clients will likely have
>> to buffer the whole response anyway so are not constrained by buffer
>> size.
>> All the issues above basically apply to the intermediary/proxy use
>> case as well.
>> Basically I guess what I am getting to is that yes, there are some
>> situations where HTTP2 might perform worse than HTTP1, however I think
>> the underlying problem is intrinsic to any sort of multiplexed
>> protocol, rather than HTTP2's flow control mechanism.
> There I disagree. The problem only applies to h2 because it sits above
> TCP, and is not able to see the ACK information that TCP sees. If stream
> flow control is implemented in the same layer as both connection flow
> control and segment acknowledgement, then I suspect it could be made to
> work properly.
> SCTP, Minion and QUIC have this architecture, so implementing stream
> flow control would not be an unsolvable problem (if a use for it were
> found, and if it could be made deadlock-free).
>>     ==b) Control by the ultimate client app==
>>     For this case, I believe neither window nor credit-based flow
>>     control is appropriate:
>>     * There is no memory management issue at the client end - even if
>>     there's a separate HTTP/2 layer of memory between TCP and the app,
>>     it would be pointless to limit the memory used by HTTP/2, because
>>     the data is still going to sit in the same user-space memory (or
>>     at least about the same amount of memory) when HTTP/2 passes it
>>     over for rendering.
>> Not necessarily, not all clients are browsers, and even with browsers
>> when downloading a file I imagine the data will generally be
>> transferred straight to disk rather than staying in memory. In general I
>> agree with you though, and I think most clients will want to set a
>> large window size.
>>
>>     * Nonetheless, the receiving client does need to send messages to
>>     the sender to supplement stream priorities, by notifying when the
>>     state of the receiving application has changed (e.g. if the user's
>>     focus switches from one browser tab to another).
>>     * However, credit-based flow control would be very sluggish for
>>     such control, because credit cannot be taken back once it has been
>>     given (except HTTP/2 allows SETTINGS_INITIAL_WINDOW_SIZE to be
>>     reduced, but that's a drastic measure that hits all streams
>>     together).
>> This support is provided by PRIORITY frames,
> Yes (as you can see, I suggested priorities myself). I was assuming
> per-stream flow control had been proposed for some reason, and having
> knocked down the other reasons I could think of, the only one left was
> for the client to use flow control to complement priorities.
> You're saying per-stream flow control is not even safe for this case,
> and your reasoning below seems sound. So I guess you're implying there
> is very little, if anything, that per-stream flow control is good for.
> Am I putting words in your mouth?

Very much so. Per stream flow control is essential for buffer management 
on the server.

Basically my server implementation has to keep reading from the socket, 
to make sure that any new messages are received and acted on in a timely 
manner, and to allow multiplexing to work correctly. Consider the case 
of a client uploading a large file, where for whatever reason the end 
user application is slow to read the data (perhaps it is doing some 
database operations before it starts to read the file data). If we did 
not have per stream flow control I would only have three options:

- buffer a potentially unbounded amount of data
- stop reading from the socket (HOL blocking)
- kill the connection

All these options suck, but with per stream flow control I know I will 
never have to buffer more than the window size (as we don't send 
WINDOW_UPDATE until the application has consumed the data).
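
To make that bound concrete, here is a minimal sketch of the receive side 
of one stream. It is illustrative Python rather than code from any real 
server: the class and method names are hypothetical, and it ignores the 
connection-level window entirely. The point is only that if credit is 
returned solely when the application reads, then credit plus buffered 
data always equals the window size.

```python
class StreamReceiver:
    """Receive side of one stream under credit-based flow control (sketch)."""

    def __init__(self, window_size):
        self.window_size = window_size  # the most we are prepared to buffer
        self.credit = window_size       # bytes the peer may still send
        self.buffered = 0               # bytes held, not yet read by the app

    def on_data(self, nbytes):
        # The peer sent DATA; it must fit within the outstanding credit.
        if nbytes > self.credit:
            raise ValueError("flow control violation")
        self.credit -= nbytes
        self.buffered += nbytes

    def app_read(self, nbytes):
        # The application consumed data; only now is credit handed back.
        consumed = min(nbytes, self.buffered)
        self.buffered -= consumed
        self.credit += consumed
        # Increment for the WINDOW_UPDATE to send (send nothing if it is 0).
        return consumed
```

A slow application simply leaves `credit` at zero, so the peer stops 
sending on that stream while the socket keeps being read for other streams.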

Another good example of why per stream flow control is necessary is a 
load balancer, with a fast connection to backend servers, and a slow 
connection to the remote clients. If we did not have per stream flow 
control the backend servers would just dump all their data straight to 
the load balancer, forcing it to either block or buffer. (You implied 
that TCP level flow control should be sufficient for this case; however, 
this is only true if your load balancer maintains a 1:1 connection 
between the client and a backend server, which is generally not the case.)

From a performance point of view the real issue here is how much a 
server is prepared to buffer relative to the bandwidth-delay product. If 
a server is prepared to buffer enough data it can set the flow control 
window high enough that the window should never be exhausted.
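
As a rough worked example of that relationship (a sketch only: the link 
speed and round trip time are made-up figures, the function names are 
hypothetical, and it assumes at most one window of data is sent per round 
trip), the window needed to keep a path full and the throughput ceiling 
imposed by a given window can be estimated like this:

```python
def bandwidth_delay_product(bandwidth_bps, rtt_ms):
    """Bytes that must be in flight to keep the path full."""
    return bandwidth_bps * rtt_ms // 8000  # bits/s * ms -> bytes

def max_throughput_bps(window_bytes, rtt_ms):
    """Throughput ceiling if at most one window is sent per round trip."""
    return window_bytes * 8 * 1000 // rtt_ms

# Hypothetical path: 100 Mbit/s with a 50 ms round trip time.
needed_window = bandwidth_delay_product(100_000_000, 50)

# 65535 bytes is the initial HTTP/2 flow control window.
ceiling = max_throughput_bps(65535, 50)
```

On those assumed numbers the server would need to be prepared to buffer 
about 625 KB per stream to avoid exhausting the window, whereas the 
initial 65535 byte window caps a stream at roughly 10.5 Mbit/s.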

If you have a server that is not prepared to buffer very much data then 
you could well have worse performance than HTTP1, however that is the 
price you pay for a multiplexed protocol. If you are not prepared to 
buffer much data you could still set a high flow control window to 
achieve similar performance to HTTP1, but you have to accept that you 
may end up with HOL blocking.


>> using flow control for such use cases is very problematic and has a
>> good chance of leading to deadlocks.
> Setting aside whether stream flow control is useful,...
> ... if proper stream flow control were feasible (e.g. if it were
> implemented in the transport layer), I believe the protocol could be
> made structurally impossible to deadlock within its own layer (aside
> from any deadlock at the app-layer, as you point out later). I've
> written the rules for avoiding flow control deadlock into the Inner
> Space spec (using understanding gained from multipath TCP). My ambition
> is to give a proof that it cannot deadlock, even when combined with the
> byzantine behaviours of middleboxes. It's a tough ambition, but needed
> to determine any necessary conditions for the proof to hold.
>> For example consider a Java Servlet container, new requests will be
>> read from the underlying connection, and then dispatched to a worker
>> thread to generate the page. Once a server has started processing a
>> request there is no way to pause the request in a way that frees up
>> the resources (threads, database connections etc) in use. I think
>> almost every use case more complex than simple file serving has this
>> issue, once a server has started processing a request there is
>> generally no way to suspend it and free the resources.
> I suspect you are right. The deadlock avoidance I was talking about
> above is a simpler problem (tho by no means trivial), because it solely
> concerns flow-through buffers, not endpoint resources.
>> So in your example if the user switches tabs and you stop sending
>> window updates for streams in use by the old browser tab, while
>> sending new requests for the new tab it is possible the server will be
>> in a situation where it is not prepared to allocate resources for the
>> new requests until the existing ones are complete, however these will
>> never complete as the browser has stopped sending window updates.
>> You could argue that a server should limit the max streams value to
>> the number of streams that it is prepared to allocate resources for,
>> however this greatly limits the utility of the priority mechanism, as
>> it means that servers will always handle requests on a first come
>> first served basis. If we set the maximum streams value to a higher
>> amount than what we are prepared to allocate resources for and queue
>> requests then when a request finishes we can pick the highest priority
>> request from the queue to allocate resources to (not to mention there
>> is no round trip delay because the request is queued).
>> Basically IMHO flow control should not be used to control priority,
>> that is what PRIORITY frames are for, and in general servers can only
>> ever do priority on a best effort approach anyway. If you try and use
>> flow control to enforce a strict priority mechanism you run a very
>> real risk of deadlocks.
> That sounds like a reasonable rule, at least for now until we gain
> better understanding.
> This relates to the part of my review about h2 entering uncharted
> theoretical territory, by allowing implementers free rein on what they
> do with the protocol elements, with no guidance or constraints. One end
> might write code that interprets flow control messages with some
> priority-related semantics, while the implementation at the other end
> did not intend that.
>>
>>     ==Flow control problem summary==
>>     With only a credit signal in the protocol, a receiver is going to
>>     have to allow generous credit in the WINDOW_UPDATEs so as not to
>>     hurt performance. But then, the receiver will not be able to
>>     quickly close down one stream (e.g. when the user's focus
>>     changes), because it cannot claw back the generous credit it gave,
>>     it can only stop giving out more.
>>     IOW: Between a rock and a hard place,... but don't tell them where
>>     the rock is.
>> For the reasons I outlined above I don't think this is actually a
>> problem.
> Certainly, if flow control is not used to close down a stream, then not
> being able to close down a stream won't be a problem. That does leave
> the problem of what per-stream flow control /can/ be used for, if it
> can't be used to slow down streams!
> The only believable use-case I've seen so far is blocking a stream's
> progress when it is first created.
>> Also in terms of clawing back credit I thought that in general it was
>> a bad idea? The TCP RFC explicitly states that "shrinking the window"
>> is strongly discouraged, although I must admit I am not fully aware of
>> the reasoning.
> From memory, the reasoning was simply that TCP can't unsend what it has
> already sent. So shrinking the window leaves the protocol in an
> indeterminate state where more buffer might be needed anyway.
> Bob
>> Stuart
>>
>>     ==Towards a solution?==
>>     I think 'type-a' flow control (for intermediate buffer control)
>>     does not need to be at stream-granularity. Indeed, I suspect a
>>     proxy could control its app-layer buffering by controlling the
>>     receive window of the incoming TCP connection. Has anyone assessed
>>     whether this would be sufficient?
>>     I can understand the need for 'type-b' per-stream flow control (by
>>     the ultimate client endpoint). Perhaps it would be useful for the
>>     receiver to emit a new 'PAUSE_HINT' frame on a stream? Or perhaps
>>     updating per-stream PRIORITY would be sufficient? Either would
>>     minimise the response time to a half round trip. Whereas credit
>>     flow-control will be much more sluggish (see 'Flow control problem
>>     summary').
>>     Either approach would correctly propagate e2e. An intermediate
>>     node would naturally tend to prioritise incoming streams that fed
>>     into prioritised outgoing streams, so priority updates would tend
>>     to propagate from the ultimate receiver, through intermediate
>>     nodes, up to the ultimate sender.
>>     ==Flow control coverage==
>>     The draft exempts all TCP payload bytes from flow control except
>>     HTTP/2 data frames. No rationale is given for this decision. The
>>     draft says it's important to manage per-stream memory, then it
>>     exempts all the frame types except data, even tho each byte of a
>>     non-data frame consumes no less memory than a byte of a data frame.
>>     What message does this put out? "Flow control is not important for
>>     one type of bytes with unlimited total size, but flow control is
>>     so important that it has to be mandatory for the other type of bytes."
>>     It is certainly critical that WINDOW_UPDATE messages are not
>>     covered by flow control, otherwise there would be a real risk of
>>     deadlock. It might be that there are dependencies on other frame
>>     types that would lead to a dependency loop and deadlock. It would
>>     be good to know what the rationale behind these rules was.
>> I think a lot of people had similar concerns. There is a discussion
>> about it here:
>>
>>     ==Theory?==
>>     I am concerned that HTTP/2 flow control may have entered new
>>     theoretical territory, without suitable proof of safety. The only
>>     reassurance we have is one implementation of a flow control
>>     algorithm (SPDY), and the anecdotal non-evidence that no-one using
>>     SPDY has noticed a deadlock yet (however, is anyone monitoring for
>>     deadlocks?).
>>     Whereas SPDY has been an existence proof that an approach like
>>     http/2 'works', so far all the flow control algos have been pretty
>>     much identical (I think that's true?). I am concerned that the
>>     draft takes the InterWeb into uncharted waters, because it allows
>>     unconstrained diversity in flow control algos, which is an
>>     untested degree of freedom.
>>     The only constraints the draft sets are:
>>     * per-stream flow control is mandatory
>>     * the only protocol message for flow control algos to use is the
>>     WINDOW_UPDATE credit message, which cannot be negative
>>     * no constraints on flow control algorithms.
>>     * and all this must work within the outer flow control constraints
>>     of TCP.
>>     Some algos might use priority messages to make flow control
>>     assumptions. While other algos might associate PRI and
>>     WINDOW_UPDATE with different meanings. What confidence do we have
>>     that everyone's optimisation algorithms will interoperate? Do we
>>     know there will not be certain types of application where deadlock
>>     is likely?
>>     "When using flow
>>     control, the receiver MUST read from the TCP receive buffer in a
>>     timely fashion. Failure to do so could lead to a deadlock when
>>     critical frames, such as WINDOW_UPDATE, are not read and acted
>>     upon.
>>     "
>>     I've been convinced (offlist) that deadlock will not occur as long
>>     as the app consumes data 'greedily' from TCP. That has since been
>>     articulated in the above normative text. But how sure can we be
>>     that every implementer's different interpretations of 'timely'
>>     will still prevent deadlock?
>>     Until a good autotuning algorithm for TCP receive window
>>     management was developed, good window management code was nearly
>>     non-existent. Managing hundreds of interdependent stream buffers
>>     is a much harder problem. But implementers are being allowed to
>>     just 'Go forth and innovate'. This might work if everyone copies
>>     available open source algo(s). But they might not, and they don't
>>     have to.
>>     This all seems like 'flying by the seat of the pants'.
>>     ==Mandatory Flow Control? ==
>>     "3. [...] A sender
>>     MUST respect flow control limits imposed by a receiver."
>>     This ought to be a 'SHOULD' because it is contradicted later - if
>>     settings change.
>>     "6. Flow control cannot be disabled."
>>     Also effectively contradicted half a page later:
>>     "Deployments that do not require this capability can advertise
>>     a flow control window of the maximum size (2^31-1), and by
>>     maintaining this window by sending a WINDOW_UPDATE frame when
>>     any data is received.
>>     This effectively disables flow control for that receiver."
>>     And contradicted in the definition of half closed (remote):
>>     "half closed (remote):
>>     [...] an endpoint is no longer
>>     obligated to maintain a receiver flow control window.
>>     "
>>     And contradicted in 8.3. The CONNECT Method,
>>     which says:
>>     "Frame types other than DATA
>>     or stream management frames (RST_STREAM, WINDOW_UPDATE, and
>>     PRIORITY)
>>     MUST NOT be sent on a connected stream, and MUST be treated as a
>>     stream error (Section 5.4.2) if received.
>>     "
>>     Why is flow control so important that it's mandatory, but so
>>     unimportant that you MUST NOT do it when using TLS e2e?
>>     Going back to the earlier quote about using the max window size,
>>     it seems perverse for the spec to require endpoints to go through
>>     the motions of flow control, even if they arrange for it to affect
>>     nothing, but to still require implementation complexity and
>>     bandwidth waste with a load of redundant WINDOW_UPDATE frames.
>>     HTTP is used on a wide range of devices, down to the very small
>>     and challenged. HTTP/2 might be desirable in such cases, because
>>     of the improved efficiency (e.g. header compression), but in many
>>     cases the stream model may not be complex enough to need stream
>>     flow control.
>>     So why not make flow control optional on the receiving side, but
>>     mandatory to implement on the sending side? Then an implementation
>>     could have no machinery for tuning window sizes, but it would
>>     respond correctly to those set by the other end, which requires
>>     much simpler code.
>>     If a receiving implementation chose not to do stream flow control,
>>     it could still control flow at the connection (stream 0) level, or
>>     at least at the TCP level.
>>     ==Inefficiency?==
>>     5.2. Flow Control
>>     "Flow control is used for both individual
>>     streams and for the connection as a whole."
>>     Does this mean that every WINDOW_UPDATE on a stream has to be
>>     accompanied by another WINDOW_UPDATE frame on stream zero? If so,
>>     this seems like 100% message redundancy. Surely I must have
>>     misunderstood.
>>     ==Flow Control Requirements==
>>     I'm not convinced that clear understanding of flow control
>>     requirements has driven flow control design decisions.
>>     The draft states various needs for flow-control without giving me
>>     a feel of confidence that it has separated out the different
>>     cases, and chosen a protocol suitable for each. I tried to go back
>>     to the early draft on flow control requirements, and I was not
>>     impressed.
>>     I have quoted below the various sentences in the draft that state
>>     what flow control is believed to be for. Below that, I have
>>     attempted to crystalize out the different concepts, each of which
>>     I have tagged within the quotes.
>>     * 2. HTTP/2 Protocol Overview
>>     says:
>>     "Flow control and prioritization ensure that it is possible to
>>     efficiently use multiplexed streams. [Y]
>>     Flow control (Section 5.2) helps to ensure that only data that
>>     can be used by a receiver is transmitted. [X]"
>>     * 5.2. Flow Control
>>     says:
>>     "Using streams for multiplexing introduces contention over use
>>     of the TCP connection [X], resulting in blocked streams [Z]. A
>>     flow control scheme ensures that streams on the same connection do
>>     not destructively interfere with each other [Z]."
>>     * 5.2.2. Appropriate Use of Flow Control
>>     says:
>>     "Flow control is defined to protect endpoints that are operating
>>     under resource constraints. For example, a proxy needs to share
>>     memory between many connections, and also might have a slow
>>     upstream connection and a fast downstream one [Y]. Flow control
>>     addresses cases where the receiver is unable to process data on
>>     one stream, yet wants to continue to process other streams in the
>>     same connection [X]."
>>     "Deployments with constrained resources (for example, memory) can
>>     employ flow control to limit the amount of memory a peer can
>>     consume. [Y]"
>>     Each requirement has been tagged as follows:
>>     [X] Notification of the receiver's changing utility for each stream
>>     [Y] Prioritisation of streams due to contention over the streaming
>>     capacity available to the whole connection.
>>     [Z] Ensuring one stream is not blocked by another.
>>     [Z] might be a variant of [Y], but [Z] sounds more binary, whereas
>>     [Y] sounds more like optimisation across a continuous spectrum.
>>     Regards
>>     Bob
>>     ________________________________________________________________
>>     Bob Briscoe, BT
> ________________________________________________________________
> Bob Briscoe, BT

Received on Saturday, 21 March 2015 08:16:04 UTC