RE: HTTP/2 re-sync connection level flow control?

Connection level echoing is tough, because of synchronization.  Sure, it's easy to write an extension that periodically declares how many bytes you think the peer has used -- but as of when?  The peer has more bytes in flight, so you need a sync marker to do that.  For that, you'd need a frame that says "what do you think I've used so far?" and gets an answer back for that exact point in the bytestream.  Complicated, for some processing models.  Probably better to do it stream-by-stream and just trust that both endpoints can sum all the streams correctly.

Sending on a closed stream won't work in H3, but an extension could do it in H2 by arrangement.  Imagine an extension frame emitted by receivers in response to a stream (half-)closing which declares how many flow-controlled bytes they received.  When the sender gets it back, it blows up immediately if the number doesn't match.  You'd need both peers to support it, obviously, and you couldn't send it until you know the peer supports the extension (otherwise you'd trigger STREAM_CLOSED or PROTOCOL_ERROR, per Section 5.1 of RFC 7540).
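
A rough sketch of the sender-side check, in Python, in case it helps to see it spelled out.  Everything named here is hypothetical: RFC 7540 defines no such frame, "RECEIVED_BYTES" is just a placeholder name for it, and the sketch assumes both peers have already agreed to use the extension.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class FlowControlMismatch(Exception):
    """Raised when the peer's per-stream byte count differs from ours."""


@dataclass
class SenderAccounting:
    # stream_id -> flow-controlled bytes we believe we sent on that stream
    sent: dict[int, int] = field(default_factory=dict)

    def on_data_sent(self, stream_id: int, flow_controlled_len: int) -> None:
        """Record the flow-controlled length of a DATA frame we sent."""
        self.sent[stream_id] = self.sent.get(stream_id, 0) + flow_controlled_len

    def on_received_bytes_report(self, stream_id: int, peer_count: int) -> None:
        """Handle the hypothetical RECEIVED_BYTES frame for a closed stream.

        Fails immediately if the peer counted a different number of bytes,
        instead of letting the difference silently shrink the window.
        """
        ours = self.sent.pop(stream_id, 0)
        if ours != peer_count:
            raise FlowControlMismatch(
                f"stream {stream_id}: we counted {ours}, peer counted {peer_count}"
            )
```

The point of checking per stream is that no sync marker is needed: once the stream is closed, both totals are final, so the two numbers either match or one side has a bug.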

-----Original Message-----
From: Graham King <graham@gkgk.org> 
Sent: Friday, February 15, 2019 9:56 AM
To: ietf-http-wg@w3.org
Subject: Re: HTTP/2 re-sync connection level flow control?

> I was initially mystified by how this would happen in HTTP/2.  The frames always arrive in the same order, so if they're all processed correctly, there should never be a disagreement about what the actual allowed amount is.  Presumably the disagreement is about which sent bytes count.

Yes, a disagreement on which bytes to count, possibly by accident. Imagine a client with an off-by-one error. On every frame the client thinks it sent one more byte than the server thinks. That would work correctly in a wide range of tests, and could easily go unnoticed in production for a while. When you do exhaust the flow control window, the only clue that there's a real problem is that you stay blocked forever.
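
To make the drift concrete, here is a toy model in Python. It is an idealized sketch, not how any real implementation behaves: the function name and parameters are made up for illustration, the receiver is assumed to return credit immediately after every frame, and the sender is assumed to always send full-size frames.

```python
def frames_until_stall(window: int, frame_len: int, drift: int = 1) -> int:
    """Count frames sent before the sender believes the window is exhausted.

    The sender charges itself `frame_len + drift` bytes per frame, while the
    receiver only counts, and later credits back, `frame_len` bytes.
    """
    if drift <= 0:
        raise ValueError("drift must be positive, otherwise the window never shrinks")

    sender_window = window
    frames = 0
    while sender_window >= frame_len + drift:
        sender_window -= frame_len + drift  # sender's (wrong) accounting
        sender_window += frame_len          # WINDOW_UPDATE for what the receiver counted
        frames += 1
    return frames
```

Under those assumptions, a 65,535-byte window with 16,384-byte frames and a one-byte drift stalls only after 49,151 frames, which is roughly 800 MB of data. Short tests would pass without ever getting close.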

Off-by-one mistakes are easy to make, but HTTP/2's relative flow control makes them hard to identify. The bug reports you'll get are "sometimes my upload gets stuck" or "I have to restart my browser every day because my email stops working"; many layers away from the problem, and easy to dismiss as networking issues.

In Go it was even harder to spot. The bug depended on how many DATA frames were in-flight when the stream was closed. The higher the latency, the faster you exhaust flow control. If a well-resourced, widely deployed project such as Go can have this bug over five minor versions (1.11.0 to 1.11.5 inclusive), it's easy to imagine other projects having similar (possibly undiagnosed) problems with flow control.

Relative flow control is brittle. It is easy for HTTP/2 implementers to get wrong, very difficult to notice the mistake, and impossible to recover from during a connection. My goal in writing is to ask whether some adjustments can be made to the protocol to eliminate this class of errors.

Received on Friday, 15 February 2019 19:40:19 UTC