Re: HTTP/2 re-sync connection level flow control?

> I was initially mystified by how this would happen in HTTP/2.  The frames always arrive in the same order, so if they're all processed correctly, there should never be a disagreement about what the actual allowed amount is.  Presumably the disagreement is about which sent bytes count.

Yes, a disagreement on which bytes to count, possibly by accident. Imagine a client with an off-by-one error: on every frame, the client thinks it sent one more byte than the server counted. Because WINDOW_UPDATE carries only a relative increment, nothing ever corrects the drift, and the client's view of the window shrinks by one byte per frame. That would work correctly in a wide range of tests, and could easily go unnoticed in production for a while. When you do exhaust the flow control window, the only clue that there's a real problem is that you stay blocked forever.

Off-by-one mistakes are easy to make, but HTTP/2's relative flow control makes them hard to identify. The bug reports you'll get are "sometimes my upload gets stuck" or "I have to restart my browser every day because my email stops working". Those symptoms are many layers away from the problem, and easy to dismiss as networking issues.

In Go it was even harder to spot. The bug depended on how many DATA frames were in flight when the stream was closed, so the higher the latency, the faster you exhausted flow control. If a well-resourced, widely deployed project such as Go can ship this bug in six point releases (1.11.0 to 1.11.5 inclusive), it's easy to imagine other projects having similar (possibly undiagnosed) problems with flow control.

Relative flow control is brittle. It is easy for HTTP/2 implementers to get wrong, very difficult to notice when they do, and impossible to recover from for the life of the connection. My goal in writing is to ask whether some adjustments can be made to the protocol to eliminate this class of errors.

Received on Friday, 15 February 2019 17:56:52 UTC