Re: Fwd: New Version Notification for draft-nottingham-httpbis-retry-01.txt from Willy Tarreau on 2017-02-02 (ietf-http-wg@w3.org from January to March 2017)

From: Willy Tarreau <w@1wt.eu>
Date: Thu, 2 Feb 2017 09:14:30 +0100
To: Mark Nottingham <mnot@mnot.net>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20170202081430.GA20278@1wt.eu>

Hi Mark,

On Wed, Feb 01, 2017 at 07:26:29PM +1100, Mark Nottingham wrote:
> FYI; fairly minor update. Would love to hear what people think about the various suggested paths forward.

Quite an interesting document. The questions left open are even more
difficult for an intermediary because, as you've mentioned, methods are
not sufficient to guess an application's idempotency and everyone seems to
expect that transient network issues should be hidden from the end user.

I'm even wondering if applications should pass some information to the
client to declare guaranteed idempotency, which could then be relayed back
along the chain in a header. We could thus imagine replaying such POSTs
when the application uses some anti-replay request IDs. Regarding GET,
there's probably no point asking applications to pass such information
because changing them to do this is as difficult as expecting them to
use POST instead, so we should probably continue to consider that GETs
are idempotent.
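
As a rough illustration of the anti-replay request ID idea, here is a
minimal sketch of an application-side wrapper that makes a replayed POST
harmless. The class name, the in-memory store and the way the request ID
reaches the handler are all assumptions made for the example; nothing here
is taken from the draft.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ReplaySafeHandler:
    """Wraps a non-idempotent handler so replays of one request ID are harmless."""

    handler: Callable[[bytes], bytes]
    # Hypothetical in-memory store; a real application would bound and expire it.
    seen: Dict[str, bytes] = field(default_factory=dict)

    def handle(self, request_id: str, body: bytes) -> bytes:
        # A replayed request gets the stored response; the handler is not run again.
        if request_id in self.seen:
            return self.seen[request_id]
        response = self.handler(body)
        self.seen[request_id] = response
        return response
```

An application behaving like this could advertise that its POSTs are safe
to replay, since a second delivery of the same request ID has no further
effect.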

I found something which can be slightly improved in 4.1 regarding the
detectable conditions for a retry. "Connection closes" and TCP RST
could be split into two very distinct categories:
- those which happen without ACKing the request
- those which happen after ACKing the request

In the first case it's mostly a race on an idle timeout causing the server
(or an intermediary) to close at the same time the client sends the request.
It is safe to retry because it is guaranteed at the TCP layer that the data
were not consumed.

In the second case you don't know. The data may simply have been delivered
to TCP socket buffers (and ACKed) at the moment the application closed the
listening socket (eg during an application reload). Or the request may have
been delivered to the application, causing it to crash, explaining the close
or reset.

I'd argue that the first ones SHOULD be retried (with reasonable efforts)
while the second ones SHOULD NOT unless the request's idempotency has been
figured out.
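
The rule above can be sketched as a small decision function. This is only
an illustration under stated assumptions: the names are made up, the
attempt limit of 3 is an arbitrary stand-in for "reasonable efforts", and
it presumes the caller somehow knows whether the request was ACKed.

```python
from enum import Enum


class Failure(Enum):
    CLOSED_BEFORE_ACK = "close/reset before the request was ACKed"
    CLOSED_AFTER_ACK = "close/reset after the request was ACKed"


def should_retry(failure: Failure, known_idempotent: bool,
                 attempts: int, max_attempts: int = 3) -> bool:
    """Return True if the failed request may be sent again."""
    # "Reasonable efforts": never retry forever (the limit is an assumption).
    if attempts >= max_attempts:
        return False
    # Not ACKed: TCP guarantees the peer never consumed the data.
    if failure is Failure.CLOSED_BEFORE_ACK:
        return True
    # ACKed: the application may have seen it, so idempotency must be known.
    return known_idempotent
```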

One difficulty is that current TCP stacks don't make it easy to find out
whether data was ACKed and can be dropped. In fact the problem is not the
moment the reset/close event is detected, but the moment before. An
intermediary cannot buffer infinite amounts of data, so it has to drop
transmitted data to make buffer room for more data (eg for a POST
request). And data being ACKed doesn't translate into an event that can
wake up a paused connection (SNDLOWAT 1 is usually ignored). For a client
(browser) the issue is the same, except that all the elements that led to
the request should probably still be present to build a completely new
request from scratch, so the network buffering issue should not exist.
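
To make the intermediary's buffering constraint concrete, here is a
minimal sketch of a bounded buffer holding the bytes already forwarded
upstream. The class and its fixed capacity are hypothetical and do not
describe any existing proxy: once old bytes are discarded to make room,
the request can no longer be rebuilt, whatever the transport reports
afterwards.

```python
class ReplayBuffer:
    """Keeps forwarded request bytes so the request can be rebuilt for a retry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = bytearray()
        self.dropped = False  # True once any forwarded byte has been discarded

    def forward(self, chunk: bytes) -> None:
        """Record a chunk that was just sent upstream."""
        self.data += chunk
        overflow = len(self.data) - self.capacity
        if overflow > 0:
            # No room left: discard the oldest bytes to keep the buffer bounded.
            del self.data[:overflow]
            self.dropped = True

    def replayable(self) -> bool:
        """A retry is only possible while the whole request is still held."""
        return not self.dropped

    def replay(self) -> bytes:
        if self.dropped:
            raise RuntimeError("start of the request was discarded; cannot replay")
        return bytes(self.data)
```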

So in the end we do have some reliable transport-level information to
detect the conditions for a safe retry of non-idempotent requests, but
this information is not easy to obtain and exploit. It's a particular
issue for a TCP Fast Open implementation on the client side (I have
implemented TCP Fast Open to servers in haproxy but have not merged it
yet since I have not sorted this out).

Cheers,
Willy

Received on Thursday, 2 February 2017 08:15:03 UTC