Re: Retrying requests sent to Alternative Services

The general advice we give is that if you sent some[1] of a request and you did NOT receive a positive signal that the request was not processed, it is unsafe to retry automatically.  You might engage higher-level processes that know whether a replay is safe, or that can make an executive override, but automatic retries might result in the request being processed twice.

A 500-series error is a bad signal to use for retrying.  5xx is a pretty clear signal that a server started to process the request.

Connection-level issues are sort of bad signals too, but only because they leave you uncertain about whether the request was processed.  Of course, if the server is throwing out protocol errors, you might decide (akin to the executive override) that such a server deserves all the replay attacks it ends up with.

The errors that are safe to retry on are 421 (Misdirected Request), REFUSED_STREAM (h2), H3_REQUEST_REJECTED (h3), and any error that is signaled using GOAWAY where the request's stream ID is higher than the streams identified as having been accepted.
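
To make that concrete, here is a minimal sketch in Python of the classification described above. It is not taken from any client: the `Outcome` type, its field names, and `safe_to_retry` are all made up for illustration. In particular, `last_accepted_stream_id` assumes the caller has already worked out from the GOAWAY frame which request streams the server accepted, since h2 and h3 encode that differently. The sketch also ignores the "nothing was sent at all" case from [1].

```python
from dataclasses import dataclass
from typing import Optional

# Stream error codes that positively signal "this request was not processed".
SAFE_STREAM_ERRORS = {"REFUSED_STREAM", "H3_REQUEST_REJECTED"}


@dataclass
class Outcome:
    """Hypothetical summary of how one request attempt ended."""
    status: Optional[int] = None                    # HTTP status, if a response arrived
    stream_error: Optional[str] = None              # stream reset code, if any
    stream_id: Optional[int] = None                 # stream the request was sent on
    last_accepted_stream_id: Optional[int] = None   # derived from GOAWAY, if one arrived
    connection_error: bool = False                  # connection-level failure


def safe_to_retry(o: Outcome) -> bool:
    """Return True only when there is a positive 'not processed' signal."""
    if o.status == 421:
        return True
    if o.stream_error in SAFE_STREAM_ERRORS:
        return True
    if (
        o.last_accepted_stream_id is not None
        and o.stream_id is not None
        and o.stream_id > o.last_accepted_stream_id
    ):
        return True
    # 5xx, timeouts, and other connection errors leave only uncertainty,
    # so they are not treated as safe for automatic retry.
    return False
```

Anything that returns False here would be handed to whatever higher-level process is able to decide on an override.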

Now, that's theory.

In practice, we do lots of very bad things to keep broken websites working.  I'll let others speak to what we do specifically.

[1] You might argue that a request where you were unable to even send control data is safe to retry, but that's an unlikely situation and probably not worth creating a special case for.

On Thu, Sep 16, 2021, at 04:53, Ryan Hamilton wrote:
> Howdy Folks,
> 
> I'd like to get thoughts from folks about what circumstances a client 
> should resend a request to the origin server directly that was 
> originally sent to an alternate service. For example, Section 2.4 of 
> RFC 7838 <https://datatracker.ietf.org/doc/html/rfc7838#section-2.4> 
> has the following text.
> 
> > Furthermore, if the connection to the alternative service fails or is unresponsive, the client MAY fall back to using the origin or another alternative service.
> 
> But I'm curious about a few other cases that clients might consider. 
> What if the alternative server is responsive, but the connection is 
> aborted because of a protocol error? (For example, the connection is 
> closed because of a QUIC level protocol error like an ACK of a packet 
> that was never sent). What about the case where a 500 error code 
> (Internal Server Error) is received?
> 
> For example, Chrome has logic 
> <https://source.chromium.org/chromium/chromium/src/+/main:net/http/http_network_transaction.cc;drc=11c88cd72d68adfbdb88e55a6841724674eac5d3;l=1656> which retries any request that was sent to an alternative and encountered a QUIC (or HTTP/3) error including the connection timing out. If the resulting (TCP) request succeeds, then the alternative service is disabled for some period of time. Chrome does not, however, retry requests on 500 errors.
> 
> I'd be very interested to hear what clients are doing in practice both 
> in terms of the situations that result in a retry and any effect that 
> retry might have on subsequent use (or not) of the alternative service.
> 
> Cheers,
> 
> Ryan
> 

Received on Thursday, 16 September 2021 02:12:43 UTC