Re: Usage of HTTP/2 PROTOCOL_ERROR and INTERNAL_ERROR from Willy Tarreau on 2022-05-03 (ietf-http-wg@w3.org from April to June 2022)

From: Willy Tarreau <w@1wt.eu>
Date: Tue, 3 May 2022 18:47:18 +0200
To: Guoye Zhang <guoye_zhang@apple.com>
Cc: ietf-http-wg <ietf-http-wg@w3.org>
Message-ID: <20220503164718.GD21564@1wt.eu>

Hi,

On Tue, May 03, 2022 at 02:26:19AM -0700, Guoye Zhang wrote:
> Hi,
> 
> We maintain the HTTP client library on Apple's platforms, and with more
> servers enabling HTTP/2, our error handling logic was recently brought to
> attention.
> 
> To my understanding, PROTOCOL_ERROR means that the other side didn't
> implement the standard correctly, and INTERNAL_ERROR means something happened
> unexpectedly on our side (e.g. crashed). Both of the error codes should be
> fatal and only caused by bugs in software, so we do not attempt to retry or
> perform download resumption.
> 
> However, nginx is using these error codes for transfers that are too slow
> causing timeout, which can occur due to bad network connectivity.
> https://github.com/nginx/nginx/blob/master/src/http/v2/ngx_http_v2.c#L4639
> 
> My question is, should we treat PROTOCOL_ERROR and INTERNAL_ERROR as
> recoverable errors on the client side?

I would say that PROTOCOL_ERROR could be caused by an intermediary messing
with the connection independently of the client, so it could make sense
to retry only once (or only a few times) in this case. For INTERNAL_ERROR,
it could be caused by a resource shortage on the server (memory allocation
issue for example) so here again it could make sense to try again, but be
even more conservative and maybe not retry instantly. In that regard, the
suitability of the codes used for slow transfers as indicated above is
debatable, but I think that to some extent it's not much different from
what we're saying here (at least for INTERNAL_ERROR). For PROTOCOL_ERROR it
might be a bit more concerning if it claims there are protocol violations
that do not really happen, but maybe a short read results in an incomplete
frame which itself results in an apparent protocol violation, and in this
case it could be a bit of a stretch but understandable. In any case I think
that retrying can make sense as long as it's only one or a few times and
no more, to avoid making the situation worse.
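
As an illustration only, the retry policy described above could be sketched
in Python roughly as follows. This is not taken from any real client: the
names RetryRule, RULES and retry_delay are hypothetical, and the retry
budgets and delays are assumed values chosen just to show the shape of the
policy. Only the numeric error codes (0x1 and 0x2) come from the HTTP/2
specification.

```python
from dataclasses import dataclass
from typing import Optional

# HTTP/2 error code values as assigned by the HTTP/2 specification.
PROTOCOL_ERROR = 0x1
INTERNAL_ERROR = 0x2


@dataclass(frozen=True)
class RetryRule:
    """Hypothetical per-error-code retry budget."""
    max_retries: int    # "only once (or only a few times)"
    base_delay: float   # seconds to wait before the first retry


# Assumed values, for illustration only.
RULES = {
    # Possibly an intermediary's fault: a couple of prompt retries.
    PROTOCOL_ERROR: RetryRule(max_retries=2, base_delay=0.0),
    # Possibly a resource shortage on the server: be more conservative
    # and do not retry instantly.
    INTERNAL_ERROR: RetryRule(max_retries=1, base_delay=1.0),
}


def retry_delay(error_code: int, retries_so_far: int) -> Optional[float]:
    """Return the seconds to wait before the next retry, or None to give up."""
    rule = RULES.get(error_code)
    if rule is None or retries_so_far >= rule.max_retries:
        return None
    # Double the wait on each further retry so a struggling peer gets room.
    return rule.base_delay * (2 ** retries_so_far)
```

A caller would invoke retry_delay(code, n) after each failed attempt, sleep
for the returned time, and stop as soon as it returns None, which keeps the
total number of retries small and bounded.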

Just my two cents,
Willy

Received on Tuesday, 3 May 2022 16:47:36 UTC