Re: Usage of HTTP/2 PROTOCOL_ERROR and INTERNAL_ERROR from Lucas Pardue on 2022-05-03 (ietf-http-wg@w3.org from April to June 2022)

From: Lucas Pardue <lucaspardue.24.7@gmail.com>
Date: Tue, 3 May 2022 17:59:46 +0100
To: Willy Tarreau <w@1wt.eu>
Cc: Guoye Zhang <guoye_zhang@apple.com>, ietf-http-wg <ietf-http-wg@w3.org>
Message-ID: <CALGR9oYHu4NCmR4TaNSw=+oYwTzLMF9OPzKuu9bC6xX-GjhLGA@mail.gmail.com>

Hey Guoye,

On Tue, May 3, 2022 at 5:50 PM Willy Tarreau <w@1wt.eu> wrote:

> Hi,
>
> On Tue, May 03, 2022 at 02:26:19AM -0700, Guoye Zhang wrote:
> > Hi,
> >
> > We maintain the HTTP client library on Apple's platforms, and with more
> > servers enabling HTTP/2, our error handling logic was recently brought to
> > attention.
> >
> > To my understanding, PROTOCOL_ERROR means that the other side didn't
> > implement the standard correctly, and INTERNAL_ERROR means something
> > happened unexpectedly on our side (e.g. crashed). Both of the error codes
> > should be fatal and only caused by bugs in software, so we do not attempt
> > to retry or perform download resumption.
> >
> > However, nginx is using these error codes for transfers that are too slow
> > causing timeout, which can occur due to bad network connectivity.
> >
> > https://github.com/nginx/nginx/blob/master/src/http/v2/ngx_http_v2.c#L4639
> >
> > My question is, should we treat PROTOCOL_ERROR and INTERNAL_ERROR as
> > recoverable errors on the client side?
>
> I would say that PROTOCOL_ERROR could be caused by an intermediary messing
> with the connection independently of the client, so it could make sense
> to retry only once (or only a few times) in this case. For INTERNAL_ERROR,
> it could be caused by a resource shortage on the server (memory allocation
> issue for example) so here again it could make sense to try again, but be
> even more conservative and maybe not retry instantly. In that regard, the
> suitability of the codes used for slow transfers as indicated above is
> debatable but I think that to some extent it's not much different from
> what we're saying here (at least for INTERNAL_ERROR). For PROTOCOL_ERROR it
> might be a bit more concerning if it claims there are protocol violations
> that do not really happen but maybe a short read results in an incomplete
> frame which itself results in an apparent protocol violation, and in this
> case it could be a bit stretched but understandable. In any case I think
> that retrying can make sense as long as it's only one or a few times and
> no more to avoid making the situation worse.
>

I tend to agree. PROTOCOL_ERROR is defined as "The endpoint detected an
unspecific protocol error. This error is for use when a more specific error
code is not available", i.e. anything could have gone wrong, whereas NO_ERROR
means nothing went wrong. PROTOCOL_ERROR is a nice escape valve.

I'm not a client author but I'd be inclined to treat H2 stream resets like
I would if an HTTP/1.x transfer was cut short by a connection closure
mid-request/response. So retries seem to make sense.
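
As a rough illustration only, the conservative retry idea discussed in this
thread (one or a few attempts, and no instant retry for INTERNAL_ERROR) might
be sketched as below. The error code values are the ones the HTTP/2
specification assigns; the retry limits, base delays, and the retry_delay
helper are hypothetical values chosen for the example, not taken from any
real client.

```python
import random

# Error code values as assigned by the HTTP/2 specification.
PROTOCOL_ERROR = 0x1
INTERNAL_ERROR = 0x2

# Hypothetical policy values, chosen only for illustration.
MAX_RETRIES = {PROTOCOL_ERROR: 1, INTERNAL_ERROR: 2}
BASE_DELAY_SECONDS = {PROTOCOL_ERROR: 0.0, INTERNAL_ERROR: 1.0}


def retry_delay(error_code, attempt):
    """Return how long to wait before retry number `attempt` (1-based).

    Returns None when the request should not be retried, either because
    the error code is not in the policy or the retry budget is used up.
    """
    limit = MAX_RETRIES.get(error_code, 0)
    if attempt > limit:
        return None
    base = BASE_DELAY_SECONDS[error_code]
    # Exponential backoff with jitter so that many clients hitting the
    # same struggling server do not all retry at the same moment.
    return base * (2 ** (attempt - 1)) * random.uniform(0.5, 1.0)
```

With these assumed values, a stream reset with PROTOCOL_ERROR is retried once
without delay, INTERNAL_ERROR is retried at most twice with a growing delay,
and any other error code is not retried by this policy.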

Adapting runtime behaviour based on the error codes sounds quite hard to do
in practice. Error codes seem to be of value when used for debugging a
particular behaviour or trying to detect problems on a macro scale (i.e., a
systematic issue between a client and server deployment).

Cheers
Lucas

Received on Tuesday, 3 May 2022 16:59:57 UTC