Re: Usage of HTTP/2 PROTOCOL_ERROR and INTERNAL_ERROR from Guoye Zhang on 2022-05-03 (ietf-http-wg@w3.org from April to June 2022)

From: Guoye Zhang <guoye_zhang@apple.com>
Date: Tue, 03 May 2022 16:48:22 -0700
To: Lucas Pardue <lucaspardue.24.7@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>, ietf-http-wg <ietf-http-wg@w3.org>
Message-id: <BAF07B6C-9628-4183-BB2C-713C5CCF55FC@apple.com>

> On May 3, 2022, at 9:59 AM, Lucas Pardue <lucaspardue.24.7@gmail.com> wrote:
> 
> Hey Guoye,
> 
> On Tue, May 3, 2022 at 5:50 PM Willy Tarreau <w@1wt.eu> wrote:
>> Hi,
>> 
>> On Tue, May 03, 2022 at 02:26:19AM -0700, Guoye Zhang wrote:
>> > Hi,
>> > 
>> > We maintain the HTTP client library on Apple's platforms, and as more
>> > servers enable HTTP/2, our error-handling logic has recently drawn
>> > attention.
>> > 
>> > As I understand it, PROTOCOL_ERROR means that the other side did not
>> > implement the standard correctly, and INTERNAL_ERROR means that something
>> > unexpected happened on the side sending it (e.g. a crash). Both error
>> > codes should therefore be fatal and caused only by software bugs, so we
>> > neither retry nor attempt download resumption.
>> > 
>> > However, nginx uses these error codes when a transfer is too slow and
>> > times out, which can be caused by poor network connectivity.
>> > https://github.com/nginx/nginx/blob/master/src/http/v2/ngx_http_v2.c#L4639
>> > 
>> > My question is, should we treat PROTOCOL_ERROR and INTERNAL_ERROR as
>> > recoverable errors on the client side?
>> 
>> I would say that PROTOCOL_ERROR could be caused by an intermediary messing
>> with the connection independently of the client, so it could make sense
>> to retry only once (or only a few times) in this case. INTERNAL_ERROR could
>> be caused by a resource shortage on the server (a memory allocation
>> failure, for example), so here again it could make sense to try again, but
>> be even more conservative and perhaps not retry instantly. In that regard,
>> whether these codes are suitable for the slow transfers mentioned above is
>> debatable, but I think that to some extent it is not much different from
>> what we are describing here (at least for INTERNAL_ERROR). PROTOCOL_ERROR
>> is a bit more concerning if it claims protocol violations that did not
>> really happen. Still, a short read may leave an incomplete frame, which
>> then looks like a protocol violation; in that case the code is a bit of a
>> stretch but understandable. In any case, I think retrying can make sense
>> as long as it happens only once or a few times and no more, to avoid
>> making the situation worse.
> 
> I tend to agree. PROTOCOL_ERROR is defined as "The endpoint detected an unspecific protocol error. This error is for use when a more specific error code is not available", i.e. anything could have gone wrong, whereas NO_ERROR means nothing went wrong. PROTOCOL_ERROR is a nice escape valve.
> 
> I'm not a client author, but I'd be inclined to treat H2 stream resets the way I would treat an HTTP/1.x transfer cut short by a connection closure mid-request or mid-response. So retries seem to make sense.
> 
> Adapting runtime behaviour based on error codes sounds quite hard to do in practice. Error codes seem most valuable for debugging a particular behaviour or for detecting problems at a macro scale (i.e., a systematic issue between a client and server deployment).
> 
> Cheers
> Lucas

Thanks, these all make sense. We will change our client to translate them into generic connection-lost errors, which are retryable.
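The change described above, combined with Willy's suggestions of a small retry budget and a delay before retrying after INTERNAL_ERROR, can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Apple's implementation: `StreamReset`, `ConnectionLostError`, `fetch_with_retries`, the `send` callable, `MAX_RETRIES` and `BASE_DELAY_S` are all hypothetical. Only the error code values come from RFC 9113, Section 7, which also permits treating unknown codes as INTERNAL_ERROR.

```python
import time
from enum import IntEnum
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class H2ErrorCode(IntEnum):
    """HTTP/2 error codes as registered in RFC 9113, Section 7."""
    NO_ERROR = 0x0
    PROTOCOL_ERROR = 0x1
    INTERNAL_ERROR = 0x2
    FLOW_CONTROL_ERROR = 0x3
    SETTINGS_TIMEOUT = 0x4
    STREAM_CLOSED = 0x5
    FRAME_SIZE_ERROR = 0x6
    REFUSED_STREAM = 0x7
    CANCEL = 0x8
    COMPRESSION_ERROR = 0x9
    CONNECT_ERROR = 0xA
    ENHANCE_YOUR_CALM = 0xB
    INADEQUATE_SECURITY = 0xC
    HTTP_1_1_REQUIRED = 0xD


class StreamReset(Exception):
    """Hypothetical: raised by the transport on RST_STREAM or GOAWAY."""

    def __init__(self, code: int) -> None:
        super().__init__(code)
        self.code = code


class ConnectionLostError(Exception):
    """Hypothetical generic, retryable "connection lost" error."""

    def __init__(self, code: H2ErrorCode) -> None:
        super().__init__(code)
        self.code = code


# Assumption: only the two codes discussed in this thread are translated;
# every other code keeps whatever handling the client already has.
TRANSIENT_CODES = {H2ErrorCode.PROTOCOL_ERROR, H2ErrorCode.INTERNAL_ERROR}
MAX_RETRIES = 2      # hypothetical budget: "only once (or only a few times)"
BASE_DELAY_S = 0.5   # hypothetical first back-off for INTERNAL_ERROR


def translate(reset: StreamReset) -> Exception:
    """Return a ConnectionLostError for transient codes, else the reset itself."""
    try:
        code = H2ErrorCode(reset.code)
    except ValueError:
        # RFC 9113 allows treating unknown codes as INTERNAL_ERROR.
        code = H2ErrorCode.INTERNAL_ERROR
    if code in TRANSIENT_CODES:
        return ConnectionLostError(code)
    return reset


def retry_delay(code: H2ErrorCode, attempt: int) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based); None = give up."""
    if code not in TRANSIENT_CODES or attempt > MAX_RETRIES:
        return None
    if code == H2ErrorCode.PROTOCOL_ERROR:
        return 0.0  # possibly an intermediary; retry promptly
    # INTERNAL_ERROR may signal a server resource shortage: back off exponentially.
    return BASE_DELAY_S * 2 ** (attempt - 1)


def fetch_with_retries(send: Callable[[], T],
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `send` (hypothetical request function), retrying transient resets."""
    attempt = 0
    while True:
        try:
            return send()
        except StreamReset as reset:
            err = translate(reset)
            if not isinstance(err, ConnectionLostError):
                raise  # non-transient code: surface the original reset
            attempt += 1
            delay = retry_delay(err.code, attempt)
            if delay is None:
                raise err from reset  # retry budget exhausted
            sleep(delay)
```

The `sleep` parameter is injectable so the back-off schedule can be observed without waiting. With these hypothetical values, a stream that keeps failing with INTERNAL_ERROR is attempted three times in total, with waits of 0.5 s and 1.0 s, before the connection-lost error reaches the caller. The sketch does not check whether the request is safe to replay, which a real client would still have to decide separately.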

Guoye

Received on Tuesday, 3 May 2022 23:49:01 UTC