Re: Can servers generate responses to malformed requests in h2? from Willy Tarreau on 2023-09-25 (ietf-http-wg@w3.org from July to September 2023)

From: Willy Tarreau <w@1wt.eu>
Date: Mon, 25 Sep 2023 07:26:05 +0200
To: Glenn Strauss <gs-lists-ietf-http-wg@gluelogic.com>
Cc: Martin Thomson <mt@lowentropy.net>, Lucas Pardue <lucaspardue.24.7@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <ZREZ7a3jwUdJDOkv@1wt.eu>

Hi Glenn,

On Sun, Sep 24, 2023 at 11:33:03PM -0400, Glenn Strauss wrote:
> I am all for clarification.  However, I ask that we please avoid
> overspecification of implementation unless there is both really good
> reason and confidence that the specified implementation is the
> one-right-implementation. If not, and there are security concerns
> with possible implementer mistakes, the RFC should instead call those
> out in Security Considerations to highlight concerns for implementers.

Agreed!

> Speaking as an implementer of the specification, I was able to reuse
> much of lighttpd's HTTP header parsing and security policy code for
> both HTTP/1.x and HTTP/2.

It was the same for our very first H2 implementation in haproxy; in
fact, it basically converted H2 to H1 and all HTTP processing was done
there. Only later did we implement an internal representation that also
conveys semantics, and now H1/H2/H3 differ at a much lower layer.

> My understanding is that I could treat all HTTP request header errors
> received as HTTP/2 requests as h2 stream error PROTOCOL_ERROR.

That's fine, but not everyone has this option. H2 is special
in that it mixes transport and representation of semantics. At the same
layer you can have framing issues (e.g. SETTINGS on a stream) and
semantics issues (e.g. what if you find an LF character in a :authority
header). As previously said, some implementations might be unable to
produce a valid HTTP error, because these errors need to be decorated
with headers reporting a unique request ID or whatever, and no HTTP
request was ever produced there. I would even go as far as suggesting
that certain subtle H2 errors should not be allowed to produce HTTP
contents. Imagine for example that you receive a DATA frame on a new
stream. The spec says that you must respond with a stream error. Given
that the stream was never really created (no HEADERS frame to create it),
it could possibly be problematic to send a 400 response there. E.g.
imagine the following sequence:

   client                    server

    DATA (id=1) ----------->
                <---------- :status 400
    HEADERS (id=1) -------->
                <---------- :status 200

It might very well be possible that the 400 is taken as the response
to the HEADERS frame. Sending RST_STREAM on framing issues like this
avoids such problems.

However I consider that any error that is detected after HTTP decoding
could be eligible for a 400.
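
As a rough sketch of that rule (not code from haproxy or lighttpd), the
choice between a transport-level error and an HTTP response could be
written as below. The names ErrorAction, choose_error_action and its
three parameters are hypothetical and only serve the illustration; which
concrete errors count as unrecoverable for the connection is left to the
caller.

```python
from enum import Enum


class ErrorAction(Enum):
    CONNECTION_ERROR = "connection error (GOAWAY)"
    RST_STREAM = "stream error (RST_STREAM)"
    HTTP_400 = "HTTP response (400 Bad Request)"


def choose_error_action(headers_opened_stream: bool,
                        request_decoded: bool,
                        connection_unrecoverable: bool = False) -> ErrorAction:
    """Pick how to report an error seen on an H2 stream (illustrative only)."""
    # Some framing errors cannot be contained within a single stream.
    if connection_unrecoverable:
        return ErrorAction.CONNECTION_ERROR
    # No HEADERS frame created the stream, or HTTP decoding never completed:
    # there is no request that an HTTP response could safely be attached to.
    if not headers_opened_stream or not request_decoded:
        return ErrorAction.RST_STREAM
    # The error was detected after HTTP decoding, so a 400 is eligible.
    return ErrorAction.HTTP_400
```

With this sketch, the DATA-before-HEADERS case above corresponds to
choose_error_action(False, False) and yields RST_STREAM, while a request
that decoded fine but failed a later HTTP check yields HTTP_400.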

> If that
> is what is desired by the RFC authors, please issue an update or errata
> and I'll change that code in lighttpd.
> 
> 
> However, I currently am unable to understand why h2 PROTOCOL_ERROR
> (or H3_GENERAL_PROTOCOL_ERROR) is somehow better than 400 Bad Request.
> 
> Here is another reason why I think an HTTP error code may be preferable:
> An HTTP/1.0 client might make a request to a proxy which makes an HTTP/2
> or HTTP/3 request.  400 Bad Request is an application level code which
> should be transmitted all the way back to the client.  An h2 stream
> error PROTOCOL_ERROR sent to an intermediary might not make it back to
> the HTTP/1.0 client in a form that conveys the error clearly to the
> end-user.  (One may argue that the intermediary should have detected
> the malformed request, but the origin server might implement a stricter
> security policy and is permitted to return PROTOCOL_ERROR.)

That's a good point. If the intermediary is certain that it does not
produce bad frames, it may consider that in this case the server rejected
its request on the grounds of the HTTP contents, and it could very well
translate this RST_STREAM(PROTOCOL_ERROR) into an HTTP/1.0 400 Bad Request.
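
A minimal sketch of such a translation in an intermediary might look like
the following. The function name, the trusts_own_framing flag, and the
fallback to 502 for every other case are assumptions made for the
example; only the PROTOCOL_ERROR code value (0x1) comes from the H2
specification.

```python
H2_PROTOCOL_ERROR = 0x1  # PROTOCOL_ERROR error code as defined for HTTP/2


def status_for_upstream_reset(error_code: int, trusts_own_framing: bool) -> str:
    """Map an upstream RST_STREAM to an HTTP/1.0 status line (illustrative)."""
    # If the intermediary is certain it did not emit bad frames, a
    # PROTOCOL_ERROR most likely means the server rejected the HTTP contents.
    if error_code == H2_PROTOCOL_ERROR and trusts_own_framing:
        return "HTTP/1.0 400 Bad Request"
    # Assumption: anything else is reported as an upstream failure.
    return "HTTP/1.0 502 Bad Gateway"
```

For example, status_for_upstream_reset(0x1, True) gives the 400 status
line, whereas the same reset seen by an intermediary that cannot vouch
for its own framing falls back to 502.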

> I can see where h2 specification around pseudo-headers is specific to
> the h2 protocol.  However, an implementation "could" HPACK-decode the
> entire HEADERS frame into a single string that looks almost identical
> to HTTP/1.1 request headers with the exception of the addition of
> pseudo-headers.  It could then be loosely parsed as HTTP/1.x request
> headers.  An implementation might need to fully parse the HEADERS frame
> before being able to determine that a required pseudo-header is missing.
> If it has already gotten this far parsing the HTTP request, why should
> the RFC disallow the implementation from returning an HTTP error code?

I generally agree with you and it matches what I mentioned earlier that
we should make the effort of reporting at the highest possible level.
You'll often find that when you add H2 to an existing H1 server, your
H2 stack looks almost like an intermediary there, so you could consider
to some extent that it blindly passes what it decoded because it trusts
the server for doing the necessary HTTP checks, and the server which is
in fact the application code will likely produce 400 more often than the
H2 layer will produce RST. But at least framing errors that cannot safely
be recovered from (like the example above) should definitely lead to an
RST (and sometimes even a connection error).

> Aside: I fully agree that if an intermediary is going to rewrite an h2
> request to HTTP/1.x, then it should more strictly validate the h2
> protocol requirements before rewriting the request to HTTP/1.x or else
> there might be ways to slip unexpected characters through the protocol
> translation.

Absolutely, but developing intermediaries is also when you hear
"after I deployed your stuff here my application stopped working",
and when you look closer, you discover dangerous chars in header fields
and stuff like that. Thus the amount of checks has to be... adaptable!

Cheers,
willy
Received on Monday, 25 September 2023 05:26:29 UTC