Re: Can servers generate responses to malformed requests in h2? from Glenn Strauss on 2023-09-25 (ietf-http-wg@w3.org from July to September 2023)

From: Glenn Strauss <gs-lists-ietf-http-wg@gluelogic.com>
Date: Sun, 24 Sep 2023 23:33:03 -0400
To: Martin Thomson <mt@lowentropy.net>
Cc: Lucas Pardue <lucaspardue.24.7@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <ZRD/b4D/omiIYxek@xps13>

On Mon, Sep 25, 2023 at 12:30:08PM +1000, Martin Thomson wrote:
> On Mon, Sep 25, 2023, at 11:25, Lucas Pardue wrote:
> > For the content-length matter I mentioned above, my implementation 
> > generates a 4xx in H2 and H3. I don't plan to change that.
> 
> Content-Length has always been a bit of an awkward fit in the specification.  Maybe you are right to say that you want to signal that error at the HTTP layer, but would you say the same if you received a SETTINGS frame on a request stream or one of the many other errors specified in the RFC?
> 
> It would be best if these sorts of divergences from what we specified were documented in the specification, don't you think?

I am all for clarification.  However, I ask that we please avoid
overspecification of implementation unless there is both really good
reason and confidence that the specified implementation is the
one-right-implementation.  If not, and there are security concerns
with possible implementer mistakes, the RFC should instead call those
out in Security Considerations to highlight concerns for implementers.

Speaking as an implementer of the specification, I was able to reuse
much of lighttpd's HTTP header parsing and security policy code for
both HTTP/1.x and HTTP/2.

Once end of headers is read in HTTP/1.1 ("\r\n\r\n") and once HEADERS
(and any CONTINUATION frames) have been received for HTTP/2, much of
the lighttpd request header parsing code is reused to parse headers
into internal data structures.

For HTTP/1.1, the request headers string is broken on "\r\n".
For HTTP/2, each element of HEADERS is HPACK-decoded.

Then, for each header, lighttpd applies a number of configurable
security policies.  For HTTP/2, there are additional checks for required
pseudo-headers, that pseudo-headers precede non-pseudo-headers, and that
header field names are all lowercase.  For any failures during initial
HTTP request header parsing, lighttpd issues an HTTP error code,
often HTTP error code 400 Bad Request.

My understanding is that I could treat all HTTP request header errors
received as HTTP/2 requests as h2 stream error PROTOCOL_ERROR.  If that
is what is desired by the RFC authors, please issue an update or errata
and I'll change that code in lighttpd.

However, I currently am unable to understand why h2 PROTOCOL_ERROR
(or H3_GENERAL_PROTOCOL_ERROR) is somehow better than 400 Bad Request.

Here is another reason why I think an HTTP error code may be preferable:
An HTTP/1.0 client might make a request to a proxy which makes an HTTP/2
or HTTP/3 request.  400 Bad Request is an application level code which
should be transmitted all the way back to the client.  An h2 stream
error PROTOCOL_ERROR sent to an intermediary might not make it back to
the HTTP/1.0 client in a form that conveys the error clearly to the
end-user.  (One may argue that the intermediary should have detected
the malformed request, but the origin server might implement a stricter
security policy and is permitted to return PROTOCOL_ERROR.)

I can see where h2 specification around pseudo-headers is specific to
the h2 protocol.  However, an implementation "could" HPACK-decode the
entire HEADERS frame into a single string that looks almost identical
to HTTP/1.1 request headers with the exception of the addition of
pseudo-headers.  It could then be loosely parsed as HTTP/1.x request
headers.  An implementation might need to fully parse the HEADERS frame
before being able to determine that a required pseudo-header is missing.
If it has already gotten this far parsing the HTTP request, why should
the RFC disallow the implementation from returning an HTTP error code?

Aside: I fully agree that if an intermediary is going to rewrite an h2
request to HTTP/1.x, then it should more strictly validate the h2
protocol requirements before rewriting the request to HTTP/1.x or else
there might be ways to slip unexpected characters through the protocol
translation.

Cheers, Glenn

Received on Monday, 25 September 2023 03:33:21 UTC