- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Thu, 26 Aug 2021 11:13:20 -0700
- To: Martin Thomson <mt@lowentropy.net>
- Cc: ietf-http-wg@w3.org
> On Aug 25, 2021, at 10:51 PM, Martin Thomson <mt@lowentropy.net> wrote:
>
> On Mon, Aug 23, 2021, at 15:03, Martin Thomson wrote:
>> It seems like the allowed characters in fields is a gift that keeps on giving.
>
> Thanks everyone for all the words you gave.
>
> Based on feedback from Willy and Greg in particular, I've taken another go at this:
>
> https://github.com/httpwg/http2-spec/pull/936/files
>
> It says that:
>
> * fields SHOULD be validated properly (according to HTTP §5.1 and §5.5)
>
> * failure to validate fields might enable attacks, especially if the message ends up in HTTP/1.1 somehow (that is, providing motivation that was lacking from previous iterations on this)
>
> * if fields aren't fully validated, attacks might happen, so minimal validation MUST be performed (with the checks previously agreed)
>
> This does not address Roy's original point directly. Yes, code that makes assumptions without taking responsibility for checking them might be exposed to the full consequences of poor decisions. However, I believe that a lot of implementations will abide by the SHOULD here. This is about levying requirements on implementations that might have expected to avoid having to validate fields; because we've learned that copying and pasting without checking happens.
>
> (I do worry that this is an overreaction. The original text in the spec was arguably fine. It was just being ignored.)

I think my issue is still being misunderstood. I don't think we need every implementation to do field validation on every receipt or forwarding of HPACK fields.

I think we need to be clear on what the requirements for field validation are when extracting HPACK-encoded strings and using them in an HTTP context, whether that context be for HTTP fields in other versions, CGI environment variables, Servlet tables, or an internal data structure for request processing.

The point is that the implementation translating HPACK to an abstract HTTP message (regardless of version) MUST ensure that the result fits within all of HTTP's field requirements when that field is used as HTTP, since the implementation cannot trust the HPACK encoder. Specifically, coming up with an arbitrary set of different requirements for that process is NOT GOOD. I understand one desire to do that is because h2 has the additional restriction of all-lowercase names, but the right way to say that is as an additional requirement for h2, not by trying (and failing) to subsume the existing requirements of HTTP.

I think it is fine if this is limited specifically to those recipients actually doing HTTP things with HTTP fields, as opposed to merely forwarding arbitrary fields at scale, but it cannot be limited just to HTTP/1. This is an application issue for anything that consumes an HTTP message. It is still necessary even if an h2-only server receives a request and directly handles it via an internal application library. The vulnerability may be inside the application library, but the h2 server is expected to prevent it because that's where the rubber meets the road.

Likewise, an HPACK decoder MUST produce valid HTTP field values if they are going to be used as HTTP field values. I am fine with HPACK also being used to convey UTF-8 named fields and/or carrying binary field values, but only when that is clearly indicated via the protocol and processed as such.

I want the h2 spec to say something like:

When an implementation extracts a field name string from HPACK and intends to use that string as an HTTP field name (semantically), the implementation MUST validate the strict intersection of the existing MUST requirements on syntax (ref. field-name ABNF [SEMANTICS] with uppercase excluded) when extracted from HPACK and used with the semantics of an HTTP field name. This is particularly important when the received header fields are transformed as a whole from one form (i.e., HPACK) to another (e.g., a translated header section, an internal hash table, or a set of environment variables).

IOW, don't make up new requirements based on perceived syntax issues. Refer to the specific syntax required by HTTP and add the h2-specific limitations, since any variation might result in unforeseen differences in downstream handling of that message.

....Roy

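To make the suggested requirement concrete, here is a minimal Python sketch of the check on field names. It is not text from the h2 spec or from any implementation. It assumes the HPACK decoder hands back (name, value) pairs as bytes, and the names `is_valid_h2_field_name` and `to_field_table` are hypothetical. It takes the HTTP field-name syntax (token, i.e. 1*tchar) and removes uppercase letters as the h2-specific addition. Pseudo-header fields (leading ":") are not HTTP field names and are assumed to be handled separately before this step. Field value validation is not shown.

```python
import string

# tchar from the HTTP field-name ABNF (field-name = token = 1*tchar),
# with uppercase ALPHA removed to reflect the additional h2 restriction.
_H2_NAME_CHARS = frozenset(
    "!#$%&'*+-.^_`|~" + string.digits + string.ascii_lowercase
)


def is_valid_h2_field_name(name: bytes) -> bool:
    """Return True if name is a non-empty token with no uppercase letters."""
    return len(name) > 0 and all(chr(b) in _H2_NAME_CHARS for b in name)


def to_field_table(decoded):
    """Build an internal table from decoded (name, value) byte pairs.

    Hypothetical helper: raises ValueError on the first name that does
    not pass the check, rather than copying it into the table unchecked.
    """
    table = {}
    for name, value in decoded:
        if not is_valid_h2_field_name(name):
            raise ValueError(f"invalid field name: {name!r}")
        # Names are pure ASCII once validated, so decoding cannot fail.
        table.setdefault(name.decode("ascii"), []).append(value)
    return table
```

The point of the sketch is only that the check sits at the place where HPACK strings become HTTP field names (here, the internal table), and that it is derived from the HTTP syntax plus the lowercase rule rather than from a separately invented character list.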
Received on Thursday, 26 August 2021 18:14:08 UTC