Re: More on allowed field characters from Martin Thomson on 2021-08-27 (ietf-http-wg@w3.org from July to September 2021)

From: Martin Thomson <mt@lowentropy.net>
Date: Fri, 27 Aug 2021 11:21:05 +1000
To: "Roy Fielding" <fielding@gbiv.com>
Cc: ietf-http-wg@w3.org
Message-Id: <3e01386e-8927-4e99-b1cd-c78680bc5fa3@www.fastmail.com>
How is this not a core HTTP problem?

Part of the intent of the change is to leave the validation to core specs.  If you are using h2 fields as HTTP fields, the rules for HTTP already cover that use.

There are two extra things we do in h2:

1. deny uppercase field names
2. mandate extra checks from an abundance of caution

(The latter might be mere paranoia, but it's demonstrably necessary.)
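
To make those two items concrete, here is a rough Python sketch. The function names are made up for illustration, and the specific "minimal" value checks (no NUL, CR, or LF; no leading or trailing SP/HTAB) are an assumption about what such checks could look like, not a quote from the PR. The name check is the HTTP token syntax with uppercase removed. Pseudo-header names (which start with ":") are deliberately out of scope here.

```python
import re

# HTTP field-name is a "token" (1*tchar). For h2, A-Z is removed from the set.
# Pseudo-header fields such as ":path" are NOT handled by this sketch.
_H2_FIELD_NAME = re.compile(rb"[!#$%&'*+\-.^_`|~0-9a-z]+")


def is_valid_h2_field_name(name: bytes) -> bool:
    """Item 1: HTTP token syntax intersected with the lowercase-only rule."""
    return _H2_FIELD_NAME.fullmatch(name) is not None


def passes_minimal_value_checks(value: bytes) -> bool:
    """Item 2 (assumed checks): reject NUL, CR, LF anywhere in the value,
    and reject values that start or end with SP or HTAB."""
    if any(ctl in value for ctl in (b"\x00", b"\r", b"\n")):
        return False
    if value and (value[0] in b" \t" or value[-1] in b" \t"):
        return False
    return True
```

A recipient that does full HTTP validation would not need the second function; it is only the fallback for code that would otherwise copy the decoded bytes through unchecked.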

On Fri, Aug 27, 2021, at 04:13, Roy T. Fielding wrote:
> > On Aug 25, 2021, at 10:51 PM, Martin Thomson <mt@lowentropy.net> wrote:
> > 
> > On Mon, Aug 23, 2021, at 15:03, Martin Thomson wrote:
> >> It seems like the allowed characters in fields is a gift that keeps on giving.
> > 
> > Thanks everyone for all the words you gave.
> > 
> > Based on feedback from Willy and Greg in particular, I've taken another go at this:
> > 
> >  https://github.com/httpwg/http2-spec/pull/936/files
> > 
> > It says that:
> > 
> > * fields SHOULD be validated properly (according to HTTP §5.1 and §5.5)
> > 
> > * failure to validate fields might enable attacks, especially if the message ends up in HTTP/1.1 somehow (that is, providing motivation that was lacking from previous iterations on this)
> > 
> > * if fields aren't fully validated, attacks might happen, so minimal validation MUST be performed (with the checks previously agreed)
> > 
> > This does not address Roy's original point directly.  Yes, code that makes assumptions without taking responsibility for checking them might be exposed to the full consequences of poor decisions.  However, I believe that a lot of implementations will abide by the SHOULD here.  This is about levying requirements on implementations that might have expected to avoid having to validate fields, because we've learned that copying and pasting without checking happens.
> > 
> > (I do worry that this is an overreaction.  The original text in the spec was arguably fine.  It was just being ignored.)
> 
> I think my issue is still being misunderstood.
> 
> I don't think we need every implementation to do field validation on every
> receipt or forwarding of HPACK fields. I think we need to be clear on what the
> requirements for field validation are when extracting HPACK-encoded strings and
> using them in an HTTP context, whether that context be for HTTP fields in other
> versions, CGI environment variables, Servlet tables, or an internal data
> structure for request processing. The point is that the implementation
> translating HPACK to an abstract HTTP message (regardless of version) MUST
> ensure that the result fits within all of HTTP's field requirements when that
> field is used as HTTP, since the implementation cannot trust the HPACK encoder.
> 
> Specifically, coming up with an arbitrary set of different requirements for that
> process is NOT GOOD. I understand one desire to do that is because h2 has
> the additional restriction of all-lowercase names, but the right way to say that
> is as an additional requirement for h2, not by trying (and failing) to subsume
> the existing requirements of HTTP.
> 
> I think it is fine if this is limited specifically to those recipients
> actually doing HTTP things with HTTP fields, as opposed to merely forwarding
> arbitrary fields at scale, but it cannot be limited just to HTTP/1. This is an
> application issue for anything that consumes an HTTP message. It is still
> necessary even if an h2-only server receives a request and directly handles it
> via an internal application library. The vulnerability may be inside the
> application library, but the h2 server is expected to prevent it because
> that's where the rubber meets the road.
> 
> Likewise, an HPACK decoder MUST produce valid HTTP field values if they
> are going to be used as HTTP field values.
> 
> I am fine with HPACK also being used to convey UTF-8 named fields and/or
> carrying binary field values, but only when that is clearly indicated via the
> protocol and processed as such.
> 
> I want the h2 spec to say something like:
> 
>    When an implementation extracts a field name string from HPACK and
>    intends to use that string as an HTTP field name (semantically), the
>    implementation MUST validate the strict intersection of the existing
>    MUST requirements on syntax (ref. field-name ABNF [SEMANTICS]
>    with uppercase excluded) when extracted from HPACK and used
>    with the semantics of an HTTP field name. This is particularly important
>    when the received header fields are transformed as a whole from one
>    form (i.e., HPACK) to another (e.g., a translated header section, an
>    internal hash table, or a set of environment variables).
> 
> IOW, don't make up new requirements based on perceived syntax issues.
> Refer to the specific syntax required by HTTP and add the h2-specific
> limitations, since any variation might result in unforeseen differences
> in downstream handling of that message.
> 
> ....Roy
> 
>
Received on Friday, 27 August 2021 01:21:37 UTC