Re: More on allowed field characters from Roy T. Fielding on 2021-08-26 (ietf-http-wg@w3.org from July to September 2021)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Thu, 26 Aug 2021 11:13:20 -0700
To: Martin Thomson <mt@lowentropy.net>
Cc: ietf-http-wg@w3.org
Message-Id: <AEC4EFA7-4D1C-4E1A-98E1-21D907F9EB31@gbiv.com>
> On Aug 25, 2021, at 10:51 PM, Martin Thomson <mt@lowentropy.net> wrote:
> 
> On Mon, Aug 23, 2021, at 15:03, Martin Thomson wrote:
>> It seems like the allowed characters in fields is a gift that keeps on giving.
> 
> Thanks everyone for all the words you gave.
> 
> Based on feedback from Willy and Greg in particular, I've taken another go at this:
> 
>  https://github.com/httpwg/http2-spec/pull/936/files
> 
> It says that:
> 
> * fields SHOULD be validated properly (according to HTTP §5.1 and §5.5)
> 
> * failure to validate fields might enable attacks, especially if the message ends up in HTTP/1.1 somehow (that is, providing motivation that was lacking from previous iterations on this)
> 
> * if fields aren't fully validated, attacks might happen, so minimal validation MUST be performed (with the checks previously agreed)
> 
> This does not address Roy's original point directly.  Yes, code that makes assumptions without taking responsibility for checking them might be exposed to the full consequences of poor decisions.  However, I believe that a lot of implementations will abide by the SHOULD here.  This is about levying requirements on implementations that might have expected to avoid having to validate fields, because we've learned that copying and pasting without checking happens.
> 
> (I do worry that this is an overreaction.  The original text in the spec was arguably fine.  It was just being ignored.)

I think my issue is still being misunderstood.

I don't think we need every implementation to do field validation on every receipt
or forwarding of HPACK fields. I think we need to be clear on what the requirements
for field validation are when extracting HPACK-encoded strings and using them
in an HTTP context, whether that context be for HTTP fields in other versions,
CGI environment variables, Servlet tables, or an internal data structure for request
processing. The point is that the implementation translating HPACK to an abstract
HTTP message (regardless of version) MUST ensure that the result fits within
all of HTTP's field requirements when that field is used as HTTP, since the
implementation cannot trust the HPACK encoder.

Specifically, coming up with an arbitrary set of different requirements for that
process is NOT GOOD. I understand that one motivation for doing so is that h2 has
the additional restriction of all-lowercase names, but the right way to say that
is as an additional requirement for h2, not by trying (and failing) to subsume
the existing requirements of HTTP.

I think it is fine if this is limited specifically to those recipients actually doing
HTTP things with HTTP fields, as opposed to merely forwarding arbitrary fields
at scale, but it cannot be limited just to HTTP/1. This is an application issue for
anything that consumes an HTTP message. It is still necessary even if an
h2-only server receives a request and directly handles it via an internal
application library. The vulnerability may be inside the application library,
but the h2 server is expected to prevent it because that's where the rubber
meets the road.

Likewise, an HPACK decoder MUST produce valid HTTP field values if they
are going to be used as HTTP field values.
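
To illustrate the kind of check meant here, the following is a minimal
sketch, assuming the field-value ABNF in [SEMANTICS] (visible ASCII and
obs-text octets, with SP and HTAB allowed only between them). The function
name is hypothetical and not taken from any spec or library.

```python
SP, HTAB = 0x20, 0x09


def is_http_field_value(value: bytes) -> bool:
    """Return True if value matches the field-value ABNF (assumed reading)."""
    if not value:
        # An empty field value is permitted by the grammar.
        return True
    # No leading or trailing whitespace.
    if value[0] in (SP, HTAB) or value[-1] in (SP, HTAB):
        return False
    # Every octet must be SP, HTAB, VCHAR (0x21-0x7E), or obs-text (0x80-0xFF).
    # This rejects CR, LF, NUL, and all other control characters.
    return all(b in (SP, HTAB) or 0x21 <= b <= 0x7E or b >= 0x80 for b in value)
```

A decoder would run a check like this on each string taken out of HPACK
before handing it to anything that treats it as an HTTP field value.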

I am fine with HPACK also being used to convey UTF-8 named fields and/or
carrying binary field values, but only when that is clearly indicated via the
protocol and processed as such.

I want the h2 spec to say something like:

   When an implementation extracts a field name string from HPACK and
   intends to use that string as an HTTP field name (semantically), the
   implementation MUST validate it against the strict intersection of
   the existing MUST requirements on syntax (ref. field-name ABNF
   [SEMANTICS] with uppercase excluded). This is particularly important
   when the received header fields are transformed as a whole from one
   form (i.e., HPACK) to another (e.g., a translated header section, an
   internal hash table, or a set of environment variables).

IOW, don't make up new requirements based on perceived syntax issues.
Refer to the specific syntax required by HTTP and add the h2-specific
limitations, since any variation might result in unforeseen differences
in downstream handling of that message.
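
As a rough sketch of that layering (not proposed spec text): the first
function below checks the field-name (token) ABNF from [SEMANTICS], and the
second adds only the h2 lowercase restriction on top of it. Both function
names are hypothetical, and pseudo-header fields (names starting with ":")
are assumed to be handled separately before this check.

```python
import string

# tchar set from the token ABNF that field-name refers to.
HTTP_TCHAR = frozenset(
    (string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~").encode("ascii")
)
# The h2-specific limitation: uppercase letters are excluded.
H2_EXCLUDED = frozenset(string.ascii_uppercase.encode("ascii"))


def is_http_field_name(name: bytes) -> bool:
    """HTTP's own requirement: one or more tchar octets."""
    return len(name) > 0 and all(b in HTTP_TCHAR for b in name)


def is_h2_field_name(name: bytes) -> bool:
    """HTTP's requirement plus the additional h2 restriction."""
    return is_http_field_name(name) and not any(b in H2_EXCLUDED for b in name)
```

Keeping the two checks separate means the h2 rule is stated as an addition
to HTTP's syntax rather than as a replacement for it.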

....Roy
Received on Thursday, 26 August 2021 18:14:08 UTC