Re: More on allowed field characters from Willy Tarreau on 2021-08-23 (ietf-http-wg@w3.org from July to September 2021)

From: Willy Tarreau <w@1wt.eu>
Date: Mon, 23 Aug 2021 07:44:14 +0200
To: Martin Thomson <mt@lowentropy.net>
Cc: ietf-http-wg@w3.org, Roy Fielding <fielding@gbiv.com>
Message-ID: <20210823054414.GA14492@1wt.eu>

Hi Martin,

On Mon, Aug 23, 2021 at 03:03:09PM +1000, Martin Thomson wrote:
> Hey all,
> 
> It seems like the allowed characters in fields is a gift that keeps on giving.
> 
> Roy opened https://github.com/httpwg/http2-spec/issues/902 asking about
> DQUOTE and "(),/:;<=>?@[]{}".
> 
> The text is here:
> https://httpwg.org/http2-spec/draft-ietf-httpbis-http2bis.html#name-field-validity
> 
> When we made changes for field validation, our intent was not to override
> requirements in core semantics, but to specify just the bare minimum for
> interoperability and security.  For interoperability we dropped uppercase
> field names.  For security, the focus was on request smuggling.  So we block
> NUL, CR, LF, and COLON, but not a whole lot more.
> 
> We ended up saying the following about general validity:
> 
> > A recipient MAY treat a message that contains a field name or value that
> > includes other characters disallowed by Section 5.1 of [HTTP] and Section
> > 5.5 of [HTTP] as malformed (Section 8.1.1).
> 
> That is, requests that contain DQUOTE and friends are still invalid, but we
> don't require that HTTP/2 implementations specifically look for those octets
> and treat those messages as malformed.
> 
> Roy points out that there are connected systems (like CGI, which communicates
> using environment variables; yes in 2021) that depend on field names not
> containing the above characters.  I believe that those systems are still
> protected by the rules in -semantics.  That is, after all, the most
> appropriate place for requirements of that nature.
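
To make the difference concrete, here is a small Python sketch comparing
two things. The first is the minimal set of name checks described in the
quoted text (uppercase, NUL, CR, LF, COLON). The second is the full token
rule that [HTTP] applies to field names. It also includes the usual CGI
mapping of a field name to an environment variable (RFC 3875 style:
"HTTP_" prefix, uppercased, "-" turned into "_"). The function names are
hypothetical, and the sketch covers names only, ignoring pseudo-header
fields.

```python
import string

# tchar from the field-name = token rule in [HTTP]: ALPHA, DIGIT and these symbols.
TCHAR = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~")


def passes_minimal_h2_name_check(name: str) -> bool:
    """Hypothetical check limited to what the thread says HTTP/2 blocks:
    NUL, CR, LF, COLON and uppercase letters (pseudo-headers ignored)."""
    if any(c in "\x00\r\n:" for c in name):
        return False
    return not any("A" <= c <= "Z" for c in name)


def is_valid_field_name(name: str) -> bool:
    """Full token check: a non-empty run of tchar."""
    return len(name) > 0 and all(c in TCHAR for c in name)


def cgi_meta_variable(name: str) -> str:
    """CGI-style mapping of a field name to an environment variable name."""
    return "HTTP_" + name.upper().replace("-", "_")


if __name__ == "__main__":
    for name in ["x-forwarded-for", 'x"y', "a=b", "a(b)"]:
        print(repr(name), passes_minimal_h2_name_check(name), is_valid_field_name(name))
    # "a=b" maps to "HTTP_A=B"; inside a NAME=value environment entry the
    # embedded '=' makes it unclear where the variable name ends.
    print(cgi_meta_variable("a=b") + "=1")
```

A name such as `x"y` or `a=b` passes the minimal check but is not a token,
and once mapped for CGI the "=" lands inside what should be a variable
name.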

One concern that I do have with leaving them allowed in H2 implementations
is that it *will* create interoperability issues between H2 implementations.
Those who implement both the client and the server will put whatever they
like in there and call that H2, and the day an intermediary is placed in
the middle, this will break, forcing the intermediary to relax its rules
and opening a new class of problems when it has to deal with other HTTP
versions.

Another example I mentioned previously was octets above 0x7F. These would
allow UTF-8 characters to be used, and that becomes a security issue: you
can be certain that some characters will be transcoded along the chain,
so some header fields could be aliased depending on how implementations
deal with charsets, or worse, well-known names such as Connection, Host or
Transfer-Encoding could be produced by transcoding homoglyphs.
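
A quick Python illustration of the aliasing risk. It assumes two
hypothetical downstream steps: Unicode NFKC normalization, and a
case-insensitive comparison done by uppercasing. The fullwidth letters in
"Ｈｏｓｔ" fold to "host" under NFKC, and LATIN SMALL LETTER LONG S
(U+017F) uppercases to "S", so "Tranſfer-Encoding" ends up as
"transfer-encoding". A plain octet check for bytes above 0x7F rejects both
names before any such step runs.

```python
import unicodedata

# Names an intermediary treats specially (illustrative subset).
SENSITIVE = {"host", "connection", "transfer-encoding"}


def contains_non_ascii(raw: bytes) -> bool:
    """Octet-level check: any byte above 0x7F."""
    return any(b > 0x7F for b in raw)


def fold_via_nfkc(name: str) -> str:
    """Hypothetical downstream step: compatibility normalization, then lowercase."""
    return unicodedata.normalize("NFKC", name).lower()


def fold_via_upper(name: str) -> str:
    """Hypothetical downstream step: compare by uppercasing, store lowercase."""
    return name.upper().lower()


if __name__ == "__main__":
    fullwidth_host = "\uff28\uff4f\uff53\uff54"  # fullwidth "Host"
    long_s_te = "Tran\u017ffer-Encoding"          # "Transfer" spelled with a long s

    for name, fold in [(fullwidth_host, fold_via_nfkc), (long_s_te, fold_via_upper)]:
        raw = name.encode("utf-8")
        folded = fold(name)
        # Prints whether the raw octets were non-ASCII, and whether the
        # folded form collides with a name the intermediary cares about.
        print(repr(name), contains_non_ascii(raw), folded, folded in SENSITIVE)
```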

With this said, I think I understand the point you're trying to address.
HPACK is designed to be completely generic and binary transparent, so in
a layered design it is not easy to filter invalid characters in the HPACK
layer (in case it comes from a generic library that could be used
elsewhere). In addition, the fact that HPACK works as a dictionary is a
real pain, because it forces implementations to check every field at the
decoder's output: bad characters inserted into the dynamic table could be
reused by a subsequent request.
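
The dictionary problem can be shown with a deliberately simplified
decoder model in Python. This is not the HPACK wire format: ToyDecoder,
its operation tuples and field_is_acceptable are hypothetical, and indices
here are 1-based into the dynamic table only. The point is that a field
inserted by a rejected request stays in the table and can be replayed by
index later, so validation has to run on every field the decoder emits,
not only on literals.

```python
from dataclasses import dataclass, field


def field_is_acceptable(name: str, value: str) -> bool:
    """Minimal per-field check (regular fields only, no pseudo-headers)."""
    if not name or any(c in "\x00\r\n:" or "A" <= c <= "Z" for c in name):
        return False
    return not any(c in "\x00\r\n" for c in value)


@dataclass
class ToyDecoder:
    """Very simplified model of a decoder's dynamic table (newest first)."""
    table: list = field(default_factory=list)

    def decode_block(self, ops):
        fields = []
        for op in ops:
            if op[0] == "literal_with_indexing":
                _, name, value = op
                # The table is updated even if the request is later rejected,
                # otherwise encoder and decoder state would diverge.
                self.table.insert(0, (name, value))
                fields.append((name, value))
            elif op[0] == "indexed":
                # 1-based index into this toy table only (real HPACK places
                # dynamic entries after the 61-entry static table).
                fields.append(self.table[op[1] - 1])
            else:
                raise ValueError(f"unknown op {op[0]!r}")
        # Validate at the output: every field, whatever its origin.
        if not all(field_is_acceptable(n, v) for n, v in fields):
            return None  # treat the message as malformed
        return fields


if __name__ == "__main__":
    dec = ToyDecoder()
    # First request is rejected, but its bad field is now in the table.
    print(dec.decode_block([("literal_with_indexing", "x-smuggle", "a\r\nb")]))
    # Second request only references the entry by index; still rejected.
    print(dec.decode_block([("indexed", 1)]))
```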

And doing so at the semantics layer could be too late as well (e.g. the
recent report about H2 issues with :method, :authority and :path, which
have to be validated before being assembled for the semantics layer).
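
As an illustration of checking pseudo-header fields before they are glued
together, here is a hypothetical gateway step in Python that builds an
HTTP/1.1 request line from :method, :path and :authority. The specific
checks are assumptions chosen to show the idea. They are not the complete
rules from the spec.

```python
import string

# tchar set from the token rule in [HTTP]; :method must be a token.
TCHAR = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~")


def _has_space_or_ctl(s: str) -> bool:
    """True if s contains SP, an ASCII control character, or DEL."""
    return any(ord(c) <= 0x20 or ord(c) == 0x7F for c in s)


def build_request_line(pseudo: dict) -> str:
    """Hypothetical gateway step: turn HTTP/2 pseudo-headers into an
    HTTP/1.1 request line plus Host header, validating each piece first."""
    method = pseudo.get(":method", "")
    path = pseudo.get(":path", "")
    authority = pseudo.get(":authority", "")

    if not method or any(c not in TCHAR for c in method):
        raise ValueError("invalid :method")
    # Origin-form path, or "*" for OPTIONS; SP/CR/LF would let the value
    # spill into the protocol version or into extra header lines.
    if not (path.startswith("/") or (path == "*" and method == "OPTIONS")):
        raise ValueError("invalid :path")
    if _has_space_or_ctl(path):
        raise ValueError("invalid :path")
    # This sketch requires :authority; "@" is refused because userinfo is
    # not allowed for http(s), and "/?#" would end the authority early.
    if not authority or _has_space_or_ctl(authority) or any(c in "/?#@" for c in authority):
        raise ValueError("invalid :authority")

    return f"{method} {path} HTTP/1.1\r\nHost: {authority}\r\n"
```

Without the :path check, a value like "/ HTTP/1.1\r\nX: y" would produce an
extra header line on the HTTP/1.1 side.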

> The point of the text in HTTP/2 was to strengthen requirements.  I've
> reordered the text in https://github.com/httpwg/http2-spec/pull/936 and I
> think that is sufficient for this case.

I think the risk remains that the spec is only skimmed and that those
who consider they don't need to enforce semantics will make use of those
invalid characters. What about adding this before the paragraph you
moved:

  Even though HPACK is capable of carrying field names or values that
  are not valid in HTTP, HTTP/2 implementations MUST NOT emit names
  or values that include characters disallowed by [HTTP].

This would be sufficient to declare any such sender non-compliant, and
then it makes sense to go on with your paragraph saying that a recipient
MAY treat such a message as malformed.
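
A sender-side guard in the spirit of the proposed text could look like the
following Python sketch. may_emit is a hypothetical helper run before
fields are handed to the HPACK encoder. Names must be lowercase tokens.
Values must follow the field-value grammar of [HTTP]: visible characters,
with SP and HTAB allowed only between them. Whether to also accept
obs-text (octets 0x80-0xFF) is left as a flag that defaults to off,
reflecting the concern about non-ASCII octets raised earlier. Pseudo-header
fields are out of scope.

```python
import string

# tchar (token characters) from [HTTP], as byte values; uppercase removed
# because HTTP/2 field names must be lowercase.
TCHAR = frozenset((string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~").encode("ascii"))
LOWER_TCHAR = frozenset(b for b in TCHAR if not 0x41 <= b <= 0x5A)


def is_field_vchar(b: int, allow_obs_text: bool) -> bool:
    """field-vchar = VCHAR / obs-text (obs-text only when explicitly allowed)."""
    return 0x21 <= b <= 0x7E or (allow_obs_text and b >= 0x80)


def may_emit(name: bytes, value: bytes, allow_obs_text: bool = False) -> bool:
    """Hypothetical sender-side guard for a regular (non-pseudo) field."""
    if not name or any(b not in LOWER_TCHAR for b in name):
        return False
    if not value:
        return True  # an empty field value is allowed
    # First and last octets must be field-vchar (no leading/trailing SP/HTAB).
    if not (is_field_vchar(value[0], allow_obs_text) and is_field_vchar(value[-1], allow_obs_text)):
        return False
    # Interior octets: SP, HTAB or field-vchar; this excludes NUL, CR, LF, DEL.
    return all(b in (0x20, 0x09) or is_field_vchar(b, allow_obs_text) for b in value)
```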

Regards,
Willy
Received on Monday, 23 August 2021 05:44:39 UTC