Re: Permitted characters in HTTP/2 fields from Cory Benfield on 2021-05-24 (ietf-http-wg@w3.org from April to June 2021)

From: Cory Benfield <cory@lukasa.co.uk>
Date: Mon, 24 May 2021 15:27:43 +0100
To: Greg Wilkins <gregw@webtide.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAH_hAJHjLxVERDAyTMu6=9KAS+xzAUKGO+K3qEc5Eda1kUhTmw@mail.gmail.com>

On Fri, 21 May 2021 at 03:37, Greg Wilkins <gregw@webtide.com> wrote:
>
>
> This conversation started with:
>
>> At our last interim, we discussed potential ways in which HTTP/2 was probably too strict about characters (octets really) in field names and values.
>> The conclusion then was to loosen the restriction and mandate only a small set of checks.  This should match what implementations already do.
>
>
> Any chance of describing exactly what those reasons are, because it's lost on me exactly what problem is being solved here. If we don't have a full brief for these changes, then how are we meant to evaluate them or indeed record the reason for posterity? Neither #815 nor #846 explains the problem other than to say the text is confusing. There is no motivation for why validation should be less than carrying HTTP fields plus pseudo fields.
>
> I don't mind the current text so much, as it says I can validate against HTTP semantic fields as defined by https://www.ietf.org/archive/id/draft-ietf-httpbis-semantics-15.html#section-5, so I will. I'm just going to reject any other fields and I'm allowed to, so I'm happy. But I have no idea why we want to allow implementations to send non-compliant fields around. Isn't that just asking for problems? If it is because some existing implementations are already sending invalid fields, then they are doing so regardless, and unless you say an impl must accept them, any impl may reject them as invalid. So changing the spec to be less strict makes no difference so long as impls are allowed to actually enforce correct validation.

Not all implementations are equal. It is not necessarily reasonable to
bind intermediaries to parse the ABNF of the fields they pass around.
Indeed, many do not. However, these proposed rules do bind
intermediaries: they are supposed to enforce the additional rules put
in place by this proposal.

It's worth remembering how semantics actually describes ABNF as
working. Quoting from -semantics §2.2:

> A sender MUST NOT generate protocol elements that do not match the grammar defined by the corresponding ABNF rules.

This rule binds the _sender_, not the receiver. It leaves unspecified
whether a receiver is obligated to error on a mismatch of the grammar.
This is (as far as I know) deliberate: receivers are entitled to
tolerate violations of the ABNF if they feel they can do so, except
where other relevant specifications forbid it.

That's what these rules are for: they are over-and-above the ABNF, and
bind recipients, including intermediaries. The text on line 3750 of
Martin's proposal covers this pretty clearly, I think:

> An intermediary can reject fields that contain invalid field names or values for other
> reasons, in particular those that do not conform to the HTTP ABNF grammar from <xref
> target="HTTP" section="5"/>. Intermediaries that do not perform any validation of fields
> other than the minimum required by <xref target="HttpHeaders"/> could forward messages
> that contain invalid field names or values.
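
To make the two levels of checking concrete, here is a rough sketch
(not taken from the proposal) that contrasts the full field-name
grammar quoted further down this thread with the simpler range check
Martin suggests. The function names are hypothetical, and the sketch
deliberately ignores pseudo-header fields (whose names start with ':')
and HTTP/2's lowercase requirement for field names.

```python
import string

# Characters allowed by the field-name grammar quoted in this thread
# (the "tchar" set: listed punctuation plus DIGIT and ALPHA).
TCHAR = frozenset(
    b"!#$%&'*+-.^_`|~"
    + string.digits.encode("ascii")
    + string.ascii_letters.encode("ascii")
)


def is_token_field_name(name: bytes) -> bool:
    """Strict check (hypothetical helper): non-empty, every octet a tchar."""
    return len(name) > 0 and all(c in TCHAR for c in name)


def passes_simple_range_check(name: bytes) -> bool:
    """Looser check (hypothetical helper): 0x21-0x7e, excluding ':'."""
    return len(name) > 0 and all(
        0x21 <= c <= 0x7E and c != ord(":") for c in name
    )


if __name__ == "__main__":
    for candidate in (b"content-type", b"x(y)", b"x:y", "caf\u00e9".encode("utf-8")):
        print(candidate, is_token_field_name(candidate), passes_simple_range_check(candidate))
```

A name such as "x(y)" passes the range check but fails the grammar,
which is exactly the kind of field an intermediary doing only the
minimum validation could end up forwarding.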

>
> Finally, when the "Brief" says we should match what implementations already do, then the question is which implementations are to be matched? If there are some implementations that already enforce the precise spec for HTTP headers, then should we match those impls, or are some implementations more match-worthy than others?
>
>
>
> On Fri, 21 May 2021 at 11:22, Martin Thomson <mt@lowentropy.net> wrote:
>>
>> Hey Willy,
>>
>> On Fri, May 21, 2021, at 02:59, Willy Tarreau wrote:
>> > I really agree. I don't remember if 0x80 and above are forbidden in H2 but
>> > I'd personally prefer to block them so that we don't needlessly introduce
>> > the risk of aliasing due to different codings being used. Protocol elements
>> > that define how messages should be delimited/routed/etc must be strictly
>> > defined and easy to enforce in implementations and applications.
>>
>> We never really said before.  I'm happy to extend the 0x7f to 0x7f-0xff if that is what others want.  It's not quite the same as limiting the grammar to what is permitted for field names, but it might be OK.
>>
>> field-name is "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
>>
>> That amounts to a whole bunch of characters less than %x21-7E (minus ':').  A simpler check for c >= 0x21 && c <= 0x7e && c != ':' seems reasonable to me.  Then we don't have to worry about Unicode field names.  That's not a whole lot different than c >= 0x21 && c != 0x7e && c != ':' as the current PR has.
>>
>> I had the distinct impression that we DID see Unicode field names in some cases though.
>>
>> We wanted to avoid backward incompatibility issues that might result from tighter constraints on field *values*, which is why we never said anything before, but names might be easier.
>>
>
>
> --
> Greg Wilkins <gregw@webtide.com> CTO http://webtide.com
Received on Monday, 24 May 2021 14:28:08 UTC