Re: Permitted characters in HTTP/2 fields

Hi Martin,

On Fri, May 21, 2021 at 11:19:35AM +1000, Martin Thomson wrote:
> Hey Willy,
> 
> On Fri, May 21, 2021, at 02:59, Willy Tarreau wrote:
> > I really agree. I don't remember if 0x80 and above are forbidden in H2 but
> > I'd personally prefer to block them so that we don't needlessly introduce
> > the risk of aliasing due to different codings being used. Protocol elements
> > that define how messages should be delimited/routed/etc must be strictly
> > defined and easy to enforce in implementations and applications.
> 
> We never really said before.  I'm happy to extend the 0x7f to 0x7f-0xff if
> that is what others want.  It's not quite the same as limiting the grammar to
> what is permitted for field names, but it might be OK.

I was fine as well with limiting names to what is permitted for field names
(I personally use a bit field to match them), but I agree that using a few
ranges is even easier to implement (especially for small & simple
implementations that at least want to make the effort of staying safe).

> field-name is "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
> 
> That amounts to a whole bunch of characters less than %x21-7E (minus ':').  A
> simpler check for c >= 0x21 && c <= 0x7e && c != ':' seems reasonable to me.
> Then we don't have to worry about Unicode field names.  That's not a whole
> lot different than c >= 0x21 && c != 0x7e && c != ':' as the current PR has.

Yep, in terms of computation it's the same, and it can be simplified to:

   (uint8_t)(c - 0x21) <= (0x7e - 0x21) && c != ':'
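
For illustration, here is a minimal sketch of how that single comparison
could be applied to a whole field name. The helper names h2_name_char_ok()
and h2_name_ok() are made up for this example, and it assumes that the
leading ':' of pseudo-header fields and the rejection of uppercase letters
are handled elsewhere:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if c is within 0x21..0x7e and is not ':'.
 * The subtraction wraps for c < 0x21 once cast back to uint8_t, so a single
 * unsigned comparison covers both bounds of the range.
 */
static int h2_name_char_ok(uint8_t c)
{
    return (uint8_t)(c - 0x21) <= (0x7e - 0x21) && c != ':';
}

/* Hypothetical helper: returns 1 if every byte of <name> passes the check
 * above. An empty name is rejected. Assumption: pseudo-header names have
 * already been handled by the caller, so ':' is refused everywhere.
 */
static int h2_name_ok(const char *name, size_t len)
{
    size_t i;

    if (!len)
        return 0;

    for (i = 0; i < len; i++) {
        if (!h2_name_char_ok((uint8_t)name[i]))
            return 0;
    }
    return 1;
}
```

With this, "content-length" passes while any name containing a space, a
colon, DEL or a byte at or above 0x80 is refused.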

> I had the distinct impression that we DID see Unicode field names in some cases though.

This is exactly the horror that worries me. I really don't want to open
that Pandora's box, or we can say bye-bye to HTTP as a safe transport
protocol for applications. Just imagine a client emitting a POST to an
H2-to-H1 gateway, with "content-length: 1000" and
"Tran\xd1\x95fer-encoding: chunked", the latter passing through a string
manipulation function which transliterates it to
"Transfer-encoding: chunked" on the other side...
And that's just a single example; there are so many possibilities that
it could almost be funny if we weren't at all concerned about security,
especially as we know from experience that if it can happen it will happen :-/
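
To make the aliasing concrete, here is a toy sketch. toy_translit() is a
made-up stand-in for whatever lossy string function might sit on the H1
side; it only knows how to turn the UTF-8 sequence 0xd1 0x95 (U+0455, a
Cyrillic letter that looks like "s") into an ASCII 's'. name_is_safe() is
the byte-range check discussed above, under a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a lossy transliteration step: replaces the
 * two-byte UTF-8 sequence 0xd1 0x95 (U+0455) with 's' and copies every
 * other byte unchanged. <out> must have room for len + 1 bytes. Returns
 * the output length, not counting the trailing zero.
 */
static size_t toy_translit(const char *in, size_t len, char *out)
{
    size_t i = 0, o = 0;

    while (i < len) {
        if (i + 1 < len &&
            (uint8_t)in[i] == 0xd1 && (uint8_t)in[i + 1] == 0x95) {
            out[o++] = 's';
            i += 2;
        } else {
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
    return o;
}

/* Returns 1 if every byte is within 0x21..0x7e and is not ':'. */
static int name_is_safe(const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        uint8_t c = (uint8_t)name[i];

        if ((uint8_t)(c - 0x21) > (0x7e - 0x21) || c == ':')
            return 0;
    }
    return len > 0;
}
```

The name as sent on the wire fails name_is_safe() because of the 0xd1
byte, so a gateway applying the check drops it before any transliteration
gets a chance to turn it into "Transfer-encoding".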

> We wanted to avoid backward incompatibility issues that might result from
> tighter constraints on field *values*, which is why we never said anything
> before, but names might be easier.

I have no problem with field values, as basically anything besides
control chars and leading/trailing blanks can already appear there
in previous HTTP versions. In practice the only header fields that
could cause trouble when reading incorrectly matched tokens are
Connection and Transfer-encoding, and both of these are already
forbidden in H2.
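
For completeness, a rough sketch of what such a lenient value check might
look like. value_looks_ok() is a hypothetical name, and the exact rules
are an assumption made for this example: horizontal tab is tolerated
inside the value, all other control characters and DEL are refused, and
bytes at or above 0x80 are let through untouched:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if <v> has no leading or trailing SP/HTAB
 * and contains no control character other than HTAB. Bytes >= 0x80 are
 * accepted as-is (assumption: values are treated as opaque bytes). An
 * empty value is accepted.
 */
static int value_looks_ok(const uint8_t *v, size_t len)
{
    size_t i;

    if (len && (v[0] == ' ' || v[0] == '\t' ||
                v[len - 1] == ' ' || v[len - 1] == '\t'))
        return 0;

    for (i = 0; i < len; i++) {
        if ((v[i] < 0x20 && v[i] != '\t') || v[i] == 0x7f)
            return 0;
    }
    return 1;
}
```

This is deliberately much looser than the name check, which matches the
point above: values need far less policing than names.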

Thanks,
Willy

Received on Friday, 21 May 2021 06:17:37 UTC