- From: Willy Tarreau <w@1wt.eu>
- Date: Fri, 21 May 2021 08:17:17 +0200
- To: Martin Thomson <mt@lowentropy.net>
- Cc: ietf-http-wg@w3.org
Hi Martin,

On Fri, May 21, 2021 at 11:19:35AM +1000, Martin Thomson wrote:
> Hey Willy,
>
> On Fri, May 21, 2021, at 02:59, Willy Tarreau wrote:
> > I really agree. I don't remember if 0x80 and above are forbidden in H2 but
> > I'd personally prefer to block them so that we don't needlessly introduce
> > the risk of aliasing due to different codings being used. Protocol elements
> > that define how messages should be delimited/routed/etc must be strictly
> > defined and easy to enforce in implementations and applications.
>
> We never really said before. I'm happy to extend the 0x7f to 0x7f-0xff if
> that is what others want. It's not quite the same as limiting the grammar to
> what is permitted for field names, but it might be OK.

I was fine as well with limiting to what is permitted (and I personally
use a bit field to match them) but I agree that using a few ranges is
even easier to implement (especially for small & simple implementations
that at least want to make the effort of staying safe).

> field-name is "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
>
> That amounts to a whole bunch of characters less than %x21-7E (minus ':'). A
> simpler check for c >= 0x21 && c <= 0x7e && c != ':' seems reasonable to me.
> Then we don't have to worry about Unicode field names. That's not a whole
> lot different than c >= 0x21 && c != 0x7e && c != ':' as the current PR has.

Yep, in terms of computation it's the same and can be simplified to:

    (uint8_t)(c - 0x21) <= (0x7e - 0x21) && c != ':'

> I had the distinct impression that we DID see Unicode field names in some cases though.

This is exactly the horror that worries me. I really don't want to open
that Pandora's box, or we can say bye-bye to HTTP as a safe transport
protocol for applications.
Just imagine a client emitting a POST to an H2-to-H1 gateway, with
"content-length: 1000" and "Tran\xd1\x95fer-encoding: chunked", the latter
passing through a string manipulation function which transliterates it
to "Transfer-encoding: chunked" on the other side... And that's just a
single example; there are so many possibilities that it could almost be
funny if we weren't a bit concerned about security, especially as we know
from experience that if it can happen it will happen :-/

> We wanted to avoid backward incompatibility issues that might result from
> tighter constraints on field *values*, which is why we never said anything
> before, but names might be easier.

I have no problem with field values, as basically anything besides control
chars and leading/trailing blanks can already appear there in previous
HTTP versions. In practice the only header fields that could cause trouble
when reading incorrectly matched tokens are Connection and
Transfer-encoding, and both of these are already forbidden in H2.

Thanks,
Willy
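
For illustration, the single-comparison range check from this message could be wrapped into a field-name validator roughly as follows. This is only a sketch: the helper names name_byte_ok() and field_name_ok() are hypothetical and not taken from any implementation or from the PR under discussion. It applies the 0x21-0x7e minus ':' range only, not the full field-name token grammar quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: accept only bytes in 0x21..0x7e, except ':'.
 * Bytes below 0x21 wrap around to large values after the subtraction,
 * so one unsigned comparison covers both the lower and upper bound.
 */
static int name_byte_ok(uint8_t c)
{
    return (uint8_t)(c - 0x21) <= (0x7e - 0x21) && c != ':';
}

/* Hypothetical helper: a field name is accepted only if it is non-empty
 * and every one of its bytes passes name_byte_ok(). Returns 1 if
 * accepted, 0 otherwise.
 */
static int field_name_ok(const char *name, size_t len)
{
    size_t i;

    assert(name != NULL);
    if (len == 0)
        return 0;

    for (i = 0; i < len; i++) {
        if (!name_byte_ok((uint8_t)name[i]))
            return 0;
    }
    return 1;
}
```

With such a check, the 18-byte name "Tran\xd1\x95" "fer-encoding" from the example above is rejected at its fifth byte (0xd1), before any transliteration can take place, while "transfer-encoding" passes the byte check. The literal is split in two because a C hex escape would otherwise swallow the following "fe" as hex digits.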
Received on Friday, 21 May 2021 06:17:37 UTC