- From: Poul-Henning Kamp <phk@phk.freebsd.dk>
- Date: Sat, 15 Oct 2016 09:41:07 +0000
- To: Matt Menke <mmenke@google.com>, ietf-http-wg@w3.org
--------
In message <CAEK7mvoXqyX3cADJytjU+C158EULgPLbzAb5kiUN=8WWxhi29Q@mail.gmail.com>, Matt Menke writes:

>I think the draft looks good, but have a couple comments:
>
>The token rule in RFC7230 already includes asterisks, so I don't think
>identifier or token_or_asterix is needed.

Yes, I just fixed that.

>Would it make sense to codify behavior if a part of a
>h1_common_structure value fails to parse (at least if it uses the
>proposed "><" format)? I suspect what browsers do is inconsistent here,
>and having some official rule (ignore the entire element vs ignore the
>entire line vs ignore the broken parameter) seems like it would be worth
>having? I'd go with throw away the entire header line, if it uses the
>new format and that happens, since that's easiest to standardize on.

So this is a bit of a sticky wicket.

Today that is a per-header decision; for instance, Accept-Encoding can
safely ignore anything it doesn't understand/parse, whereas
Content-Encoding has to be parsed perfectly.

It is also a soft spot which has been used in a number of creative
attacks on deeper layers in HTTP/1 sandwiches.

Looking forward, if we want to be able to use CS to build H3
compression, we cannot allow CS headers with format errors.

I'm uncomfortable with a rule which says "just ignore", so I would
propose that failure to parse at the CS level should cause a 4xx error,
just like an ASCII BEL in an HTTP/1 header would.

But please note that this is only at the CS level; how valid CS which is
semantically invalid (i.e. "Content-Length: ABCD") should be handled is
outside the scope of this ID. I'm not even sure we can give a
meaningful "default" rule.

>I think it's unfortunate that the HTTP/1 serialization can't distinguish
>between identifiers, numbers, and timestamps.

Yes, but we don't really get to decide where we start.

My hope is that we can build a machine-readable specification language
for HTTP headers from which the "semantic parsing" code can be
generated, but that is clearly in the "future work" column.

>It means that
>per-specific-header logic will have to be responsible for that extra
>round of parsing for HTTP/1 headers.

Not necessarily.

Parsing CS in HTTP/1 serialization is very trivial, and it is not
obvious to me that it always would or should be a separate step. With a
specification language as mentioned above, you would probably generate
combined CS+semantic parser code.

The big advantage of CS is that we don't need to know the semantics: if
your implementation receives "My-Private-Header: >[...]<" it can take
it apart and present it as a native data structure, and the application
logic can apply the privately known semantics to that. A minimal sketch
of that step appears below.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
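As a minimal sketch of the "take it apart without knowing the semantics"
step: the ">...<" framing follows the draft's HTTP/1 serialization, but the
element/parameter grammar assumed here (comma-separated elements with
semicolon-separated key=value parameters) and the function name
parse_cs_value are illustrative assumptions for this example, not the
draft's actual ABNF.

    # Illustrative Python sketch only; the grammar below is an assumption,
    # not the draft's definitive ABNF.

    def parse_cs_value(raw: str):
        """Parse a hypothetical CS-serialized HTTP/1 header value into
        native data, with no knowledge of the header's semantics.

        Any framing error raises, mirroring the "fail the whole header,
        answer 4xx" position above rather than ignoring broken parts.
        """
        raw = raw.strip()
        if not (raw.startswith(">") and raw.endswith("<")):
            raise ValueError("not CS-framed")      # would surface as a 4xx
        body = raw[1:-1]
        elements = []
        # Naive split: a real parser would also respect commas and
        # semicolons inside quoted strings.
        for element in body.split(","):
            name, _, params = element.strip().partition(";")
            item = {"name": name.strip(), "params": {}}
            for param in filter(None, (p.strip() for p in params.split(";"))):
                key, _, value = param.partition("=")
                if not key:
                    raise ValueError("empty parameter name")
                item["params"][key.strip()] = value.strip().strip('"')
            elements.append(item)
        return elements

    # Example: a private header the parser knows nothing about semantically.
    print(parse_cs_value('>foo; a=1; b="x", bar; c=2<'))
    # [{'name': 'foo', 'params': {'a': '1', 'b': 'x'}},
    #  {'name': 'bar', 'params': {'c': '2'}}]

The hard failure is the design point: a CS-level parse error rejects the
entire header value; what the application then does with semantically
invalid but well-formed CS is left to per-header logic.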
Received on Saturday, 15 October 2016 09:41:33 UTC