- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sat, 23 Jul 2016 21:17:54 +0200
- To: HTTP Working Group <ietf-http-wg@w3.org>
Hi there,

the recent discussions right here on the mailing list, and also in Berlin, were very interesting.

On the way back from Berlin it occurred to me that we may want to challenge the assumptions that led to RFC 2231 long ago, and to RFC 5987 later on.

RFC 2231 essentially overloads the parameter name syntax (trailing "*"), and then introduces a specific micro syntax for tokens (ext-value, see <https://greenbytes.de/tech/webdav/draft-ietf-httpbis-rfc5987bis-02.html#rfc.section.3.2.1.p.2>).

All of this happened for MIME, and the assumption was that we can't change the generic

   name "=" token / quoted-string

pattern. I believe the assumption was that this would magically work for many (all?) MIME header fields. I don't believe that's true for mail, and it certainly is not true for HTTP: in HTTP, header field definitions in practice need to opt in to this extension, and only a few have (Content-Disposition, Link, Digest auth).

So given that we don't change existing fields, but just define something that field definitions can opt in to, we could try something else.

To allow non-ASCII characters in parameter values (without touching the name), there seem to be a few alternatives:

1) Just define that quoted-string carries UTF-8, or

2) Define a variant of quoted-string that can carry non-ASCII characters in escaped form.

Option 1) has the drawback that there are many, many APIs that assume field values are ISO-8859-1, and depending on the language, it might be hard to get back to an octet sequence and re-parse it as UTF-8. Also, at least for Content-Disposition, it would be hard to deploy due to existing code.
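To make the two encodings discussed above concrete, here is a minimal Python sketch. It is not taken from any spec or library, and the function names decode_ext_value and reparse_as_utf8 are hypothetical. The first function decodes an RFC 5987 ext-value of the form charset'language'pct-encoded-value, accepting only the UTF-8 and ISO-8859-1 charsets and skipping attr-char validation. The second shows the re-parsing step that option 1) would require once an API has already decoded the field value as ISO-8859-1.

```python
from urllib.parse import unquote_to_bytes


def decode_ext_value(ext_value: str) -> str:
    """Decode an RFC 5987-style ext-value such as UTF-8''%e2%82%ac%20rates.

    Sketch only: splits on the first two apostrophes, ignores the language
    tag, and does not check that the value uses only attr-chars.
    """
    charset, _language, encoded = ext_value.split("'", 2)
    charset = charset.lower()
    if charset not in ("utf-8", "iso-8859-1"):
        raise ValueError(f"unsupported charset: {charset}")
    # Percent-decode to raw octets first, then apply the declared charset.
    return unquote_to_bytes(encoded).decode(charset)


def reparse_as_utf8(latin1_field_value: str) -> str:
    """Recover UTF-8 text from a field value an API decoded as ISO-8859-1.

    ISO-8859-1 maps every octet 0x00-0xFF to exactly one code point, so
    encoding back to ISO-8859-1 restores the original octet sequence.
    """
    return latin1_field_value.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    print(decode_ext_value("UTF-8''%e2%82%ac%20rates"))
    raw = 'filename="\u20ac.txt"'.encode("utf-8")
    api_view = raw.decode("iso-8859-1")  # what an ISO-8859-1-based API hands out
    print(reparse_as_utf8(api_view))
```

In Python the round trip is cheap because strings and octet sequences are distinct types and ISO-8859-1 is lossless over all octets. Runtimes that normalize or re-encode header strings before handing them to the application may not preserve the original octets, which is the difficulty mentioned above.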
For option 2) there are again several ways to do it:

2a) Given that RFC 7230 already says senders SHOULD NOT escape characters when it isn't needed (<https://greenbytes.de/tech/webdav/rfc7230.html#rfc.section.3.2.6.p.5>), we could assume that we can actually introduce new escape sequences with little breakage, such as a JSON-ish "\udddd" for non-ASCII characters.

2b) Alternatively, we could use something like

   name "=" token / quoted-string / new-quoted-string

where new-quoted-string can be distinguished from any valid token or quoted-string, and would carry the escape sequence format mentioned in 2a). RFC 7230 defines the delimiters (),/:;<=>?@[\]{} as characters that cannot appear in a token, so one of them could serve as the marker. Such as:

   pile-of-poo=<\uD83D\uDCA9> (PS2)

Best regards, Julian

PS: and yes, 2a) was suggested a few years ago, and back then I was opposed to it, so it's really not my idea.

PS2: and if we get there, I'd actually vote for a syntax that doesn't rely on surrogate pairs.
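For illustration, here is a minimal Python sketch of how a recipient might parse the hypothetical 2b) syntax, assuming "<" and ">" as the new-quoted-string markers and the JSON-style \uXXXX escapes from 2a). Nothing here is specified anywhere; the function names are hypothetical, token validation is omitted, and the surrogate-pair handling is exactly the complexity PS2 would like to avoid.

```python
import re

# Hypothetical escape from option 2a): backslash, "u", four hex digits
# naming one UTF-16 code unit.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def _unquote_pairs(body: str) -> str:
    # Existing quoted-string rule (RFC 7230): a quoted-pair stands for the
    # character after the backslash, so "\u00e4" becomes "u00e4".
    return re.sub(r"\\(.)", r"\1", body)


def _decode_escapes(body: str) -> str:
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        m = _U_ESCAPE.match(body, i)
        if m is None:
            # Plain quoted-pair: backslash followed by one character.
            if i + 1 >= len(body):
                raise ValueError("dangling backslash")
            out.append(body[i + 1])
            i += 2
            continue
        unit = int(m.group(1), 16)
        i = m.end()
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: must be followed by a \u escape for a low one.
            m2 = _U_ESCAPE.match(body, i)
            low = int(m2.group(1), 16) if m2 else -1
            if not 0xDC00 <= low <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            i = m2.end()
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        out.append(chr(unit))
    return "".join(out)


def parse_param_value(value: str) -> str:
    """Parse token / quoted-string / hypothetical <new-quoted-string>."""
    if value.startswith("<"):
        if len(value) < 2 or not value.endswith(">"):
            raise ValueError("unterminated new-quoted-string")
        return _decode_escapes(value[1:-1])
    if value.startswith('"'):
        if len(value) < 2 or not value.endswith('"'):
            raise ValueError("unterminated quoted-string")
        return _unquote_pairs(value[1:-1])
    return value  # token (not validated in this sketch)


if __name__ == "__main__":
    print(ascii(parse_param_value("<\\uD83D\\uDCA9>")))  # U+1F4A9
```

Note the contrast between the two quoted forms: under plain RFC 7230 rules a recipient reads the quoted-string "\u00e4" as "u00e4", which is the kind of breakage 2a) bets is rare, whereas the distinct "<...>" form of 2b) leaves existing quoted-string semantics untouched.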
Received on Saturday, 23 July 2016 19:18:25 UTC