- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sat, 23 Jul 2016 21:17:54 +0200
- To: HTTP Working Group <ietf-http-wg@w3.org>
Hi there,
the recent discussions right here on the mailing list, and also in
Berlin, were very interesting.
On the way back from Berlin it occurred to me that we may want to
challenge the assumptions that led to RFC 2231 long ago, and to RFC 5987
later on.
RFC 2231 essentially overloads the parameter name syntax (trailing "*"),
and then introduces a specific micro syntax for tokens (ext-value, see
<https://greenbytes.de/tech/webdav/draft-ietf-httpbis-rfc5987bis-02.html#rfc.section.3.2.1.p.2>).
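For readers who have not dealt with ext-value before, here is a minimal sketch of that micro syntax (charset, optional language tag, then percent-encoded octets). The function names are made up for illustration, and the sketch only handles UTF-8; it is not a complete implementation of the draft.

```python
from urllib.parse import quote, unquote

# Punctuation allowed unescaped by attr-char, beyond the "-._~" that
# urllib.parse.quote() already leaves alone.
ATTR_CHAR_EXTRA = "!#$&+^`|"


def encode_ext_value(text, language=""):
    """Build charset'language'value-chars for a parameter named like title*."""
    return "UTF-8'" + language + "'" + quote(text, safe=ATTR_CHAR_EXTRA, encoding="utf-8")


def decode_ext_value(ext_value):
    """Split on the two apostrophes and percent-decode the remainder."""
    charset, language, value_chars = ext_value.split("'", 2)
    if charset.upper() != "UTF-8":
        raise ValueError("this sketch only handles UTF-8")
    return unquote(value_chars, encoding="utf-8", errors="strict")


print("title*=" + encode_ext_value("€ rates"))
```

The receiver has to know that the trailing "*" on the parameter name switches the value to this format, which is the overloading described above.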
All of this happened for MIME, and the assumption was that we can't
change the generic
   name "=" token / quoted-string
pattern.
I believe the assumption was that this would magically work for many
(all?) MIME header fields. I don't believe it's true for mail; and it
certainly is not true for HTTP: in HTTP, header field definitions in
practice need to opt in to this extension, and only a few have
(Content-Disposition, Link, Digest auth).
So given the fact that we don't change existing fields, but just define
something field definitions can opt in to, we could try something
else. To allow non-ASCII characters in parameter values (without
touching the name), there seem to be a few alternatives:
1) Just define that quoted-string carries UTF-8
or
2) Define a variant of quoted-string that can carry non-ASCII characters
in escaped form.
Option 1) has the drawback that there are many APIs that assume
field values are ISO-8859-1, and depending on the language, it might be
hard to get back to an octet sequence and re-parse as UTF-8. Also, at
least for Content-Disposition it would be hard to deploy due to existing
code.
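To illustrate the drawback of option 1): in a language where the API hands you a string that was decoded as ISO-8859-1, getting back to UTF-8 means re-encoding to octets first. A minimal sketch, with a hypothetical helper name, assuming the API decoded every octet one-to-one:

```python
def reinterpret_latin1_as_utf8(value):
    """Undo an ISO-8859-1 decode and parse the original octets as UTF-8."""
    # ISO-8859-1 maps each octet to the code point of the same number,
    # so encoding again recovers the octets that were on the wire.
    octets = value.encode("iso-8859-1")
    return octets.decode("utf-8")


wire = 'filename="naïve.txt"'.encode("utf-8")   # what a sender would transmit
api_view = wire.decode("iso-8859-1")            # what many APIs would return
print(api_view)
print(reinterpret_latin1_as_utf8(api_view))
```

This only works when the API really preserves all octets; anything that replaces or drops bytes above 0x7F makes the round trip impossible.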
For option 2) there are again several ways to do it:
2a) Given the fact that RFC 7230 already says "SHOULD NOT escape when
not needed"
(<https://greenbytes.de/tech/webdav/rfc7230.html#rfc.section.3.2.6.p.5>),
we could make the assumption that we could actually introduce new escape
sequences with little breakage, such as a JSON-ish "\udddd" for
non-ASCII characters.
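A rough sketch of what 2a) would mean for a recipient, next to what an existing quoted-pair decoder does with the same input. The escape format (backslash, "u", four hex digits, UTF-16 code units) is just the JSON-ish assumption from above, and both function names are invented for this example.

```python
import re

# Either the hypothetical \udddd escape, or an ordinary quoted-pair.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\(.)", re.DOTALL)


def unescape_legacy(body):
    """Existing behaviour: a backslash simply quotes the next character."""
    return re.sub(r"\\(.)", r"\1", body, flags=re.DOTALL)


def unescape_extended(body):
    """Hypothetical behaviour under 2a): \\udddd becomes a UTF-16 code unit."""
    def repl(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        return match.group(2)

    text = _ESCAPE.sub(repl, body)
    # Join surrogate pairs into single code points; a lone surrogate raises.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


body = r"caf\u00E9"               # content between the double quotes
print(unescape_legacy(body))      # what deployed parsers would see
print(unescape_extended(body))    # what an updated parser would see
```

The "little breakage" assumption is that nobody currently sends "\u" on purpose, since the backslash is not needed in front of "u".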
2b) Alternatively, we could use something like
   name "=" token / quoted-string / new-quoted-string
where new-quoted-string can be distinguished from any valid token or
quoted-string, and which would carry the escape sequence format
mentioned in 2a). RFC 7230 says we could use one of the characters in
   (),/:;<=>?@[\]{}
for that. Such as
   pile-of-poo=<\uD83D\uDCA9> (PS2)
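A sketch of how a parser could tell the three forms in 2b) apart by looking at the first character. The "<" and ">" delimiters and the \udddd escape are only the example choices from this mail, not anything specified, and parse_param_value is a hypothetical helper.

```python
import re

# tchar from RFC 7230: a token can never start with '"' or '<'.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def parse_param_value(raw):
    """Return (form, decoded value) for one parameter value."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        # Existing quoted-string: backslash quotes the next character.
        return "quoted-string", re.sub(r"\\(.)", r"\1", raw[1:-1])
    if len(raw) >= 2 and raw.startswith("<") and raw.endswith(">"):
        # Hypothetical new-quoted-string carrying \udddd escapes.
        body = re.sub(
            r"\\u([0-9A-Fa-f]{4})",
            lambda m: chr(int(m.group(1), 16)),
            raw[1:-1],
        )
        # Join surrogate pairs into single code points.
        text = body.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
        return "new-quoted-string", text
    if TOKEN.fullmatch(raw):
        return "token", raw
    raise ValueError("not a recognised parameter value: " + repr(raw))


print(parse_param_value(r"<\uD83D\uDCA9>"))
```

Because the new form starts with a delimiter, existing token and quoted-string values parse exactly as before, and a recipient that does not know the new form fails visibly instead of misreading the value.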
Best regards, Julian
PS: and yes, 2a) was suggested a few years ago, and back then I was
opposed to it, so it's really not my idea.
PS2: and if we get there, I'd actually vote for a syntax that doesn't
rely on surrogate pairs.
Received on Saturday, 23 July 2016 19:18:25 UTC