rfc5987bis, jfv, parameters

Hi there,

The recent discussions right here on the mailing list, and also in 
Berlin, were very interesting.

On the way back from Berlin it occurred to me that we may want to 
challenge the assumptions that led to RFC 2231 long ago, and to RFC 5987 
later on.

RFC 2231 essentially overloads the parameter name syntax (trailing "*"), 
and then introduces a specific micro syntax for tokens (ext-value, see 
<https://greenbytes.de/tech/webdav/draft-ietf-httpbis-rfc5987bis-02.html#rfc.section.3.2.1.p.2>).
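
For readers who have not looked at ext-value in a while, here is a rough 
Python sketch of what producing and consuming one involves. The function 
names are mine, not from any spec or library; it assumes the UTF-8 
charset and relies on urllib.parse for the percent-encoding.

```python
from urllib.parse import quote, unquote

# attr-char characters from RFC 5987 that quote() would otherwise escape.
# quote() always leaves letters, digits, "_", ".", "-" and "~" alone.
ATTR_CHAR_EXTRA = "!#$&+^`|"


def encode_ext_value(value: str, language: str = "") -> str:
    """Build charset "'" [language] "'" value-chars (hypothetical helper)."""
    return "UTF-8'" + language + "'" + quote(value, safe=ATTR_CHAR_EXTRA)


def decode_ext_value(ext_value: str) -> str:
    """Split off charset and language, then percent-decode the rest."""
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset, errors="strict")


if __name__ == "__main__":
    # Note the trailing "*" on the parameter name, which signals ext-value.
    print("title*=" + encode_ext_value("€ rates"))
```

The point is only that both the name ("title*") and the value syntax 
change compared to a plain parameter.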

All of this happened for MIME, and the assumption was that we can't 
change the generic

   name "=" token / quoted-string

pattern.

I believe the assumption was that this would magically work for many 
(all?) MIME header fields. I don't believe it's true for mail; and it 
certainly is not true for HTTP: in HTTP, header field definitions in 
practice need to opt in to this extension, and only a few have 
(Content-Disposition, Link, Digest auth).

So given the fact that we don't change existing fields, but just define 
something field definitions can opt in to, we could try something 
else. To allow non-ASCII characters in parameter values (without 
touching the name), there seem to be a few alternatives:

1) Just define that quoted-string carries UTF-8

or

2) Define a variant of quoted-string that can carry non-ASCII characters 
in escaped form.

Option 1) has the drawback that there are many APIs that assume 
field values are ISO-8859-1, and depending on the language, it might be 
hard to get back to an octet sequence and re-parse as UTF-8. Also, at 
least for Content-Disposition it would be hard to deploy due to existing 
code.
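
To illustrate the "get back to an octet sequence" step: a minimal Python 
sketch, assuming an API that has already decoded the field value as 
ISO-8859-1. The function name and the sample value are made up for 
illustration.

```python
def reinterpret_as_utf8(value_from_api: str) -> str:
    """Undo an ISO-8859-1 decode and re-parse the octets as UTF-8.

    ISO-8859-1 maps every octet to the code point with the same number,
    so encoding with it recovers the original octets exactly.
    Raises UnicodeDecodeError if the octets are not valid UTF-8.
    """
    return value_from_api.encode("iso-8859-1").decode("utf-8")


# What a sender using option 1) would put on the wire.
wire_octets = 'filename="€ rates.txt"'.encode("utf-8")

# What an ISO-8859-1-assuming API hands to the application.
seen_by_api = wire_octets.decode("iso-8859-1")

if __name__ == "__main__":
    print(reinterpret_as_utf8(seen_by_api))
```

This is easy in a language that exposes the conversion; it is harder 
where the API has already replaced or dropped octets it did not like.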

For option 2) there are again several ways to do it:

2a) Given the fact that RFC 7230 already says "SHOULD NOT escape when 
not needed" 
(<https://greenbytes.de/tech/webdav/rfc7230.html#rfc.section.3.2.6.p.5>), 
we could make the assumption that we could actually introduce new escape 
sequences with little breakage, such as a JSON-ish "\udddd" for 
non-ASCII characters.

2b) Alternatively, we could use something like

   name "=" token / quoted-string / new-quoted-string

where new-quoted-string can be distinguished from any valid token or 
quoted-string, and which would carry the escape sequence format 
mentioned in 2a). RFC 7230 says we could use one of the characters in

   (),/:;<=>?@[\]{}

for that, for example:

   pile-of-poo=<\uD83D\uDCA9> (PS2)
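
A rough Python sketch of how a recipient might decode such a value, 
assuming "<" and ">" as the delimiters and exactly four hex digits per 
escape. None of this is specified anywhere; the function name is 
hypothetical, and other escapes (such as an escaped backslash) are not 
handled.

```python
import re

# One JSON-style escape: backslash, "u", four hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_new_quoted_string(raw: str) -> str:
    """Decode a hypothetical <...> value carrying \\uXXXX escapes."""
    if len(raw) < 2 or not (raw.startswith("<") and raw.endswith(">")):
        raise ValueError("not a new-quoted-string: %r" % raw)
    body = raw[1:-1]
    # Turn each escape into the UTF-16 code unit it names.
    units = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), body)
    # Round-trip through UTF-16 so surrogate pairs become one character.
    # A lone surrogate makes the final decode raise UnicodeDecodeError.
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


if __name__ == "__main__":
    name, _, value = r"pile-of-poo=<\uD83D\uDCA9>".partition("=")
    print(name, "->", decode_new_quoted_string(value))
```

The extra UTF-16 round trip is there only because of the surrogate 
pairs.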

Best regards, Julian

PS: and yes, 2a) was suggested a few years ago, and back then I was 
opposed to it, so it's really not my idea.

PS2: and if we get there, I'd actually vote for a syntax that doesn't 
rely on surrogate pairs.

Received on Saturday, 23 July 2016 19:18:25 UTC