- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Wed, 02 Nov 2011 08:50:39 +0100
- To: "Manger, James H" <James.H.Manger@team.telstra.com>
- CC: httpbis Group <ietf-http-wg@w3.org>
On 2011-11-02 03:33, Manger, James H wrote:
> On 2011-10-31, Julian wrote:
>> Here's a list of problems I see with this proposal:
>>
>> - Scope - the proposed syntax overloads quoted-string, potentially
>> changing the interpretation of existing content, thus I don't think we
>> *can* do this as part of HTTPbis.
>
> It is inconceivable that "\u" appears in any existing quoted-string value
> as a deliberate escape sequence for "u".

"Inconceivable"?

>> - Adding a different type of quoted-string might make things more
>> confusing; for the RFC5987 encoding it's at least easy to understand
>> when it's in use.
>
> Presumably RFC5987 (or its predecessors) decided it was highly unlikely
> that any parameter names in use ended in "*" (though they are valid)
> so it could redefine the syntax of values for such names.

Indeed. That's the kind of compromise people make when they want to cram
something new into a syntax that didn't have an extension point.

> I don't think defining \uXXXX as an escape for Unicode
> in quoted-string-like values is that much different.

That may be true, but the difference here is that you're proposing to do
it a second time to solve a problem that is already solved by the first
change.

>> - the JSON \u format doesn't really use Unicode but UCS-2 code points;
>> which means that senders and receivers will need to understand surrogate
>> pairs; see also <https://tools.ietf.org/html/rfc5137#section-5.1> for
>> context (that proposal adds additional delimiters to avoid the variable
>> length issue)
>
> I would be happy enough with RFC5137's \u'NNNN[NN]' instead of JSON's \uXXXX,
> though I don't think that would be an improvement here.
>
>
> Curiously, RFC5987 disobeys the proposed recommendations for new parameters.
> It allows
>   foo*=UTF-8''coll%C3%A8gues
> but not
>   foo*="UTF-8''coll%C3%A8gues"

Yes. And, indeed, Firefox got this wrong, but we fixed that for Firefox 8
(<https://bugzilla.mozilla.org/show_bug.cgi?id=651185>).

It means that a generic parser for header field parameters needs
intrinsic knowledge of RFC 5987.

> That might be ok with a parser that understands token, quoted-string, and RFC5987,
> but presumably it will cause problems when RFC5987 processing is done after
> a "standard httpbis parser" handles the token | quoted-string step.

Correct. We have evidence that all major browsers that support RFC 5987
get this right, though.
(<http://greenbytes.de/tech/tc2231/#attwithfn2231quot>)

> My ideal recommendation for new headers would be something like:
>   parameter = token "=" *( pct-encoded / token-except-pct )
> [One name; one escape mechanism; Unicode support; no separators in the value (, ; = space)]
> I thought that making the escaping in quoted-string actually useful
> by adding \uXXXX would be less change so more acceptable.

Well, we can't reduce the number of notations by adding more. In an
ideal world, we can just move the quoted-string encoding to UTF-8.

Best regards, Julian
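
For readers following the foo*= example, here is a minimal Python sketch of
decoding an RFC 5987 ext-value. It is an illustration only, not code from
this thread: the function name decode_ext_value is hypothetical, and the
value part is matched leniently rather than checked against the full
value-chars grammar. It shows why the double-quoted form has to be rejected
(or special-cased) by a parser that follows the RFC 5987 grammar.

```python
import re
from urllib.parse import unquote

# ext-value = charset "'" [ language ] "'" value-chars   (RFC 5987, Section 3.2)
# Assumption: the value part is accepted leniently (".*") instead of being
# validated character by character.
_EXT_VALUE = re.compile(r"^([A-Za-z0-9!#$%&+\-^_`{}~]+)'([A-Za-z0-9-]*)'(.*)$")


def decode_ext_value(raw):
    """Decode an ext-value such as UTF-8''coll%C3%A8gues.

    Returns (charset, language, text). Raises ValueError when the input
    does not start with a charset name, which includes a value wrapped
    in double quotes.
    """
    match = _EXT_VALUE.match(raw)
    if match is None:
        raise ValueError("not an RFC 5987 ext-value: %r" % raw)
    charset, language, encoded = match.groups()
    # Percent-decode the octets and interpret them in the declared charset.
    text = unquote(encoded, encoding=charset, errors="strict")
    return charset, language, text


print(decode_ext_value("UTF-8''coll%C3%A8gues"))
```

The unquoted form decodes to "collègues" with an empty language tag, while
passing the same text wrapped in double quotes raises ValueError, because a
leading `"` cannot start a charset name.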
Received on Wednesday, 2 November 2011 07:51:22 UTC
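
The surrogate-pair concern raised in the message can be made concrete with a
short Python sketch. It is a hypothetical illustration, not part of any
proposal in this thread: the name decode_u_escapes is invented here, and the
sketch deliberately ignores other backslash escapes (for example an escaped
backslash followed by "u"). It shows that a receiver of JSON-style \uXXXX
escapes has to combine two escapes into one character for anything outside
the Basic Multilingual Plane, and has to decide what to do with an unpaired
surrogate.

```python
import re

# Matches one JSON-style escape: backslash, "u", four hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_u_escapes(text):
    """Replace backslash-u escapes with characters, joining surrogate pairs.

    Raises UnicodeDecodeError if an escape is an unpaired surrogate.
    """
    # Step 1: turn every escape into a single 16-bit code unit.
    with_units = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Step 2: round-trip through UTF-16 so that high/low surrogate code
    # units are combined into one code point; lone surrogates fail here.
    return with_units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


print(len(decode_u_escapes(r"\ud83d\ude00")))
```

Two escapes go in and a single character (U+1F600) comes out, which is the
variable-length behaviour that the delimited RFC 5137 form avoids.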