Re: json-string for HTTP header field parameter values

On 2011-11-02 03:33, Manger, James H wrote:
> On 2011-10-31, Julian wrote:
>> Here's a list of problems I see with this proposal:
>>
>> - Scope - the proposed syntax overloads quoted-string, potentially
>> changing the interpretation of existing content, thus I don't think we
>> *can* do this as part of HTTPbis.
>
> It is inconceivable that "\u" appears in any existing quoted-string value
> as a deliberate escape sequence for "u".

"Inconceivable"?

>> - Adding a different type of quoted-string might make things more
>> confusing; for the RFC5987 encoding it's at least easy to understand
>> when it's in use.
>
> Presumably RFC5987 (or its predecessors) decided it was highly unlikely
> that any parameter names in use ended in "*" (though they are valid)
> so it could redefine the syntax of values for such names.

Indeed. That's the kind of compromise people make when they want to cram 
something new into a syntax that didn't have an extension point.

> I don't think defining \uXXXX as an escape for Unicode
> in quoted-string-like values is that much different.

That may be true, but the difference here is that you're proposing to do 
it a second time to solve a problem that is already solved by the first 
change.

>> - the JSON \u format doesn't really use Unicode but UCS-2 code points;
>> which means that senders and receivers will need to understand surrogate
>> pairs; see also <https://tools.ietf.org/html/rfc5137#section-5.1> for
>> context (that proposal adds additional delimiters to avoid the variable
>> length issue)
>
> I would be happy enough with RFC5137's \u'NNNN[NN]' instead of JSON's \uXXXX,
> though I don't think that would be an improvement here.
>
>
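
To make the surrogate-pair point concrete, here is a minimal sketch in Python (not part of the proposal itself). It uses the standard json module to produce JSON-style escapes; the helper name json_u_escape is made up for illustration. Strictly speaking, each \uXXXX names one UTF-16 code unit, so a character outside the BMP needs two escapes that the receiver has to recombine.

```python
import json

def json_u_escape(text: str) -> str:
    """Return text with non-ASCII characters written as JSON \\uXXXX escapes.

    Hypothetical helper for illustration. json.dumps also escapes quotes,
    backslashes, and control characters, which the inputs below do not contain.
    """
    # ensure_ascii=True emits one \uXXXX per UTF-16 code unit;
    # [1:-1] strips the surrounding double quotes added by json.dumps.
    return json.dumps(text, ensure_ascii=True)[1:-1]

bmp = json_u_escape("\u00e8")         # U+00E8, inside the BMP: one escape
astral = json_u_escape("\U0001D11E")  # U+1D11E, outside the BMP: two escapes

print(bmp)     # \u00e8
print(astral)  # \ud834\udd1e

# A receiver has to pair the two escapes up again to recover one character.
assert json.loads('"' + astral + '"') == "\U0001D11E"
```

RFC 5137's \u'NNNN[NN]' form sidesteps this by delimiting a variable-length code point instead of using fixed four-digit units.
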
> Curiously, RFC5987 disobeys the proposed recommendations for new parameters.
> It allows
>    foo*=UTF-8''coll%C3%A8gues
> but not
>    foo*="UTF-8''coll%C3%A8gues"

Yes. And, indeed, Firefox got this wrong, but we fixed that for Firefox 
8 (<https://bugzilla.mozilla.org/show_bug.cgi?id=651185>). It means that 
a generic parser for header field parameters needs intrinsic knowledge 
of RFC 5987.
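
For illustration, a rough sketch of what decoding an RFC 5987 ext-value involves, assuming Python and its standard urllib.parse.unquote. The function name decode_ext_value is hypothetical, the language tag is ignored, and only the two charsets named in RFC 5987 (UTF-8 and ISO-8859-1) are accepted; a real parser would also validate the value characters against the grammar.

```python
from urllib.parse import unquote

def decode_ext_value(raw: str) -> str:
    """Decode charset "'" [language] "'" value-chars (simplified sketch)."""
    if raw.startswith('"'):
        # The RFC 5987 grammar has no quoted-string form for ext-value.
        raise ValueError("ext-value must not be a quoted-string")
    # Raises ValueError if the two "'" delimiters are missing.
    charset, _language, value = raw.split("'", 2)
    if charset.lower() not in ("utf-8", "iso-8859-1"):
        raise ValueError("unsupported charset: " + charset)
    # Percent-decode the octets and interpret them in the declared charset.
    return unquote(value, encoding=charset, errors="strict")

print(decode_ext_value("UTF-8''coll%C3%A8gues"))  # collègues

try:
    decode_ext_value("\"UTF-8''coll%C3%A8gues\"")
except ValueError as err:
    print("rejected:", err)
```
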

> That might be ok with a parser that understands token, quoted-string, and RFC5987,
> but presumably it will cause problems when RFC5987 processing is done after
> a "standard httpbis parser" handles the token | quoted-string step.

Correct. We have evidence that all major browsers that support RFC 5987 
get this right, though. 
(<http://greenbytes.de/tech/tc2231/#attwithfn2231quot>)
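
A small sketch of the layering problem, again assuming Python. The function generic_unquote is a hypothetical stand-in for the "token | quoted-string" step of a generic parser, not any particular implementation. Once that step has run, the token form and the quoted form are indistinguishable, so a later RFC 5987 step cannot reject the quoted one.

```python
import re

def generic_unquote(raw: str) -> str:
    """Hypothetical first pass of a generic parameter parser."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        # quoted-string: drop the quotes and resolve each quoted-pair (\x -> x)
        return re.sub(r"\\(.)", r"\1", raw[1:-1])
    # token: returned unchanged
    return raw

token_form = generic_unquote("UTF-8''coll%C3%A8gues")
quoted_form = generic_unquote("\"UTF-8''coll%C3%A8gues\"")

# Both inputs collapse to the same string after the generic step.
print(token_form == quoted_form)  # True
```

To treat the two forms differently, the RFC 5987 check has to look at the raw value for names ending in "*" before any generic unquoting happens.
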

> My ideal recommendation for new headers would be something like:
>    parameter = token "=" *( pct-encoded / token-except-pct )
> [One name; one escape mechanism; Unicode support; no separators in the value (, ; = space)]
> I thought that making the escaping in quoted-string actually useful
> by adding \uXXXX would be less change so more acceptable.

Well, we can't reduce the number of notations by adding more.

In an ideal world, we could just move the quoted-string encoding to UTF-8.

Best regards, Julian

Received on Wednesday, 2 November 2011 07:51:22 UTC