RE: json-string for HTTP header field parameter values

From: Manger, James H <James.H.Manger@team.telstra.com>
Date: Wed, 2 Nov 2011 13:33:32 +1100
To: Julian Reschke <julian.reschke@gmx.de>
CC: httpbis Group <ietf-http-wg@w3.org>
Message-ID: <255B9BB34FB7D647A506DC292726F6E1129242C46F@WSMSG3153V.srv.dir.telstra.com>

On 2011-10-31, Julian wrote:
> Here's a list of problems I see with this proposal:
>
> - Scope - the proposed syntax overloads quoted-string, potentially 
> changing the interpretation of existing content, thus I don't think we 
> *can* do this as part of HTTPbis.

It is inconceivable that "\u" appears in any existing quoted-string value
as a deliberate escape sequence for "u".
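
A rough sketch of the two readings, in Python. The function names are
hypothetical (not from any spec or library), and the second function
handles only a single \uXXXX code unit, not surrogate pairs.

```python
import re

# Either a proposed \uXXXX escape, or an ordinary quoted-pair (backslash + one char).
_ESCAPE = re.compile(r'\\u([0-9A-Fa-f]{4})|\\(.)', re.DOTALL)

def unquote_classic(quoted: str) -> str:
    """Existing quoted-string rule: backslash-X simply means X."""
    return re.sub(r'\\(.)', r'\1', quoted[1:-1], flags=re.DOTALL)

def unquote_with_u(quoted: str) -> str:
    """Proposed rule (assumed form): \\uXXXX names a code unit, else as before."""
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))
        return m.group(2)
    return _ESCAPE.sub(repl, quoted[1:-1])

sample = r'"coll\u00E8gues"'      # raw string: the backslash is literal
print(unquote_classic(sample))    # collu00E8gues
print(unquote_with_u(sample))     # collègues
```

The two functions disagree only on values containing a backslash, "u",
and four hex digits, which is the case argued above to be implausible
in existing content.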

> - Adding a different type of quoted-string might make things more 
> confusing; for the RFC5987 encoding it's at least easy to understand 
> when it's in use.

Presumably RFC5987 (or its predecessors) decided it was highly unlikely
that any parameter names in use ended in "*" (though they are valid)
so it could redefine the syntax of values for such names.
I don't think defining \uXXXX as an escape for Unicode
in quoted-string-like values is that much different.


> - the JSON \u format doesn't really use Unicode but UCS-2 code points; 
> which means that senders and receivers will need to understand surrogate 
> pairs; see also <https://tools.ietf.org/html/rfc5137#section-5.1> for 
> context (that proposal adds additional delimiters to avoid the variable 
> length issue)

I would be happy enough with RFC5137's \u'NNNN[NN]' instead of JSON's \uXXXX,
though I don't think that would be an improvement here.
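
To make the surrogate-pair point concrete, here is a small Python sketch
using the standard json module. The helper names combine() and
rfc5137_form() are made up for illustration; the RFC5137-style output
simply follows the \u'NNNN[NN]' shape mentioned above.

```python
import json

ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, a code point outside the BMP

# JSON's \uXXXX carries 16-bit code units, so one character becomes two escapes.
encoded = json.dumps(ch)          # ensure_ascii=True is the default
print(encoded)                    # "\ud834\udd1e"

def combine(hi: int, lo: int) -> str:
    """Join a high/low surrogate pair into the code point it represents."""
    return chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))

def rfc5137_form(c: str) -> str:
    """Delimited form: one escape per code point, 4 to 6 hex digits."""
    return "\\u'%04X'" % ord(c)

assert combine(0xD834, 0xDD1E) == ch
assert json.loads(encoded) == ch
print(rfc5137_form(ch))           # \u'1D11E'
```

A receiver of the JSON form has to do the combine() step (or use a
library that does); the delimited form avoids it at the cost of a
variable-length escape.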


Curiously, RFC5987 disobeys the proposed recommendations for new parameters.
It allows
  foo*=UTF-8''coll%C3%A8gues
but not
  foo*="UTF-8''coll%C3%A8gues"
That might be OK for a parser that understands token, quoted-string, and RFC5987
together, but presumably it will cause problems when RFC5987 processing is done
after a "standard httpbis parser" has handled the token | quoted-string step.
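
A sketch of that layering in Python, under the assumption that the
generic step hands back only the unquoted text. Both function names are
hypothetical; urllib.parse.unquote is the standard-library call.

```python
import re
from urllib.parse import unquote

def generic_value(raw: str) -> str:
    """Hypothetical first pass: token | quoted-string, returning unquoted text."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return re.sub(r'\\(.)', r'\1', raw[1:-1])
    return raw

def decode_ext_value(text: str) -> str:
    """Second pass: charset'language'pct-encoded-value, as in RFC5987."""
    charset, _language, encoded = text.split("'", 2)
    return unquote(encoded, encoding=charset, errors="strict")

for raw in ("UTF-8''coll%C3%A8gues", "\"UTF-8''coll%C3%A8gues\""):
    print(decode_ext_value(generic_value(raw)))   # collègues (both times)
```

Once the first pass has discarded the quotes, the second pass cannot
tell the permitted form from the one RFC5987 does not allow, so a
layered implementation would accept both unless it keeps track of how
the value was written.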


My ideal recommendation for new headers would be something like:
  parameter = token "=" *( pct-encoded / token-except-pct )
[One name; one escape mechanism; Unicode support; no separators in the value (, ; = space)]
I thought that making the escaping in quoted-string actually useful
by adding \uXXXX would be less of a change, and so more acceptable.
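
For illustration, the "ideal" rule could be sketched in Python as below.
The names parse_parameter and format_parameter are hypothetical, and the
sketch assumes the percent-encoded octets are UTF-8, which the grammar
line itself does not say.

```python
import re
from urllib.parse import quote, unquote

TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
TCHAR_NO_PCT = r"[!#$&'*+\-.^_`|~0-9A-Za-z]"   # token characters minus "%"
PARAM = re.compile(rf"({TOKEN})=((?:%[0-9A-Fa-f]{{2}}|{TCHAR_NO_PCT})*)")

def parse_parameter(text: str) -> tuple[str, str]:
    """Parse name=value where value is *( pct-encoded / token-except-pct )."""
    m = PARAM.fullmatch(text)
    if m is None:
        raise ValueError(f"not a valid parameter: {text!r}")
    name, raw = m.groups()
    # Assumption: the escaped octets are UTF-8.
    return name, unquote(raw, encoding="utf-8", errors="strict")

def format_parameter(name: str, value: str) -> str:
    """Percent-encode everything in value that is not a plain token character."""
    return name + "=" + quote(value, safe="!#$&'*+-.^_`|~")

print(parse_parameter("foo=coll%C3%A8gues"))      # ('foo', 'collègues')
print(format_parameter("foo", "a b; c=d"))        # foo=a%20b%3B%20c%3Dd
```

Because separators such as comma, semicolon, "=" and space can only
appear percent-encoded, a value never needs quoting and a header can be
split on those characters before any unescaping is done.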

--
James Manger
Received on Wednesday, 2 November 2011 02:34:12 GMT
