- From: Mark Nottingham <mnot@mnot.net>
- Date: Mon, 31 Oct 2011 14:03:25 +1100
- To: "Manger, James H" <James.H.Manger@team.telstra.com>
- Cc: httpbis Group <ietf-http-wg@w3.org>
Hi James,

On 31/10/2011, at 12:07 AM, Manger, James H wrote:

> HTTP currently uses token and quoted-string for various header field parameter values, and recommends these syntaxes for new headers. However, neither supports Unicode, which isn't really acceptable today.

Well, it uses a variety of things. One of our outstanding tasks is to collect a set of header microsyntaxes that are in common use; e.g., the parameterised string, which is already used widely.

> I would like to recommend the JSON string syntax for new header field parameter values. JSON is very widely used on the web, particularly by protocols built on HTTP. There are JSON implementations for basically every computer language. JSON supports the full range of Unicode characters. Developers love it.

Yes, developers love it, just as they loved XML five years ago. I'm a little wary of following fashion, given the relatively long time scales of HTTP evolution.

While I think most will agree that the current header syntax is far from perfect, one of its main problems is that there are a lot of differences between existing header parsers. Adding a new formatting convention will not help that situation.

> A JSON string: is enclosed in double quotes; uses \" and \\ to represent " and \; uses six other \x sequences for other chars; and allows \uXXXX as an escape sequence for any Unicode character [json.org, RFC4627]. An HTTP header profile of JSON string would require any chars outside the printable ASCII set to be escaped.

Just an observation: this form of Unicode support is an encoding, just as 5987 is an encoding. You can have aesthetic concerns about one or the other, but they fulfil the same job, and aren't seriously visible to users (who see the decoded artefact) or developers (who use a library, in both cases).

> RFC5987 "Character Set and Language Encoding for HTTP Header Field Parameters" already offers one way to represent any Unicode string in an HTTP header parameter value, e.g. foo*=UTF-8''coll%C3%A8gues.
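A minimal Python sketch of the point that both forms are just encodings of the same string. It assumes only the standard `json` and `urllib.parse` modules; the helpers `to_json_string`, `to_rfc5987` and `from_rfc5987` are hypothetical names, and the set of characters left unescaped follows the RFC 5987 attr-char rule.

```python
import json
from urllib.parse import quote, unquote

# RFC 5987 attr-char punctuation that may stay unescaped; quote() already
# leaves ALPHA, DIGIT, "-", ".", "_" and "~" alone.
ATTR_CHAR_PUNCT = "!#$&+^`|"


def to_json_string(value: str) -> str:
    """JSON string restricted to printable ASCII: anything else becomes \\uXXXX."""
    return json.dumps(value, ensure_ascii=True)


def to_rfc5987(value: str, language: str = "") -> str:
    """RFC 5987 ext-value: charset ' language ' percent-encoded UTF-8 octets."""
    return "UTF-8'" + language + "'" + quote(value, safe=ATTR_CHAR_PUNCT, encoding="utf-8")


def from_rfc5987(ext_value: str) -> str:
    """Split off charset and language, then percent-decode in that charset."""
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset, errors="strict")


word = "collègues"
print("foo=" + to_json_string(word))   # foo="coll\u00e8gues"
print("foo*=" + to_rfc5987(word))      # foo*=UTF-8''coll%C3%A8gues

# Both are just encodings: each decodes back to the same Unicode string.
assert json.loads(to_json_string(word)) == word
assert from_rfc5987(to_rfc5987(word)) == word
```

Either way, the application only ever sees the decoded string; the difference is which escaping library sits in front of it.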
> However, this is not very appealing when defining a new parameter. HTTPbis-p2 already recommends that new parameters allow the token and quoted-string syntaxes, so supporting RFC5987 for Unicode means implementations have to support 2 parameter names (foo and foo*), 3 syntaxes, and 2 escaping mechanisms (\x in quoted-string, and %xx in RFC5987) -- all for a brand new parameter. Yuck.

New headers (or new parameters to existing ones) don't have to support two parameter names; if they want to specify that only foo* be present, so be it, AIUI. That brings it down to one syntax and one escaping mechanism.

> I think the considerations for new headers (issue #231), and advice on defining auth scheme parameters (issue #320), should consider how to support Unicode parameter values -- and json-string would be a good way to do that.

Considering that there's effectively zero experience with creating JSON-based HTTP headers, it's something we're not likely to do in HTTPbis. However, nothing's stopping you from defining a header with JSON syntax; #231 is trying to encourage people to think about interoperability, leveraging existing libraries, convergence, etc. when they create a header.

There are some things to consider if you do use JSON; e.g.

  Foo: { "a": 1,
  Foo: "b": 2 }

is dangerous, because it can be collapsed by an intermediary to:

  Foo: {"a": 1,, "b": 2}

... which is invalid JSON. That means that authors would have to be very careful to do:

  Foo: {"a": 1
  Foo: "b": 2}

if they want to use multiple lines (which is somewhat likely, given that implementations often have length limits for HTTP headers).

Cheers,

--
Mark Nottingham   http://www.mnot.net/
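The header-combining hazard described above can be reproduced in a few lines of Python. This is a minimal sketch: `combine_field_values` and `is_valid_json` are hypothetical helpers, the combining rule is the one in RFC 2616 section 4.2 (repeated field values joined in order with commas), and the second object member is written as `"b": 2` so that the only defect left is the separator.

```python
import json


def combine_field_values(values):
    """Combine repeated header lines the way an intermediary may (RFC 2616,
    section 4.2): keep their order and join the values with commas.
    Joining with ", " is an assumption; a bare "," is equally possible."""
    return ", ".join(v.strip() for v in values)


def is_valid_json(text):
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Each list holds the values of two "Foo:" lines in one message.
careless = ['{ "a": 1,', '"b": 2 }']  # first line already ends in a comma
careful = ['{"a": 1', '"b": 2}']      # leave the comma for the combiner

for lines in (careless, careful):
    combined = combine_field_values(lines)
    print(combined, "->", "valid" if is_valid_json(combined) else "invalid")
# { "a": 1,, "b": 2 } -> invalid
# {"a": 1, "b": 2} -> valid
```

The careful split only works because the line break falls exactly where a JSON member separator belongs, which is the burden on header authors that the message describes.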
Received on Monday, 31 October 2011 03:03:57 UTC