- From: Mark Nottingham <mnot@mnot.net>
- Date: Wed, 26 Mar 2008 12:01:52 +1100
- To: Roy T. Fielding <fielding@gbiv.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
On 26/03/2008, at 11:40 AM, Roy T. Fielding wrote:

>> A secondary issue is what encoding should be used in those cases
>> where it is reasonable to allow it. I'm not sure what the value of
>> requiring that it be the same everywhere is; some payloads (e.g.,
>> IRIs, e-mail addresses) have well-defined "natural" encodings into
>> ASCII that are more appropriate.
>
> Unless we are going to change the protocol, the answer to that question
> is ISO-8859-1 or RFC2047. If we are going to change the protocol, then
> the answer would be raw UTF-8 (HTTP doesn't care about the content of
> TEXT as long as the encoding is a superset of ASCII, so the only
> compatibility issue here is understanding the intent of the sender).

What do you mean by ISO-8859-1 *or* RFC2047 here? Even if RFC2047 encoding is in effect, the actual character set in use is a subset of ISO-8859-1; no characters outside of that are actually on the wire, it's just an encoding of them into ASCII.

This is why I question whether it's realistic to require RFC2047, given that some applications -- e.g., headers that might want to carry an IRI -- are already using an encoding that's not RFC2047. Of course, you can say that they're not carrying non-ASCII characters, because it's just a URI, but I'd say that's just a way of squinting at the problem, and RFC2047 is yet another way of squinting; it looks like it's just ASCII as well.

>> Mind you, personally I'm not religious about this; I just think
>> that if we mandate RFC2047 encoding be used in new headers that
>> need an encoding, we're going to be ignored, for potentially good
>> reasons.
>
> What good reasons? In this case, we are not mandating anything.
> We are simply passing through the one and only defined i18n solution
> for HTTP/1.1 because it was the only solution available in 1994.
> If email clients can (and do) implement it, then so can WWW clients.

See above.
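(For illustration, a quick Python sketch of the two "squints" I mean -- the strings here are my own examples, not anything from the spec: an RFC 2047 encoded-word and a percent-encoded IRI are both plain ASCII by the time they hit the wire.)

```python
from email.header import Header
from urllib.parse import quote

# Squint 1: RFC 2047 wraps non-ASCII text in an ASCII encoded-word.
encoded_word = Header("caf\u00e9", charset="iso-8859-1").encode()
print(encoded_word)  # e.g. =?iso-8859-1?q?caf=E9?=
assert encoded_word.isascii()

# Squint 2: an IRI is serialised as a URI by percent-encoding its
# non-ASCII characters as UTF-8 octets.
uri_path = quote("/caf\u00e9", safe="/")
print(uri_path)  # /caf%C3%A9
assert uri_path.isascii()

# Either way, only ASCII is on the wire; the conventions differ solely
# in how the receiver is meant to recover the original characters.
```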
Specifically, what impact does the requirement to use RFC2047 have on other encodings -- is it saying that serialising an IRI as a URI in an HTTP header is non-conformant? That if another problem domain, for whatever reason, decides to mint a header that uses BCP137 instead of RFC2047, it also violates HTTP? This seems a stretch to me... I'd put forth that the requirement is spurious.

> People who want to fix that should start queueing for HTTP/1.2.

Please explain how removing the requirement that only RFC2047 be used to encode non-ISO-8859-1 characters in new headers requires a version bump.

>> 2) Constrain TEXT to contain only characters from iso-8859-1.
>
> No, that breaks compliant senders.

How? Are you saying that senders are already sending text that contains non-8859-1 characters (post-encoding)?

>> 3) Add advice that, for a particular context of use, other
>> characters MAY be encoded (whether that's strictly RFC2047, or more
>> fine-grained advice TBD) by specifying it in that context.
>>
>> 4) Add new issues for dealing with specific circumstances (e.g.,
>> From, Content-Disposition, Warning) as necessary. If the outcome of
>> #3 is to require RFC2047, this is relatively straightforward.
>
> There is no great need that has been established to support any
> changes to the allowed TEXT encoding other than to separate the
> rules that don't actually allow that encoding. IMO, changes to
> HTTP/1.1 must be motivated by actual implementations.

Could be. Again, my main concern here is to take the blanket requirement away and make it more focused.

--
Mark Nottingham     http://www.mnot.net/
Received on Wednesday, 26 March 2008 01:02:34 UTC