- From: Mark Nottingham <mnot@mnot.net>
- Date: Thu, 27 Mar 2008 22:00:19 +1100
- To: Martin Duerst <duerst@it.aoyama.ac.jp>
- Cc: "Roy T. Fielding" <fielding@gbiv.com>, HTTP Working Group <ietf-http-wg@w3.org>
On 27/03/2008, at 9:17 PM, Martin Duerst wrote:

> At 14:41 08/03/27, Mark Nottingham wrote:
>>
>> My reading is that HTTP is limited to iso-8859-1 *on the wire*, and
>> requires RFC2047 encoding for characters outside of that range. Do
>> you disagree with that?
>
> That's what's written in RFC 2616. The question is whether and to
> what extent that's (still) sensible in practice.
>
>> My intent was not to disallow RFC2047, but rather to allow other
>> encodings into iso-8859-1 where appropriate.
>
> What do you mean by "other encodings into iso-8859-1"?
> Please explain.

I thought I had, but obviously not well.

Roy said (to paraphrase) that IRIs do not show up in HTTP -- that they're just URIs. I agree with that, but only as far as you can view IRIs as an encoding into ASCII (albeit an imperfect one, because you can't round-trip them, since there's a bit of ambiguity).

RFC2047 is also an encoding into ASCII; it is not a character encoding in its own right. In that sense, it's a peer of BCP137 and other schemes that do similar things. They all end up taking characters from a set greater than that available to iso-8859-1 and encoding them into a subset of it (usually ASCII) using escape sequences.

That being the case, my question is this: is it realistic to require all headers to use RFC2047 encoding, to the exclusion of BCP137, etc?

I could understand such a requirement if we had a blanket requirement that RFC2047 encoding could occur anywhere, so that implementations could blindly decode/encode headers as necessary, whether they recognised them or not. However, we're not going in that direction, because it's not reasonable to implement, and in any case the encoding is already tied to the semantics of the headers somewhat, since you have to recognise the header to understand its structure enough to know where TEXT may appear (i.e., it's not a complete blanket, just an uneven one over TEXT).
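To make the "peer encodings" point concrete, here is a minimal Python sketch (not from any spec or implementation) that pushes the same non-Latin-1 string through three escape schemes and checks that each result is plain ASCII. The helper names `rfc2047_encode`, `rfc2047_decode` and `backslash_u_escape` are made up for illustration; the last one assumes the `\u'XXXX'` form that BCP137 recommends, and the percent-encoding stands in for the IRI-to-URI mapping.

```python
from email.header import Header, decode_header
from urllib.parse import quote, unquote


def rfc2047_encode(text: str) -> str:
    """Encode text as an RFC 2047 encoded-word, using UTF-8 as the charset."""
    return Header(text, charset="utf-8").encode()


def rfc2047_decode(value: str) -> str:
    """Decode a header value that may contain RFC 2047 encoded-words."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)


def backslash_u_escape(text: str) -> str:
    """Hypothetical helper: escape non-ASCII characters as \\u'XXXX'
    (assumed here to match the form BCP137 recommends)."""
    return "".join(
        c if ord(c) < 0x80 else "\\u'%04X'" % ord(c) for c in text
    )


# Three characters that are all outside iso-8859-1.
sample = "日本語"

encoded_forms = {
    "rfc2047": rfc2047_encode(sample),
    "percent": quote(sample),  # UTF-8 bytes, then %XX escapes
    "backslash-u": backslash_u_escape(sample),
}

if __name__ == "__main__":
    for name, value in encoded_forms.items():
        # Every scheme lands in ASCII; only the escape syntax differs.
        print(name, value, all(ord(c) < 128 for c in value))
```

A recipient can only undo any of these if it knows which scheme a given header field uses, which is the sense in which the encoding is already tied to the header's semantics.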
That being the case, I can't help but see the RFC2047 requirement as spurious, and the most straightforward thing to do would seem to be to ditch the spurious requirement and move on -- without disallowing RFC2047 encoding from being specified in a particular header if that makes sense, but not disallowing other encodings either.

Once again, I have no desire to lie down in the road on this part of the issue -- I'm happy to give this up and move on, or to be told how wrong I am (as i18n isn't my area by any means). I'm just a bit surprised at how hard it is to communicate this view (which leads me to believe that indeed I've got something wrong here).

Cheers,

--
Mark Nottingham     http://www.mnot.net/
Received on Thursday, 27 March 2008 11:01:04 UTC