- From: Robert Brewer <fumanchu@aminus.org>
- Date: Thu, 27 Mar 2008 10:28:34 -0700
- To: "Mark Nottingham" <mnot@mnot.net>, "Martin Duerst" <duerst@it.aoyama.ac.jp>, "Jamie Lokier" <jamie@shareable.org>
- Cc: "Roy T. Fielding" <fielding@gbiv.com>, "HTTP Working Group" <ietf-http-wg@w3.org>
Mark Nottingham wrote:
> >> My intent was not to disallow RFC2047, but rather to allow other
> >> encodings into iso-8859-1 where appropriate.
> ...
>
> Roy said (to paraphrase) that IRIs do not show up in HTTP -- that
> they're just URIs. I agree with that, but only as far as you can view
> IRIs as an encoding into ASCII (albeit an imperfect one, because you
> can't round-trip them, since there's a bit of ambiguity).
>
> RFC2047 is also an encoding into ASCII; it is not a character encoding
> in its own right. In that sense, it's a peer of BCP137 and other
> schemes that do similar things. They all end up taking characters from
> a set greater than that available to iso-8859-1 and encoding them into
> a subset of it (usually ASCII) using escape sequences.
>
> That being the case, my question is this: is it realistic to require
> all headers to use RFC2047 encoding, to the exclusion of BCP137, etc?

BCP137 itself says "...this specification does not recommend one
specific syntax." That is, I don't see them as peers. RFC2047 is how
HTTP "defines the syntax" for TEXT already, which means any compliant
HTTP/1.1 implementation already has code for this. How wide are we
going to open the floodgates to other encodings? As a server author,
I'd rather not have to add large chunks of code in 2008 to become
"http-bis compliant". I'm pretty happy with RFC2047.

> I could understand such a requirement if we had a blanket requirement
> that RFC2047 encoding could occur anywhere, so that implementations
> could blindly decode/encode headers as necessary, whether they
> recognised them or not. However, we're not going in that direction,
> because it's not reasonable to implement...

I don't understand. From where I sit, that not only sounds like a snap
to write from scratch, but has the potential to simplify a lot of
codebases.

> ...and in any case the encoding
> is already tied to the semantics of the headers somewhat, since you
> have to recognise the header to understand its structure enough to
> know where TEXT may appear (i.e., it's not a complete blanket, just an
> uneven one over TEXT).
>
> That being the case, I can't help but see the RFC2047 requirement as
> spurious, and the most straightforward thing to do would seem to be to
> ditch the spurious requirement and move on -- without disallowing
> RFC2047 encoding from being specified in a particular header if that
> makes sense, but not disallowing other encodings either.

Hrm. I'm not sure what "other encodings" includes. When Jamie Lokier
says "I'm in favour of allowing UTF-8," does that mean the unicode
string u'\u212bngstr\xf6m' would emit as:

    If-Match: %E2%84%ABngstr%C3%B6m

...and how is the server supposed to know how to decode that? There's
one thing that RFC2047 provides that other, more minimal encoding
schemes do not: the small bit of metadata that actually declares which
encoding is being used. If you want to encode your non-ASCII header as
UTF-8, fine; that's not in opposition to RFC2047:

    If-Match: =?utf-8?q?=E2=84=ABngstr=C3=B6m?=

Not only is it utf-8, but the server knows it's utf-8 without having to
sniff anything.

Robert Brewer
fumanchu@aminus.org
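A minimal sketch of the contrast drawn above, in Python. The standard
library's email.header module implements the RFC 2047 encoded-word
syntax (for mail headers, not HTTP), which makes the point concrete;
the header values come from the message, while the choice of modules
and variable names here is purely illustrative:

    # A raw percent-encoded header value decodes to bytes, but the
    # value itself carries no charset label -- the server must guess.
    from urllib.parse import unquote_to_bytes

    raw = "%E2%84%ABngstr%C3%B6m"
    data = unquote_to_bytes(raw)
    # data == b'\xe2\x84\xabngstr\xc3\xb6m' -- utf-8? iso-8859-1? Unknown.

    # An RFC 2047 encoded-word declares its charset inline, so it can
    # be decoded without sniffing.
    from email.header import decode_header

    word = "=?utf-8?q?=E2=84=ABngstr=C3=B6m?="
    payload, charset = decode_header(word)[0]
    print(charset)                  # 'utf-8' -- declared, not guessed
    print(payload.decode(charset))  # u'\u212bngstr\xf6m'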
Received on Thursday, 27 March 2008 17:28:24 UTC