- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Fri, 15 Dec 2006 16:25:33 +0900
- To: "Roy T. Fielding" <fielding@gbiv.com>, W3C TAG <www-tag@w3.org>
- Cc: "Mark Davis" <mark.davis@icu-project.org>, Deborah Goldsmith <goldsmit@apple.com>, chris.newman@innosoft.com, mrc@washington.edu, www-international@w3.org, ietf-charsets@iana.org, Misha Wolf <Misha.Wolf@reuters.com>
Hello Roy,

As you can see at
http://lists.w3.org/Archives/Public/www-international/2006OctDec/0144,
Mark Davis, one of the authors, essentially agrees with you.
In a followup on the ietf-charsets mailing list, Deborah Goldsmith,
the other author of the UTF-7 spec, also agrees.

The only place I'm aware of where (a variant of!) UTF-7 is used is
IMAP folder name internationalization. See e.g.
http://www.ietf.org/rfc/rfc2192.txt for details. In hindsight,
using a UTF-7 variant in the protocol seems unnecessary, but the
original idea (mostly by Mark Crispin, as far as I understand it)
was that it could be used as is on the server side, even on
totally un-internationalized operating systems.

As for the browsers, I think they just added UTF-7 at one time
because the name looked similar to UTF-8 and UTF-16, and it was
difficult to predict exactly how these encodings would deploy.
And as with any software, it's difficult to get rid of something,
but security is about the best reason you can come up with for
cleaning up.

As for the IANA charset registry
(http://www.iana.org/assignments/character-sets), Ned and I
(who are currently the expert reviewers) as well as the other
list participants have been talking about cleaning it up.
We don't yet have an exact idea of what needs to be done, but
being able to attach security warnings or similar comments to
an entry might be one possible way to proceed. The problem
might be that RFC 2152 (http://www.ietf.org/rfc/rfc2152.txt)
might have to be updated.

But as far as the browsers are concerned, if the TAG can come up
with a finding that e.g. also gives some more details and examples
about the security issues you mention, then we might also be able
to point to this document from anything on the IETF or IANA side.

Regards,    Martin.

At 07:13 06/12/15, Roy T. Fielding wrote:
>
>Over the years I have seen a number of security exploits that make
>use of broken browsers that sniff character encodings in combination
>with UTF-7 encoded tags or javascript commands. I have never actually
>seen anyone use UTF-7 for anything legitimate (other than testing).
>
>Is there some reason why WWW clients need to support UTF-7?
>
>It seems completely unnecessary given the now ubiquitous use of 8-bit
>clean transports and the presence of UTF-8, which IIRC was defined
>long after UTF-7. However, the wider community may be aware of
>some reason why browsers should support it, so I'd like to hear
>your comments.
>
>If there is no need for UTF-7, I'd like the TAG to consider it an
>issue for the sake of asking browsers to remove its implementation
>and banning its use by servers.
>
>I know this won't solve any problems for deployed clients, and
>wouldn't be an issue at all if servers used the same algorithm for
>escaping characters that clients used to interpret them, but in the
>long term it will simplify some checks for XSS attacks and I don't
>think it will harm the Web. That is, unless there is some significant
>body of content out there that is encoded as UTF-7.
>
>Cheers,
>
>Roy T. Fielding                            <http://roy.gbiv.com/>
>Chief Scientist, Day Software              <http://www.day.com/>
>
>

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
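
The exploit pattern Roy describes (UTF-7 encoded tags that a server does not recognize as markup, but that a charset-sniffing browser decodes into markup) can be illustrated with a minimal Python sketch. It relies only on Python's built-in "utf-7" codec; the names `payload` and `naive_filter_passes` are hypothetical and stand in for whatever escaping check a real server might apply. It is not a description of any particular browser or server.

```python
# Bytes as they might be submitted to a server. Note that they contain
# no literal "<" or ">" characters: in UTF-7, "+ADw-" stands for "<"
# and "+AD4-" stands for ">".
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"


def naive_filter_passes(data: bytes) -> bool:
    """Hypothetical server-side check that only looks for raw angle brackets."""
    return b"<" not in data and b">" not in data


# The server-side check sees nothing suspicious ...
print(naive_filter_passes(payload))

# ... but a client that decides (e.g. by sniffing) that the page is
# UTF-7 decodes the same bytes into an executable script element.
decoded = payload.decode("utf-7")
print(decoded)
```

The mismatch is the one Roy points out: the server escapes according to one encoding, while the client interprets the bytes according to another.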
Received on Friday, 15 December 2006 08:13:59 UTC