Re: ban the use and implementation of UTF-7 from Martin Duerst on 2006-12-15 (www-international@w3.org from October to December 2006)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Fri, 15 Dec 2006 16:25:33 +0900
To: "Roy T. Fielding" <fielding@gbiv.com>, W3C TAG <www-tag@w3.org>
Cc: "Mark Davis" <mark.davis@icu-project.org>, Deborah Goldsmith <goldsmit@apple.com>, chris.newman@innosoft.com, mrc@washington.edu, www-international@w3.org, ietf-charsets@iana.org, Misha Wolf <Misha.Wolf@reuters.com>
Message-Id: <6.0.0.20.2.20061215160435.09c515a0@localhost>

Hello Roy,

As you can see at
http://lists.w3.org/Archives/Public/www-international/2006OctDec/0144,
Mark Davis, one of the authors, essentially agrees with you.
In a followup on the ietf-charsets mailing list, Deborah Goldsmith,
the other author of the UTF-7 spec, also agrees.

The only place I'm aware of where (a variant of!) UTF-7 is used is
for IMAP folder name internationalization. See e.g.
http://www.ietf.org/rfc/rfc2192.txt for details.
In hindsight, using a UTF-7 variant in the protocol seems
unnecessary, but the original idea (mostly by Mark Crispin,
as far as I understand it) was that it could be used as is
on the server side, even on totally un-internationalized
operating systems.
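
To make the folder-name variant concrete, here is a minimal sketch in
Python of the encoding as I understand it from the IMAP specification
(RFC 3501, section 5.1.3): printable ASCII stands for itself, "&" is
written as "&-", and everything else is UTF-16BE run through base64
with "," in place of "/", no padding, and "&" ... "-" as the shift
markers. The function name encode_modified_utf7 is my own, not from any
library, and the sketch does not validate its input.

```python
import base64


def encode_modified_utf7(name: str) -> str:
    """Encode a mailbox name using IMAP's modified UTF-7 (sketch only)."""
    out = []
    pending = []  # run of characters that must be base64-encoded

    def flush() -> None:
        # Emit the pending run as "&" + modified base64 + "-".
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()
            # "&" is the shift character, so a literal "&" becomes "&-".
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


print(encode_modified_utf7("Entwürfe"))  # Entw&APw-rfe
```

The result is plain ASCII, which is why such a name can be stored as is
on a server whose file system knows nothing about Unicode.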

As for the browsers, I think they just added UTF-7 at one time
because the name looked similar to UTF-8 and UTF-16, and it was
difficult to predict exactly how these encodings would be deployed.
And as in any software, it's difficult to get rid of something,
but security reasons are about the best you can come up with
for cleaning up.

As for the IANA charset registry
(http://www.iana.org/assignments/character-sets), Ned and
I (who are currently the expert reviewers) as well as the
other list participants have been talking about cleaning it
up. We don't yet have an exact idea of what needs
to be done, but being able to attach security warnings or
similar comments to an entry might be one possible way to
proceed. The problem might be that RFC 2152
(http://www.ietf.org/rfc/rfc2152.txt) might have to be updated.

But as far as the browsers are concerned, if the TAG can come
up with a finding that e.g. also gives some more details and
examples about the security issues you mention, then we might
also be able to point to this document from anything on the
IETF or IANA side.

Regards,     Martin.

At 07:13 06/12/15, Roy T. Fielding wrote:
>
>Over the years I have seen a number of security exploits that make
>use of broken browsers that sniff character encodings in combination
>with UTF-7 encoded tags or javascript commands.  I have never actually
>seen anyone use UTF-7 for anything legitimate (other than testing).
>
>Is there some reason why WWW clients need to support UTF-7?
>
>It seems completely unnecessary given the now ubiquitous use of 8-bit
>clean transports and the presence of UTF-8, which IIRC was defined
>long after UTF-7.  However, the wider community may be aware of
>some reason why browsers should support it, so I'd like to hear
>your comments.
>
>If there is no need for UTF-7, I'd like the TAG to consider it an
>issue for the sake of asking browsers to remove its implementation
>and banning its use by servers.
>
>I know this won't solve any problems for deployed clients, and
>wouldn't be an issue at all if servers used the same algorithm for
>escaping characters that clients used to interpret them, but in the
>long term it will simplify some checks for XSS attacks and I don't
>think it will harm the Web.  That is, unless there is some significant
>body of content out there that is encoded as UTF-7.
>
>Cheers,
>
>Roy T. Fielding                            <http://roy.gbiv.com/>
>Chief Scientist, Day Software              <http://www.day.com/>
>
>

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp

Received on Friday, 15 December 2006 08:13:59 UTC