- From: MURATA Makoto <murata@hokkaido.email.ne.jp>
- Date: Wed, 25 Dec 2002 11:51:06 +0900
- To: Chris Newman <Chris.Newman@sun.com>
- Cc: Marcin Hanclik <mhanclik@poczta.onet.pl>, ietf-charsets@iana.org
On Fri, 06 Dec 2002 13:13:41 -0800 Chris Newman <Chris.Newman@sun.com> wrote:

> UTF-16 is a terrible encoding for interoperability. There are 3 published
> non-interoperable variants of UTF-16 (big-endian, little-endian,
> BOM/switch-endian) and only one of the variants can be auto-detected with
> any chance of success (and none of them can be auto-detected as well as
> UTF-8).

Unfortunately, as far as I know, UTF-8 is not free of such problems either:

(1) use with or without the Unicode signature,
(2) possible confusion with other ASCII-compatible encodings (especially
    when a program has only a few non-ASCII characters),
(3) vulnerability caused by redundant octet sequences, and
(4) use of 4 or 6 octets for non-BMP characters (e.g., writeUTF of
    java.io.DataOutput and readUTF of java.io.DataInput).

I know that Corrigendum #1: UTF-8 Shortest Form addresses (3), but I am not
sure whether implementations are free of this vulnerability.

I would be very happy if some encoding of Unicode became free of
interoperability and security problems. But I am not happy yet.

--
MURATA Makoto <murata@hokkaido.email.ne.jp>
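[Editor's note: a minimal Java sketch illustrating point (4); the class name
Utf8Variants is chosen here for illustration. It compares the 4-octet standard
UTF-8 form of a non-BMP character with the 6-octet modified UTF-8 form produced
by DataOutput.writeUTF.]

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public class Utf8Variants {
        public static void main(String[] args) throws IOException {
            // U+1D11E MUSICAL SYMBOL G CLEF, a non-BMP character.
            String clef = new String(Character.toChars(0x1D11E));

            // Standard UTF-8 encodes a non-BMP character as one 4-octet sequence.
            byte[] standard = clef.getBytes(StandardCharsets.UTF_8);
            System.out.println("standard UTF-8 octets: " + standard.length);  // 4

            // DataOutput.writeUTF uses "modified UTF-8": the surrogate pair is
            // encoded as two 3-octet sequences, i.e. 6 octets for the same
            // character (writeUTF also prepends a 2-octet length field).
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(clef);
            System.out.println("writeUTF octets: " + (buf.size() - 2));  // 6
        }
    }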
Received on Tuesday, 24 December 2002 21:50:49 UTC