- From: McDonald, Ira <imcdonald@sharplabs.com>
- Date: Fri, 03 Jan 2003 11:33:21 -0800
- To: "'MURATA Makoto'" <murata@hokkaido.email.ne.jp>, Chris Newman <Chris.Newman@sun.com>
- Cc: Marcin Hanclik <mhanclik@poczta.onet.pl>, ietf-charsets@iana.org
Hi,

But Unicode 3.2 (Unicode Standard Annex #28, March 2002) makes it very clear in Table 3.1B "Legal UTF-8 Byte Sequences" that there is _not_ a 6-byte UTF-8 representation of non-BMP characters.

Also, section VIII "Relation to ISO/IEC 10646" of Unicode 3.2 describes Amendment 1 to ISO/IEC 10646-1:2000, which limits future ISO/IEC 10646 code point assignments to the range of UTF-16 (U+0000 through U+10FFFF).

Therefore, UTF-8 always uses the _same_ size (4 bytes) for non-BMP characters as both UTF-16 and UTF-32 do.

Cheers,
- Ira McDonald
  High North Inc

-----Original Message-----
From: MURATA Makoto [mailto:murata@hokkaido.email.ne.jp]
Sent: Thursday, January 02, 2003 8:11 PM
To: Chris Newman
Cc: Marcin Hanclik; ietf-charsets@iana.org
Subject: Re: internationalization/ISO10646 question

Chris,

> Is UTF-8 perfect?

We agree :-)

> No. But the costs greatly outweigh the benefits when
> compared to any other charset I've seen, and particularly when compared to
> UTF-16.

I do not agree with this claim yet. In particular, I'm concerned about the 6-byte representation of non-BMP characters. When non-BMP characters become common, what will happen?

Cheers,
--
MURATA Makoto <murata@hokkaido.email.ne.jp>
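The size claim in Ira's reply can be checked with a small sketch. It assumes Python 3.8+ and uses only the standard codecs; the helper names (encoded_sizes, surrogate_pair_form, is_legal_utf8) are hypothetical. It also builds the 6-byte sequence that results from encoding each UTF-16 surrogate of a non-BMP character as its own 3-byte sequence (the form known as CESU-8). That is one plausible source of a "6-byte" figure, not necessarily what was meant in the thread. The sketch then shows that a strict UTF-8 decoder rejects that sequence, consistent with Table 3.1B.

```python
# Compare encoded sizes in UTF-8, UTF-16 and UTF-32 (hypothetical helpers).
# The "-le"/"-be" codec variants are used so no byte-order mark is emitted.

SAMPLES = {
    "U+0041 LATIN CAPITAL LETTER A": "\u0041",
    "U+00E9 LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "U+3042 HIRAGANA LETTER A": "\u3042",
    "U+20000 (CJK Extension B, non-BMP)": "\U00020000",
    "U+10FFFF (last code point)": "\U0010ffff",
}


def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes `ch` occupies in each encoding form."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


def surrogate_pair_form(ch: str) -> bytes:
    """Encode each UTF-16 surrogate of a non-BMP character separately,
    giving the 6-byte (CESU-8 style) form that is not legal UTF-8."""
    units = ch.encode("utf-16-be")           # 4 bytes: high + low surrogate
    high = int.from_bytes(units[0:2], "big")
    low = int.from_bytes(units[2:4], "big")
    # "surrogatepass" lets the codec write lone surrogates as 3 bytes each.
    return (chr(high) + chr(low)).encode("utf-8", "surrogatepass")


def is_legal_utf8(data: bytes) -> bool:
    """Python's strict UTF-8 decoder rejects encoded surrogates."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(f"{name}: {encoded_sizes(ch)}")
    six = surrogate_pair_form("\U00020000")
    print(len(six), six.hex(" "), "legal UTF-8:", is_legal_utf8(six))
```

Running it prints the byte counts for each sample; the two non-BMP samples come out at 4 bytes in all three encoding forms, while BMP characters differ (for example, U+3042 takes 3 bytes in UTF-8 and 2 in UTF-16).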
Received on Friday, 3 January 2003 14:34:49 UTC