RE: internationalization/ISO10646 question from McDonald, Ira on 2003-01-03 (ietf-charsets@w3.org from January to March 2003)

From: McDonald, Ira <imcdonald@sharplabs.com>
Date: Fri, 03 Jan 2003 11:33:21 -0800
To: "'MURATA Makoto'" <murata@hokkaido.email.ne.jp>, Chris Newman <Chris.Newman@sun.com>
Cc: Marcin Hanclik <mhanclik@poczta.onet.pl>, ietf-charsets@iana.org
Message-id: <116DB56CD7DED511BC7800508B2CA53735CE68@mailsrvnt02.enet.sharplabs.com>

Hi,

But Unicode 3.2 (Unicode Standard Annex #28, March 2002) 
makes very clear in Table 3.1B "Legal UTF-8 Byte Sequences"
that there is _not_ a 6-byte UTF-8 representation of non-BMP 
characters.  
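As a rough illustration (not taken from the Unicode text), the sketch below
checks this against Python 3's built-in strict UTF-8 decoder.  It assumes the
"6-byte" form means each UTF-16 surrogate encoded separately as 3 bytes;
U+10000 is only an arbitrary non-BMP example, and is_legal_utf8 is a
hypothetical helper name.

```python
# U+10000 is the first non-BMP code point; its UTF-16 surrogate pair is D800 DC00.
legal = "\U00010000".encode("utf-8")  # the 4-byte form: F0 90 80 80

# Assumed "6-byte" form: each surrogate encoded on its own as a 3-byte sequence.
six_byte = bytes([0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80])


def is_legal_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(legal.hex(), is_legal_utf8(legal))
print(six_byte.hex(), is_legal_utf8(six_byte))
```

A strict decoder accepts the 4-byte sequence and rejects the surrogate-based
one, which matches the reading of the table given above.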

Also, section VIII "Relation to ISO/IEC 10646" of Unicode 3.2
describes ISO Amendment 1 to ISO/IEC 10646-1:2000, which
limits future ISO/IEC 10646 code point assignments to the 
range of UTF-16.

Therefore, a non-BMP character always takes the _same_ size (4 bytes)
in UTF-8 as it does in both UTF-16 and UTF-32.
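The size claim can be checked with a small sketch like the following, which
uses only Python's standard codecs.  The two code points are arbitrary
examples from the ends of the non-BMP range, and encoded_sizes is a
hypothetical helper name.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the encoded length in bytes of one character in each UTF."""
    return {
        "UTF-8": len(ch.encode("utf-8")),
        # The -be variants are used so that no byte order mark is counted.
        "UTF-16": len(ch.encode("utf-16-be")),
        "UTF-32": len(ch.encode("utf-32-be")),
    }


# U+10000 is the lowest non-BMP code point, U+10FFFF the highest.
for ch in ("\U00010000", "\U0010FFFF"):
    print(f"U+{ord(ch):06X}", encoded_sizes(ch))
```

Both code points come out at 4 bytes in all three encoding forms.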

Cheers,
- Ira McDonald
  High North Inc


-----Original Message-----
From: MURATA Makoto [mailto:murata@hokkaido.email.ne.jp]
Sent: Thursday, January 02, 2003 8:11 PM
To: Chris Newman
Cc: Marcin Hanclik; ietf-charsets@iana.org
Subject: Re: internationalization/ISO10646 question


Chris,

> Is UTF-8 perfect?

We agree :-)

> No.  But the costs greatly outweigh the benefits when compared to
> any other charset I've seen, and particularly when compared to
> UTF-16.

I do not agree with this claim yet.  In particular, I'm concerned with the
6-byte representation of non-BMP characters.  When non-BMP characters
become common, what will happen?

Cheers,

-- 
MURATA Makoto <murata@hokkaido.email.ne.jp>

Received on Friday, 3 January 2003 14:34:49 UTC