- From: Chris Newman <Chris.Newman@Sun.COM>
- Date: Fri, 03 Jan 2003 17:56:59 -0800
- To: MURATA Makoto <murata@hokkaido.email.ne.jp>
- Cc: Marcin Hanclik <mhanclik@poczta.onet.pl>, ietf-charsets@iana.org
begin quotation by MURATA Makoto on 2003/1/3 11:11 +0900:

> I do not agree on this claim yet. In particular, I'm concerned with the
> 6-byte representation of non-BMP characters. When non-BMP characters
> become common, what will happen?

Software which is fully UTF-8 native will likely work just fine. UTF-8 aware software already has support for variable-width characters; whether a character is 2, 3, 4, 5 or 6 octets wide, the code path used should be the same and will already have been tested.

Software which converts UTF-8 to UCS-2 will break completely. There may be more of this junk out there than one might hope.

Software which converts UTF-8 to UTF-16 may not work, because a lot of UTF-16 software has never been tested with variable-width characters. That's actually the most serious flaw in UTF-16: it is a variable-width encoding, but the variable-width characters are (currently) an uncommon case. That means all the code to support non-16-bit characters in UTF-16 handles an uncommon case, and those code paths haven't been tested (if they exist at all). Thus you can expect deployed UTF-16 based software to break in various ways as non-BMP characters show up.

Unfortunately, I'm afraid the majority of software will fall into the latter two categories.

- Chris
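To make the distinction concrete, here is a minimal sketch (Python, chosen purely for illustration; the character U+10400 is just an arbitrary non-BMP example) of how a non-BMP character looks to UTF-8-aware code versus code that assumes one 16-bit unit per character:

    # Sketch: a BMP character vs. a non-BMP character in UTF-8 and UTF-16.
    # Code that treats UTF-16 as fixed-width UCS-2 (one 16-bit unit per
    # character) has no correct way to represent the second one.

    bmp_char = "\u00e9"          # U+00E9, inside the BMP
    non_bmp_char = "\U00010400"  # U+10400, outside the BMP

    for label, ch in [("BMP", bmp_char), ("non-BMP", non_bmp_char)]:
        utf8 = ch.encode("utf-8")
        utf16 = ch.encode("utf-16-be")
        print(f"{label}: U+{ord(ch):04X}")
        print(f"  UTF-8  : {len(utf8)} octets       {utf8.hex(' ')}")
        print(f"  UTF-16 : {len(utf16) // 2} code units   {utf16.hex(' ')}")

Running this shows U+10400 as 4 octets in UTF-8 but as two 16-bit code units (the surrogate pair D801 DC00) in UTF-16. A UTF-8-to-UCS-2 converter simply has no valid output for it, and UTF-16 code that indexes or truncates by 16-bit unit will split the pair, which is exactly the untested code path described above.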
Received on Friday, 3 January 2003 21:03:40 UTC