Re: internationalization/ISO10646 question - UTF-16 from Martin Duerst on 2002-12-24 (ietf-charsets@w3.org from October to December 2002)

From: Martin Duerst <duerst@w3.org>
Date: Tue, 24 Dec 2002 10:45:46 -0500
To: Chris Newman <Chris.Newman@Sun.COM>, Markus Scherer <markus.scherer@jtcsv.com>, charsets <ietf-charsets@iana.org>
Message-id: <4.2.0.58.J.20021224104005.05411d68@localhost>

At 18:48 02/12/23 -0800, Chris Newman wrote:

>All the UTF-16 APIs in Windows and MacOS are a huge barrier to deployment 
>of Unicode on those platforms since all the code has to be rewritten (and 
>most of it never is).  If they had instead retro-fitted UTF-8 into the 
>existing 8-bit APIs we'd have much better Unicode deployment.

Hello Chris,

I very much agree with everything else you said, but I'm not
so sure about this point.

For some types of operations (parsing in an ASCII-based context,
simple copying of whole strings, ...), you are probably right.
But I guess many applications would have happily chopped up
characters without knowing what they were doing, counted
bytes as characters, and so on, and many programmers would
just have assumed that their software worked because it
somehow worked for the (English) text they were dealing
with. Also, a lot of 8-bit but non-UTF-8 data would have
ended up in places where UTF-8 was expected.
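
To make these failure modes concrete, here is a minimal Python
sketch. The sample string, the two-byte cut-off, and the choice
of Latin-1 as the legacy encoding are illustrative assumptions,
not taken from any particular application.

```python
# Illustrative sketch only: sample data and cut-off are assumptions.
text = "D\u00fcrst"                  # "Dürst": 5 characters
data = text.encode("utf-8")          # the u-umlaut takes 2 bytes in UTF-8

# Counting bytes as characters gives the wrong answer for non-ASCII text.
char_count = len(text)
byte_count = len(data)

# Truncating at a byte offset can split a multi-byte character.
chopped = data[:2]                   # ends in the middle of the u-umlaut
try:
    chopped.decode("utf-8")
    chopped_is_valid = True
except UnicodeDecodeError:
    chopped_is_valid = False

# With ASCII-only (e.g. English) text, neither problem shows up,
# so the same code appears to work.
ascii_text = "Duerst"
ascii_data = ascii_text.encode("utf-8")
ascii_ok = (len(ascii_text) == len(ascii_data)
            and ascii_data[:2].decode("utf-8") == "Du")

# 8-bit but non-UTF-8 data (here Latin-1) passed to code that
# expects UTF-8 is typically not even well-formed.
legacy = text.encode("latin-1")
try:
    legacy.decode("utf-8")
    legacy_is_valid_utf8 = True
except UnicodeDecodeError:
    legacy_is_valid_utf8 = False

print(char_count, byte_count, chopped_is_valid, ascii_ok, legacy_is_valid_utf8)
```

The ASCII case is the point: the byte-oriented code passes every
check there, which is why such bugs could go unnoticed for a long
time.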

Not that the current approach is without problems, but
it's far from clear that UTF-8 inside applications would
work better.

Regards,   Martin.

Received on Tuesday, 24 December 2002 10:53:50 UTC