Re: internationalization/ISO10646 question - UTF-16 from Keld Jørn Simonsen on 2002-12-24 (ietf-charsets@w3.org from October to December 2002)

From: Keld Jørn Simonsen <keld@dkuug.dk>
Date: Tue, 24 Dec 2002 09:55:08 +0100
To: Markus Scherer <markus.scherer@jtcsv.com>
Cc: charsets <ietf-charsets@iana.org>
Message-id: <20021224085508.GA17063@rap.rap.dk>

On Thu, Dec 19, 2002 at 02:03:12PM -0800, Markus Scherer wrote:
> 
> Remember that UTF-8 was designed to shoehorn Unicode/UCS into Unix file 
> systems, nothing more. Where ASCII byte-stream compatibility is not an 
> issue, there are Unicode charsets that are more efficient than UTF-8, 
> different ones for different uses.

Well, it is true that FSS-UTF, the previous name for UTF-8, was designed
for UNIX file systems (FSS means File System Safe). But when SC2/WG2
renamed it to UTF-8, it also replaced the UTF-1 encoding, which was
intended for network use. So the designers of ISO 10646 purposely
meant UTF-8 for network interchange.
Furthermore, the IETF/IESG has stated the policy that UTF-8 is the
preferred encoding for all Internet protocols: all existing protocols
need to support it, and new protocols should only use UTF-8.
So nowadays UTF-8 is used for much more than just Unix file systems.

One wonders why W3C made UTF-16 the encoding of choice for XML.

Kind regards
Keld

Received on Tuesday, 24 December 2002 03:55:44 UTC