Re: internationalization/ISO10646 question - UTF-16

From: Keld Jørn Simonsen <keld@dkuug.dk>
Date: Tue, 24 Dec 2002 09:55:08 +0100
To: Markus Scherer <markus.scherer@jtcsv.com>
Cc: charsets <ietf-charsets@iana.org>
Message-id: <20021224085508.GA17063@rap.rap.dk>

On Thu, Dec 19, 2002 at 02:03:12PM -0800, Markus Scherer wrote:
> 
> Remember that UTF-8 was designed to shoehorn Unicode/UCS into Unix file 
> systems, nothing more. Where ASCII byte-stream compatibility is not an 
> issue, there are Unicode charsets that are more efficient than UTF-8, 
> different ones for different uses.
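[Editor's illustration, not part of the original message] To make the efficiency point concrete, here is a minimal Python sketch that compares encoded byte counts for UTF-8 and UTF-16 (big-endian, no BOM). The sample strings are assumptions chosen for illustration. It shows one case only: UTF-8 is smaller for ASCII, the same size for Greek, and larger for Japanese. Other compact Unicode encodings exist beyond UTF-16, but they are not shown here.

```python
# Compare encoded sizes of the same text in UTF-8 and UTF-16BE (no BOM).
# The sample strings are illustrative assumptions, not taken from the thread.
SAMPLES = {
    "ASCII": "Hello, world",
    "Greek": "αβγδεζηθ",
    "Japanese": "日本語のテキスト",
}


def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16BE byte count) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        utf8, utf16 = encoded_sizes(text)
        print(f"{name:9} chars={len(text):2} utf-8={utf8:3} utf-16={utf16:3}")
```

The ASCII sample needs 12 bytes in UTF-8 and 24 in UTF-16. The Japanese sample needs 24 and 16, because each of its BMP characters takes three bytes in UTF-8 but two in UTF-16.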

Well, it is true that the UTF-FSS encoding, the previous name for UTF-8,
was designed for UNIX file systems (FSS stands for File System Safe), but
when SC2/WG2 renamed it UTF-8, it also replaced the UTF-1 encoding,
which was intended for network use. So the designers of ISO 10646
deliberately intended UTF-8 for network interchange.
Furthermore, the IETF/IESG has stated the policy that UTF-8 is the
preferred encoding for all Internet protocols: all existing protocols
need to support it, and new protocols should use only UTF-8.
So nowadays UTF-8 is much more than just a Unix file system encoding.
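[Editor's illustration, not part of the original message] The "File System Safe" property can be checked directly. This minimal Python sketch assumes that the bytes of concern are NUL (0x00) and '/' (0x2F), which Unix path handling treats specially. It verifies that every byte of a non-ASCII character's UTF-8 form is at least 0x80, so neither byte can appear. It then contrasts UTF-16, where U+2F00 (KANGXI RADICAL ONE, chosen only because its UTF-16 form happens to be 2F 00) produces both bytes.

```python
# Check the "file system safe" property of UTF-8: bytes 0x00 (NUL) and
# 0x2F ('/') never occur inside the encoding of a non-ASCII code point.
SPECIAL = {0x00, 0x2F}


def has_special_bytes(data: bytes) -> bool:
    """True if data contains a NUL or '/' byte."""
    return any(b in SPECIAL for b in data)


def check_all_non_ascii_utf8() -> bool:
    """Verify that every non-ASCII code point encodes to bytes >= 0x80 only."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are not encodable in UTF-8
        encoded = chr(cp).encode("utf-8")
        if has_special_bytes(encoded) or any(b < 0x80 for b in encoded):
            return False
    return True


if __name__ == "__main__":
    ch = "\u2f00"  # KANGXI RADICAL ONE
    print(ch.encode("utf-16-be").hex())  # 2f00
    print(ch.encode("utf-8").hex())      # e2bc80
    print(has_special_bytes(ch.encode("utf-16-be")),
          has_special_bytes(ch.encode("utf-8")))  # True False
    print(check_all_non_ascii_utf8())    # True
```

The full scan covers about 1.1 million code points, so it takes a moment in CPython.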

One wonders why W3C made UTF-16 the encoding of choice for XML.

Kind regards
Keld
Received on Tuesday, 24 December 2002 03:55:44 GMT