RE: UTF8/UTF16 from Jon Hanna on 2002-08-20 (w3c-wai-ig@w3.org from July to September 2002)

From: Jon Hanna <jon@spin.ie>
Date: Tue, 20 Aug 2002 11:39:28 +0100
To: "WAI (E-mail)" <w3c-wai-ig@w3.org>
Message-ID: <NDBBLCBLIMDOPKMOPHLHGECAEFAA.jon@spin.ie>

> Could somebody please explain the difference between UTF8 and UTF16 to me
> and why you would want to use UTF16 over UTF8?

Jukka has ably answered the first part of the question. The second part, why
one would want to use UTF-16 rather than UTF-8, has two main answers.

The first is that it's easy to convert from UCS-2 to UTF-16. In fact UTF-16
is exactly the same as UCS-2 for every code point in the UCS-2 range; they
differ only when it comes to code points outside that range, which UTF-16
encodes as a pair of 16-bit surrogates and which UCS-2 cannot represent at
all. UCS-2 is used internally in some operating systems (Windows NT for
example), and is the "natural" type of character in some programming
languages (VB, Java).
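
To make that concrete, here is a rough sketch in Python (my choice of
language for illustration; the helper name utf16_units is made up for this
example). It lists the 16-bit units UTF-16 produces for a character inside
the UCS-2 range and for one outside it:

```python
# Sketch only: utf16_units is a hypothetical helper, not a library function.
def utf16_units(ch):
    """Return the 16-bit code units of ch in big-endian UTF-16 (no BOM)."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# U+00E9 is inside the UCS-2 range: one unit, equal to the code point itself.
inside = utf16_units("\u00e9")

# U+1D11E is outside the UCS-2 range: two units, a surrogate pair.
outside = utf16_units("\U0001D11E")

print([hex(u) for u in inside])
print([hex(u) for u in outside])
```

For the first character the single unit is just the code point, which is
what a UCS-2 system would store as well. For the second, the two units both
fall in the surrogate block U+D800 to U+DFFF.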

The second is that, compared with UTF-16, the way that UTF-8 encodes UCS
results in shortening the bytestream when the characters are mainly from the
ASCII range (one byte each rather than two), maintaining the same size when
the characters are mainly from the range U+0080 to U+07FF (two bytes each),
and increasing the size when the characters are mainly above code point
U+07FF (three bytes each rather than two, within the UCS-2 range). Hence for
some languages UTF-16 may be more efficient on bandwidth.
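
A quick way to check those sizes for yourself, again as a sketch in Python
with sample strings I have picked purely for illustration (the function name
encoded_sizes is hypothetical). It uses the "utf-16-be" codec so that no
byte order mark is added to the count:

```python
# Sketch only: compare encoded lengths of the same text in UTF-8 and UTF-16.
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count without a BOM)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))

# Illustrative samples, written as escapes so the mail stays plain ASCII.
samples = {
    "ASCII": "hello",
    "U+0080 to U+07FF (Cyrillic)": "\u043f\u0440\u0438\u0432\u0435\u0442",
    "above U+07FF (Hiragana)": "\u3053\u3093\u306b\u3061\u306f",
}

for label, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(label, "UTF-8:", utf8_len, "UTF-16:", utf16_len)
```

Real documents mix ranges (markup is ASCII even when the text is not), so
the saving for a whole page will depend on the ratio of markup to content.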

I find it hard to believe that there are many user-agents that can support
UTF-8 but not UTF-16, but maybe I'm putting too much faith in common sense.
Certainly any browser that can process XML must be able to support UTF-16.

Received on Tuesday, 20 August 2002 06:37:45 UTC