
Re: For Chinese, UTF-8 or UTF-16 encoding?

From: Asmus Freytag <asmusf@ix.netcom.com>
Date: Mon, 11 Mar 2002 22:51:36 -0800
Message-Id: <4.2.0.58.20020311224624.0236bf88@popd.ix.netcom.com>
To: "Musale, Shailendra" <Shailendra.Musale@F-Secure.com>(by way of Martin Duerst <duerst@w3.org>), www-international@w3.org
At 02:14 PM 3/12/02 +0900, Musale, Shailendra wrote:
>For Chinese localized files, should we use
>UTF-8 encoding or UTF-16 encoding?

There are two criteria:

o size
o interchange

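Typical Chinese strings in UTF-8 would be 50% longer than in UTF-16,
because most Chinese characters take three bytes in UTF-8 but only
two in UTF-16. This assumes that the *entire* text is in Chinese
characters. If the strings contain XML or HTML markup, for example,
the percentage goes down, since ASCII characters take one byte in
UTF-8 but two in UTF-16.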
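A minimal sketch of the size comparison, measuring encoded lengths with Python's built-in codecs. The sample strings and the helper name `encoded_sizes` are illustrative only; UTF-16 is measured as "utf-16-le" so that no byte order mark is counted. Note that characters outside the BMP (for example, rare CJK Extension B ideographs) take four bytes in both encodings, so they do not change the ratio in either direction.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for text, excluding any BOM."""
    # "utf-16-le" is used so that no byte order mark is counted
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

pure = "中文字符串"              # five CJK characters, all in the BMP
marked = "<p>中文字符串</p>"     # same text wrapped in ASCII HTML markup

for label, s in (("pure Chinese", pure), ("with markup", marked)):
    u8, u16 = encoded_sizes(s)
    print(f"{label}: UTF-8={u8} UTF-16={u16} ratio={u8 / u16:.2f}")
```

For the pure Chinese string this gives 15 bytes versus 10 (ratio 1.50). With only seven ASCII markup characters added, UTF-8 needs 22 bytes versus 24 for UTF-16, so on markup-heavy content UTF-8 can even come out smaller.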

If the recipient of the strings can handle UTF-16 as easily as
UTF-8, then size could be the sole criterion. This would be true
for storing message catalogs where the retrieving software could
perform conversions as necessary to serve each client what they
can handle.
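As a rough illustration of the message catalog case, here is a sketch assuming a hypothetical in-memory catalog stored as UTF-16 (chosen for compactness with Chinese text) and a hypothetical `serve_message` function that converts each entry to whatever charset a given client accepts. The message IDs and strings are made up for the example.

```python
# Hypothetical catalog: message IDs mapped to UTF-16 (little-endian, no BOM)
# byte strings, stored this way only because UTF-16 is compact for Chinese.
CATALOG: dict[str, bytes] = {
    "greeting": "欢迎使用".encode("utf-16-le"),
    "goodbye": "再见".encode("utf-16-le"),
}

def serve_message(msg_id: str, client_charset: str) -> bytes:
    """Return the stored message re-encoded in the client's charset."""
    text = CATALOG[msg_id].decode("utf-16-le")  # storage format -> str
    return text.encode(client_charset)          # str -> client's wire format

# A UTF-8 client and a legacy GB18030 client get the same text in
# different encodings, while the catalog itself is stored only once.
utf8_reply = serve_message("greeting", "utf-8")
gb_reply = serve_message("greeting", "gb18030")
```

The storage encoding stays an internal detail here: only the conversion step at the edge needs to know what each client can handle.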

A third criterion, processability, would need to be evaluated in
some cases, but does not seem to apply to the situation mentioned.

A./
Received on Tuesday, 12 March 2002 01:50:35 GMT