RE: Why is UTF8 not being taken up in Asia Pacific for Public Websites?

From: Kurosaka, Teruhiko <Teruhiko.Kurosaka@iona.com>
Date: Sat, 17 May 2003 09:48:17 -0700
To: "LUNDER,BEN (HP-Australia,ex3)" <ben.lunder@hp.com>, <www-international@w3.org>
Cc: "PETERSON,MARK (HP-Boise,ex1)" <mark.peterson@hp.com>

It does make sense to use UTF-8 for your purpose, where the
users are in a controlled environment. I have one reservation,
though, and I would like to get feedback from other members on it.

If UTF-8 is used at the web browser level, the mapping between
the legacy encoding and UTF-8 depends on the browser and/or
the OS platform (if the browser uses the conversion facility
provided by the OS platform).  It is well known that certain
Japanese characters map differently to Unicode (and thus to
UTF-8) depending on the OS/language platform.
For example, 0x5C in Shift JIS, which is supposed to mean
the Japanese currency YEN SIGN but acts like a backslash
(0x5C in ASCII), is treated as though it were a regular
backslash and mapped to Unicode U+005C on Windows, but
it is mapped to U+00A5 (YEN SIGN) on Mac OS.
So characters that the user perceives as the same are
handled and stored differently by the application if we
let the browser convert to UTF-8.
Suppose the (half-width) YEN SIGN is entered from Mac OS and
stored in the database.  If somebody later views the data from
Windows, it could be displayed as a square (meaning the
system cannot display this character).
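To make the divergence concrete, here is a rough sketch in Python. Python's built-in "cp932" codec stands in for the Windows conversion (it passes a single-byte 0x5C through as U+005C). Python has no Mac Japanese codec, so decode_mac_style below is a hypothetical emulation, not Apple's actual table: it assumes the only difference is that a single-byte 0x5C becomes U+00A5. Both function names are illustrative only.

```python
def decode_windows_style(data: bytes) -> str:
    # Python's cp932 codec maps a single-byte 0x5C to U+005C (REVERSE SOLIDUS).
    return data.decode("cp932")


def decode_mac_style(data: bytes) -> str:
    # Hypothetical emulation of a Mac-style table: decode as cp932, then turn
    # every U+005C into U+00A5 (YEN SIGN). Only a *single-byte* 0x5C yields
    # U+005C here; a 0x5C trail byte (e.g. 0x83 0x5C = KATAKANA SO) is consumed
    # by the double-byte decode first, so it is left alone.
    return data.decode("cp932").replace("\\", "\u00a5")


# The same legacy bytes for "YEN SIGN 100", converted on the two platforms:
legacy = b"\x5c100"
print(decode_windows_style(legacy).encode("utf-8").hex())  # 5c313030
print(decode_mac_style(legacy).encode("utf-8").hex())      # c2a5313030
```

The two platforms store different UTF-8 byte sequences for what the user sees as the same character, which is the mismatch described above.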

Has anyone experienced problems like this in reality?  Do
popular browsers do code conversion by themselves, or
do they use OS facilities?

T. "Kuro" Kurosaka
Internationalization Architect
IONA Technologies
2350 Mission College Blvd. Suite 650
Santa Clara, CA 95054
Tel: (408) 350 9684/9500 
Fax: (408) 350 9501
Making Software Work Together TM
Received on Saturday, 17 May 2003 12:48:26 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 22:40:47 UTC