- From: Kurosaka, Teruhiko <Teruhiko.Kurosaka@iona.com>
- Date: Sat, 17 May 2003 09:48:17 -0700
- To: "LUNDER,BEN (HP-Australia,ex3)" <ben.lunder@hp.com>, <www-international@w3.org>
- Cc: "PETERSON,MARK (HP-Boise,ex1)" <mark.peterson@hp.com>
Ben,

It does make sense to use UTF-8 for your purpose, where the users are in a controlled environment, with one reservation. I would like to get feedback from other members on this.

If UTF-8 is used at the web browser level, the mapping between the legacy encoding and UTF-8 depends on the browser and/or the OS platform (if the browser uses the conversion facility provided by the OS platform). It is well known that certain characters used in Japanese computing map differently to Unicode (and thus to UTF-8) depending on the OS/language platform.

http://www.ingrid.org/java/i18n/unicode-utf8.html

For example, 0x5c in Shift JIS is supposed to mean the Japanese currency YEN SIGN, but it acts like a backslash (0x5c in ASCII). On Windows it is treated as though it were a regular backslash and mapped to Unicode U+005C, but on MacOS it is mapped to U+00A5 (YEN SIGN).

So characters that the user perceives as the same are handled and stored differently by the application if we take the approach of letting the browser convert to UTF-8. Suppose the (half-size) YEN SIGN is entered from MacOS and stored in the database. Later, somebody views the data from Windows. That data could be displayed as a square (meaning the system cannot display this character).

Has anyone experienced problems like this in reality? Do popular browsers do code conversion by themselves, or do they use OS facilities?

T. "Kuro" Kurosaka
Internationalization Architect
teruhiko.kurosaka@iona.com
-------------------------------------------------------
IONA Technologies
2350 Mission College Blvd. Suite 650
Santa Clara, CA 95054
Tel: (408) 350 9684/9500
Fax: (408) 350 9501
-------------------------------------------------------
Making Software Work Together TM
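
The 0x5c concern in the message above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table `SJIS_0X5C_TO_UNICODE` and the function `browser_to_utf8` are hypothetical stand-ins for whatever conversion the browser or OS performs, and the two mappings are simply the ones described in the message, not the output of any real platform converter.

```python
# Hypothetical conversion tables for the single Shift JIS byte 0x5C.
# The two mappings follow the behavior described in the message;
# they are illustrative stand-ins, not real platform converter APIs.
SJIS_0X5C_TO_UNICODE = {
    "windows": "\u005c",  # REVERSE SOLIDUS (backslash)
    "macos": "\u00a5",    # YEN SIGN
}


def browser_to_utf8(sjis_byte: int, platform: str) -> bytes:
    """Simulate a browser converting one Shift JIS byte to UTF-8."""
    if sjis_byte != 0x5C:
        raise ValueError("this sketch only models byte 0x5C")
    return SJIS_0X5C_TO_UNICODE[platform].encode("utf-8")


# The user pressed the same key on both platforms, but the bytes that
# reach the application (and the database) are not the same.
stored_from_windows = browser_to_utf8(0x5C, "windows")
stored_from_macos = browser_to_utf8(0x5C, "macos")

print(stored_from_windows.hex())  # 5c
print(stored_from_macos.hex())    # c2a5
print(stored_from_windows == stored_from_macos)  # False
```

If the mappings really do differ this way, an exact-match search or comparison on the stored UTF-8 data would treat the two inputs as different characters, which is the interoperability risk raised above.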
Received on Saturday, 17 May 2003 12:48:26 UTC