
Re: List of Japanese Shift_JIS characters which are not supported in Unicode

From: KUROSAKA Teruhiko <kuro@bhlab.com>
Date: Mon, 11 Oct 2004 13:31:08 -0700
Message-ID: <416AED8C.3090403@bhlab.com>
To: souravm <SOURAVM@infosys.com>
CC: www-international@w3.org, unicode@unicode.org

Souravm,

> Is there anywhere an exhaustive list of Japanese characters (especially 
> Shift_JIS characters) which are not supported in Unicode ?

I'm not 100% sure, but I think all the characters in Shift_JIS,
being part of the national character set, are supported by Unicode.
But there are things you need to be careful about.


The mapping between Shift_JIS and Unicode differs from platform
to platform.  This causes an interoperability problem.  See:

http://www.ingrid.org/java/i18n/unicode-utf8.html
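To make the platform difference concrete, here is a minimal sketch that uses Python's standard "shift_jis" and "cp932" codecs as stand-ins for two platforms' mapping tables (an assumption for illustration; they are not necessarily the tables discussed on that page). It decodes the same bytes with both codecs and reports the resulting code points. The helper name compare_decoders is hypothetical.

```python
# Byte sequences in the JIS X 0208 area that decode to different
# Unicode characters under the JIS-based and Microsoft-based tables.
SAMPLES = [b"\x81\x60", b"\x81\x61", b"\x81\x7c", b"\x81\x91", b"\x81\x92", b"\x81\xca"]


def compare_decoders(samples, codecs=("shift_jis", "cp932")):
    """Decode each byte sequence with each codec; return code points as strings."""
    table = {}
    for raw in samples:
        table[raw] = tuple(f"U+{ord(raw.decode(c)):04X}" for c in codecs)
    return table


if __name__ == "__main__":
    for raw, (sjis, ms) in compare_decoders(SAMPLES).items():
        marker = "" if sjis == ms else "  <- differs"
        print(f"0x{raw.hex().upper()}: shift_jis={sjis} cp932={ms}{marker}")
    # 0x8160 decodes to U+301C (WAVE DASH) via shift_jis
    # but to U+FF5E (FULLWIDTH TILDE) via cp932.
```

Text written through one table and read back through the other therefore changes silently, even though both sides claim to handle "Shift_JIS".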


If you mean Microsoft's extension to Shift_JIS, code page 932,
rather than Shift_JIS proper as defined in JIS X 0208,
then there is a round-trip conversion issue, because
code page 932 includes many duplicated characters in its
extension areas.  This paper summarizes the issue:

http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html
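The round-trip problem can be shown with a short scan, again as a sketch assuming Python's bundled "cp932" codec is representative of Microsoft's table. The hypothetical helper non_roundtrip_cp932 decodes every double-byte code in cp932's lead/trail ranges, re-encodes the result, and collects the codes that come back as different bytes. A duplicated character such as the Roman numeral U+2160, which cp932 places both at 0x8754 (NEC special characters) and at 0xFA4A (IBM extensions), can only be encoded back to one of its two codes.

```python
def non_roundtrip_cp932():
    """Return {original_bytes: re_encoded_bytes or None} for cp932
    double-byte codes that do not survive a decode/encode round trip."""
    lead_bytes = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))
    trail_bytes = list(range(0x40, 0x7F)) + list(range(0x80, 0xFD))
    changed = {}
    for lead in lead_bytes:
        for trail in trail_bytes:
            original = bytes([lead, trail])
            try:
                text = original.decode("cp932")
            except UnicodeDecodeError:
                continue  # unassigned code point, nothing to round-trip
            try:
                reencoded = text.encode("cp932")
            except UnicodeEncodeError:
                reencoded = None  # decodes, but cannot be encoded back at all
            if reencoded != original:
                changed[original] = reencoded
    return changed


if __name__ == "__main__":
    changed = non_roundtrip_cp932()
    print(f"{len(changed)} cp932 codes do not round-trip")
    for original, reencoded in list(changed.items())[:5]:
        shown = "0x" + reencoded.hex().upper() if reencoded is not None else "(not encodable)"
        cp = ord(original.decode("cp932"))
        print(f"0x{original.hex().upper()} -> U+{cp:04X} -> {shown}")
```

Which of the duplicate codes an encoder prefers is a property of its conversion table, so other converters may pick differently; the point is that only one of the duplicates can come back unchanged.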


There isn't a perfect solution for either of these issues.
You'll have to decide what to do based on the
needs of your specific application.

-- 
KUROSAKA ("Kuro") Teruhiko, San Francisco, California, USA
Internationalization Consultant
http://www.bhlab.com/
Received on Monday, 11 October 2004 20:31:25 GMT
