RE: [Fwd: Solaris box with ja as locale supports Roman numbers, Circled numbers in Japanese strings]

Ienup:

[I think i18n-prog may be more appropriate for this discussion than
www-international; can we move this discussion to i18n-prog?]

> Many Roman numerals and circled numbers are a part of JIS X 0208
I don't see them in the Unicode 4.0 JIS mapping tables (that's the latest
version I happen to have at my fingertips). Do you know their Unicode or JIS
codepoints?

When I enter circled-1 from my Windows 2000 Japanese IME (choosing the
circled-1 character from the list of choices presented for "ichi") into a
text file in Notepad and save it as Unicode, the file contains the Unicode
character U+2460. That character does not appear in any of the Unicode 4.0
JIS mappings: JIS0201.txt, JIS0208.txt, JIS0212.txt, or SHIFTJIS.txt (Unicode
4.0 CD: \Mappings\EASTASIA\JIS). (To find a mapping for it, you need to go
to \Mappings\VENDORS\Microsoft\WINDOWS\CP932.txt.)
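
A quick way to reproduce the gap, sketched here in Python on the assumption
that its bundled shift_jis and cp932 codecs follow the JIS X 0208 and
Microsoft tables respectively (the helper name try_encode is made up for
illustration):

```python
def try_encode(ch, codec):
    """Return the encoded bytes as hex, or None if the codec has no mapping."""
    try:
        return ch.encode(codec).hex()
    except UnicodeEncodeError:
        return None


CIRCLED_ONE = "\u2460"  # the character the Windows IME produced

# Compare the standard-based codecs with the Microsoft code page.
for codec in ("shift_jis", "euc_jp", "iso2022_jp", "cp932"):
    print(codec, try_encode(CIRCLED_ONE, codec))
```

One would expect only cp932 to return bytes (0x8740 in the Microsoft table);
the other codecs should report no mapping if they track the standard tables.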

So when some software (Oracle, for example) tries to convert that character
for storage in a Shift-JIS or EUC database, it fails to find a mapping and
replaces it with the substitution character.
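
The substitution behaviour can be imitated with Python's errors="replace"
handler. This only illustrates the general effect and is not a claim about
how Oracle implements its conversion; the sample string is made up.

```python
text = "\u30c6\u30b9\u30c8\u2460"  # katakana "tesuto" followed by circled-1

# A strict conversion refuses the string outright.
try:
    text.encode("shift_jis")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A lenient conversion silently swaps in "?" for the unmappable character.
lossy = text.encode("shift_jis", errors="replace")
round_trip = lossy.decode("shift_jis")

print(strict_ok)
print(round_trip)
```

The katakana survive the round trip; the circled-1 comes back as "?", and
nothing in the stored bytes records what was lost.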

It's certainly conceivable that some software (like, apparently, Sourav's
telnet client, if he was running it on Windows) does some round-trip mapping
other than what's shown in the Unicode 4.0 tables. I'd be very interested to
learn which JIS code points these characters are being mapped to. Sourav:
can you supply the hex value of the EUC character you find in the text file
when you enter circled-1?
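
For reference when reading the hex dump: if the client is simply carrying
the CP932 position over into EUC (an assumption, not something anyone has
reported yet), the bytes follow from the usual Shift-JIS to JIS to EUC
arithmetic. A sketch in Python, with made-up function names, valid only for
two-byte codes whose lead byte is 0x81-0x9F or 0xE0-0xEF:

```python
def sjis_to_jis(s1, s2):
    """Convert a two-byte Shift-JIS code to its 7-bit JIS row/cell bytes."""
    adjust = 1 if s2 < 0x9F else 0            # low trail bytes map to the odd row
    row_offset = 0x70 if s1 < 0xA0 else 0xB0  # two lead-byte ranges
    if adjust:
        cell_offset = 0x20 if s2 > 0x7F else 0x1F  # trail byte 0x7F is skipped
    else:
        cell_offset = 0x7E
    return ((s1 - row_offset) << 1) - adjust, s2 - cell_offset


def jis_to_euc(j1, j2):
    """EUC-JP sets the high bit on both JIS bytes."""
    return j1 | 0x80, j2 | 0x80


j1, j2 = sjis_to_jis(0x87, 0x40)  # CP932 code for U+2460 per CP932.txt
e1, e2 = jis_to_euc(j1, j2)
print(f"JIS {j1:02X}{j2:02X}  EUC {e1:02X}{e2:02X}")
```

This gives JIS 0x2D21 (row 13, cell 1), that is, EUC 0xADA1. Row 13 is
unassigned in JIS X 0208, which is consistent with the character being
absent from JIS0208.txt.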

Steve

Steve Billings
Global 360
Software Internationalization & Localization
http://www.global360.com/
Office: 978-266-1604
Cell:    978-697-8201

-----Original Message-----
From: www-international-request@w3.org
[mailto:www-international-request@w3.org]On Behalf Of Ienup Sung
Sent: Tuesday, November 11, 2003 12:41 PM
To: www-international@w3c.org
Subject: Re: [Fwd: Solaris box with ja as locale supports Roman numbers,
Circled numbers in Japanese strings]


Hello,

Many Roman numerals and circled numbers are a part of JIS X 0208
and also a part of SJIS and so any Japanese EUC and Shift_JIS/PCK locales
will support the characters and that includes Japanese locales in Solaris.
And ISO-2022-JP also has JIS X 0208.

With regards,

Ienup


] Subject: Solaris box with ja as locale supports Roman numbers, Circled
] numbers in Japanese strings
] Resent-Date: Mon, 10 Nov 2003 06:42:58 -0500 (EST)
] Resent-From: www-international@w3.org
] Date: Mon, 10 Nov 2003 02:42:41 -0500
] From: souravm <souravm@infosys.com> (by way of Martin Duerst
] <duerst@w3.org>)
] To: www-international@w3.org
]
] Hi Steve (and all),
]
] I'm observing something funny in Solaris box related to the issue of
] support for Roman numbers and Circled numbers in Japanese string by
] EUC-JP, which we discussed previously.
]
] I have a Solaris 2.8 box. There I'm setting ja as the locale (LANG=ja,
] LC_ALL=ja), which is supposed to be equivalent to EUC-JP in Solaris. I'm
] accessing the Solaris box from a telnet client, where I'm also setting the
] encoding to EUC-JP.
]
] Now I'm trying to type those circled numbers and Roman numbers through the
] telnet client in - a) Command Prompt, b) In a file opened in VI editor.
]
] The observation is - I'm successfully able to type (in both command prompt
] and VI editor) and store those characters (in VI editor).
]
] Based on our previous understanding EUC-JP is not supposed to support these
] characters. In that case I don't know how we rationalize the above
] observation.
]
] Any clue ?
]
] Regards,
] Sourav
]
] -----Original Message-----
] From: Steve Billings [mailto:billings@global360.com]
] Sent: Thursday, October 23, 2003 2:48 AM
] To: souravm; www-international@w3.org
] Subject: RE: Query on Encoding supporting Roman numbers, Circled numbers in
] Japanese strings
]
] Those characters are non-JIS-standard characters (therefore not in
] ISO-2022-JP or EUC-JP) that exist in Microsoft CP932 (the Japanese Windows
] codepage). In other words: yes, you are correct.
]
] Steve
]
]
] -----Original Message-----
] From: www-international-request@w3.org
] [mailto:www-international-request@w3.org]On Behalf Of souravm (by way of
] Martin Duerst <duerst@w3.org>)
] Sent: Wednesday, October 22, 2003 12:17 PM
] To: www-international@w3.org
] Subject: Query on Encoding supporting Roman numbers, Circled numbers in
] Japanese strings
]
] Hi All,
]
] I've a simple application which accepts Japanese string from a HTML form
] and then show the same string in the response page.
]
] Now if I enter Roman characters like I, II, etc and Circled numbers like
] ①、② etc as a part of Japanese string, the string is properly shown back
] in response page when the encoding used is UTF-8. However, the same thing
] does not work in case of EUC_JP, Shift_JIS and ISO-2022-JP as encoding.
]
] I believe these characters are not supported in EUC_JP, Shift_JIS and
] ISO-2022-JP. Can anyone please confirm it?
]
] Regards,
] Sourav
]

Received on Tuesday, 11 November 2003 14:56:09 UTC