- From: Steve Billings <billings@global360.com>
- Date: Tue, 11 Nov 2003 14:54:10 -0500
- To: "Ienup Sung" <is@mpkmail.eng.sun.com>, <i18n-prog@yahoogroups.com>
- Cc: <www-international@w3c.org>
Ienup:

[I think i18n-prog may be more appropriate for this discussion than www-international; can we move this discussion to i18n-prog?]

> Many Roman numerals and circled numbers are a part of JIS X 0208

I don't see them in the Unicode 4.0 JIS mapping tables (that's the latest version I happen to have at my fingertips). Do you know their Unicode or JIS code points?

When I enter circle-1 from my Windows 2000 Japanese IME (choosing the circled-1 character from the list of choices presented for "ichi") into a text file (Notepad) and save it as Unicode, it saves the Unicode character U+2460. This Unicode character does not appear in any of the Unicode 4.0 JIS mappings: JIS0201.txt, JIS0208.txt, JIS0212.txt, or SHIFTJIS.txt (Unicode 4.0 CD: \Mappings\EASTASIA\JIS). (To find a mapping for it, you need to go to \Mappings\VENDORS\Microsoft\WINDOWS\CP932.txt.)

So when at least some software, Oracle for example, tries to convert that character for storage in a Shift-JIS or EUC database, it fails to find a mapping and replaces it with the substitution character.

It's certainly conceivable that some software (like, apparently, Sourav's telnet client, if he was running it on Windows) does some round-trip mapping other than what's shown in the Unicode 4.0 tables. I'd be very interested to learn which JIS characters are being mapped to.

Sourav: can you supply the hex value of the EUC character you find in the text file when you enter circle-1?
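Steve's point about missing mappings can be reproduced with an off-the-shelf converter. The minimal Python sketch below assumes CPython's built-in codecs as a stand-in for the converters discussed in this thread (Oracle's, Solaris's and the telnet client's tables may well differ); `try_encode` is a hypothetical helper. It tries to encode U+2460 in each encoding and then shows a substituting converter silently dropping the character.

```python
# Sketch: which of CPython's built-in codecs can encode U+2460?
# Assumption: CPython's codec tables stand in for the converters discussed
# in the thread (Oracle, Solaris, a telnet client), whose tables may differ.
from typing import Optional

CIRCLED_ONE = "\u2460"  # CIRCLED DIGIT ONE, the character Notepad saved

def try_encode(text: str, codec: str) -> Optional[bytes]:
    """Hypothetical helper: encoded bytes, or None if the codec has no mapping."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

for codec in ("utf-8", "cp932", "shift_jis", "euc_jp", "iso2022_jp"):
    encoded = try_encode(CIRCLED_ONE, codec)
    print(f"{codec:10} -> {encoded.hex() if encoded is not None else 'no mapping'}")

# A converter that substitutes instead of raising loses the character;
# Python's "replace" error handler writes "?" in its place.
print(CIRCLED_ONE.encode("euc_jp", errors="replace"))
```

With CPython's tables only `utf-8` and `cp932` produce bytes (CP932 gives 0x8740, the code listed in Microsoft's CP932 table); the codecs built from the standard JIS tables report no mapping, which is consistent with the Unicode 4.0 JIS mapping files Steve checked.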
Steve

Steve Billings
Global 360
Software Internationalization & Localization
http://www.global360.com/
Office: 978-266-1604
Cell: 978-697-8201

-----Original Message-----
From: www-international-request@w3.org [mailto:www-international-request@w3.org]On Behalf Of Ienup Sung
Sent: Tuesday, November 11, 2003 12:41 PM
To: www-international@w3c.org
Subject: Re: [Fwd: Solaris box with ja as locale supports Roman numbers, Circled numbers in Japanese strings]

Hello,

Many Roman numerals and circled numbers are a part of JIS X 0208 and also a part of SJIS, so any Japanese EUC and Shift_JIS/PCK locale will support the characters, including the Japanese locales in Solaris. ISO-2022-JP also has JIS X 0208.

With regards,

Ienup

] Subject: Solaris box with ja as locale supports Roman numbers, Circled
] numbers in Japanese strings
] Resent-Date: Mon, 10 Nov 2003 06:42:58 -0500 (EST)
] Resent-From: www-international@w3.org
] Date: Mon, 10 Nov 2003 02:42:41 -0500
] From: souravm <souravm@infosys.com> (by way of Martin Duerst
] <duerst@w3.org>)
] To: www-international@w3.org
]
] Hi Steve (and all),
]
] I'm observing something odd on a Solaris box related to the issue of
] support for Roman numbers and circled numbers in Japanese strings in EUC-JP,
] which we discussed previously.
]
] I have a Solaris 2.8 box. There I set ja as the locale (LANG=ja,
] LC_ALL=ja), which is supposed to be the EUC-JP equivalent in Solaris. I'm
] accessing the Solaris box from a telnet client, where I have also set the
] encoding to EUC-JP.
]
] Now I'm trying to type those circled numbers and Roman numbers through the
] telnet client in (a) the command prompt and (b) a file opened in the vi editor.
]
] The observation is that I'm able to type the characters (both at the command
] prompt and in vi) and store them (in vi) successfully.
]
] Based on our previous understanding, EUC-JP is not supposed to support these
] characters. In that case, I don't know how to explain the above
] observation.
]
] Any clue?
]
] Regards,
] Sourav
]
] -----Original Message-----
] From: Steve Billings [mailto:billings@global360.com]
] Sent: Thursday, October 23, 2003 2:48 AM
] To: souravm; www-international@w3.org
] Subject: RE: Query on Encoding supporting Roman numbers, Circled numbers in
] Japanese strings
]
] Those characters are non-JIS-standard characters (therefore not in
] ISO-2022-JP or EUC-JP) that exist in Microsoft CP932 (the Japanese Windows
] codepage). In other words: yes, you are correct.
]
] Steve
]
] -----Original Message-----
] From: www-international-request@w3.org
] [mailto:www-international-request@w3.org]On Behalf Of souravm (by way of
] Martin Duerst <duerst@w3.org>)
] Sent: Wednesday, October 22, 2003 12:17 PM
] To: www-international@w3.org
] Subject: Query on Encoding supporting Roman numbers, Circled numbers in
] Japanese strings
]
] Hi All,
]
] I have a simple application that accepts a Japanese string from an HTML form
] and then shows the same string in the response page.
]
] Now if I enter Roman characters like I, II, etc. and circled numbers like
] ①、② etc. as part of a Japanese string, the string is shown back properly
] in the response page when the encoding used is UTF-8. However, the same thing
] does not work when EUC_JP, Shift_JIS or ISO-2022-JP is used as the encoding.
]
] I believe these characters are not supported in EUC_JP, Shift_JIS and
] ISO-2022-JP. Can anyone please confirm this?
]
] Regards,
] Sourav
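Relating to Steve's request for the EUC hex value: one possible explanation, stated here purely as an assumption and not as a description of what Solaris or the telnet client actually do, is a converter that carries CP932's NEC row-13 codes straight across into EUC-JP using the plain Shift_JIS-to-JIS arithmetic, with no table lookup. The Python sketch below (all function names are hypothetical) applies that arithmetic to CP932's code for ① and prints the JIS row/cell and the EUC-JP and ISO-2022-JP bytes that would result.

```python
# Sketch: where a CP932 code would land if a tool moved it to EUC-JP or
# ISO-2022-JP with the plain Shift_JIS <-> JIS arithmetic and no table lookup.
# This behaviour is an assumption; function names are illustrative only.

def sjis_to_jis(lead: int, trail: int) -> tuple[int, int]:
    """Convert a Shift_JIS/CP932 double-byte code to a JIS 94x94 byte pair."""
    if 0x81 <= lead <= 0x9F:
        row_base = (lead - 0x81) * 2 + 0x21
    elif 0xE0 <= lead <= 0xEF:
        row_base = (lead - 0xE0) * 2 + 0x5F
    else:
        raise ValueError(f"not a Shift_JIS lead byte: {lead:#04x}")
    if not (0x40 <= trail <= 0xFC) or trail == 0x7F:
        raise ValueError(f"not a Shift_JIS trail byte: {trail:#04x}")
    if trail >= 0x9F:       # trail 0x9F-0xFC: the even-numbered row of the pair
        return row_base + 1, trail - 0x7E
    if trail >= 0x80:       # skip the unused 0x7F in the odd-row range
        return row_base, trail - 0x20
    return row_base, trail - 0x1F

def jis_to_euc(j1: int, j2: int) -> bytes:
    """EUC-JP code set 1: the two JIS bytes with the high bit set."""
    return bytes([j1 | 0x80, j2 | 0x80])

def jis_to_iso2022jp(j1: int, j2: int) -> bytes:
    """ESC $ B (designate JIS X 0208), the two bytes, then ESC ( B (ASCII)."""
    return b"\x1b$B" + bytes([j1, j2]) + b"\x1b(B"

lead, trail = "\u2460".encode("cp932")  # CP932 code 0x87 0x40
j1, j2 = sjis_to_jis(lead, trail)
print(f"JIS row {j1 - 0x20}, cell {j2 - 0x20} (0x{j1:02X}{j2:02X})")
print("EUC-JP:     ", jis_to_euc(j1, j2).hex())
print("ISO-2022-JP:", jis_to_iso2022jp(j1, j2))
```

For ① this gives JIS row 13, cell 1 (0x2D21), i.e. EUC-JP 0xADA1. JIS X 0208 leaves rows 9 to 15 unassigned, so a strictly standard EUC-JP converter has no character at that position. Running a standard character such as 亜 (Shift_JIS 0x889F) through the same functions and comparing the result with `"亜".encode("euc_jp")` (0xB0A1) is a quick sanity check of the arithmetic.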
Received on Tuesday, 11 November 2003 14:56:09 UTC