- From: souravm <souravm@infosys.com>
- Date: Thu, 13 Nov 2003 07:27:54 -0500
- To: www-international@w3.org
Hi Sherma/Steve/Ienup and all,

First of all, thanks a lot for your responses.

I checked the byte values of the characters I wrote to the .txt file on the Solaris box. They are ada1 for circled 1, adb6 for Roman 2 and adb9 for Roman 5.

After running man eucJP on Solaris I found that for JIS X 0208 the 13th row (ada1 to adfe) is reserved for vendor-defined characters. I also checked the link http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V40F_HTML/SUPPDOCS/JAPANDOC/JAPANCH1.HTM. There I found that rows 9 - 15 of JIS X 0208 are reserved.

One more related observation: I found that Java's EUC-JP implementation does not support the above-mentioned characters. When I type these characters in the browser and read them as a string in my JSP code, the bytes come as f3 f3 (?, ?). This happens even after setting the charset to EUC-JP at the JSP level through the page directive and setting the character encoding of the request to EUC_JP through the API request.setCharacterEncoding. All other Japanese characters work perfectly fine in the same JSPs.

Now, based on your responses and the above observations, what I conclude is:

1. The set of characters supported by the EUC-JP encoding varies by vendor, because of the reserved areas in the coded character sets that EUC-JP covers (i.e. rows 9 to 15 and 85 to 94 of JIS X 0208, and rows 3 to 5, 12 to 15 and 78 to 94 of JIS X 0212).
2. All other rows of the coded character sets for EUC-JP contain the same set of characters irrespective of the vendor.
3. The above two points can be extended to any other encoding.

It would be really helpful if you could verify the above points.

However, the problem is that since vendors don't explicitly specify which characters they support in those reserved areas, it becomes very difficult to handle these special characters when the architecture involves multiple platforms/tools.

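As a side note on the byte values above: the row and cell numbers can be read off a two-byte EUC-JP code by subtracting 0xA0 from each byte. Below is a minimal Python sketch of that arithmetic. It assumes two-byte codes from code set 1 (JIS X 0208) only, the function names are hypothetical, and the reserved ranges are simply the ones listed in point 1, not taken from any standard text.

```python
# Assumption: two-byte EUC-JP codes (code set 1, JIS X 0208), where each byte
# is the row or cell number plus 0xA0. Function names are hypothetical.

# Reserved JIS X 0208 rows as listed in this message: 9 to 15 and 85 to 94.
RESERVED_ROWS = set(range(9, 16)) | set(range(85, 95))


def euc_jp_to_kuten(code_hex):
    """Return (row, cell) for a two-byte EUC-JP code given as hex, e.g. 'ada1'."""
    raw = bytes.fromhex(code_hex)
    if len(raw) != 2 or any(b < 0xA1 or b > 0xFE for b in raw):
        raise ValueError(f"not a two-byte EUC-JP code: {code_hex!r}")
    return raw[0] - 0xA0, raw[1] - 0xA0


def in_reserved_row(code_hex):
    """True if the code falls in one of the reserved rows listed above."""
    row, _cell = euc_jp_to_kuten(code_hex)
    return row in RESERVED_ROWS


# The three byte values found in the .txt file on the Solaris box.
for code in ("ada1", "adb6", "adb9"):
    row, cell = euc_jp_to_kuten(code)
    print(code, "-> row", row, "cell", cell, "reserved:", in_reserved_row(code))
```

All three values land in row 13, which matches what man eucJP says about the vendor-defined row.
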
As in my case, the distributed application we are developing involves a J2EE application server (WebLogic 8.1) on Solaris 8, Sybase 12.5 on Solaris 8 and the Namazu search engine on Solaris 2.8. It seems the EUC-JP implementations in Solaris and Sybase support those special characters (and Namazu works based on the ja locale of Solaris), but Java does not.

Any suggestions on how to handle such characters in the above-mentioned heterogeneous environment would be of great help.

Regards,
Sourav

-----Original Message-----
From: Xueming Shen [mailto:Xueming.Shen@Sun.COM]
Sent: Wednesday, November 12, 2003 4:11 AM
To: www-international@w3.org
Subject: Re: [i18n-prog] RE: [Fwd: Solaris box with ja as locale supports Roman numbers, Circled numbers in Japanese strings]

Steve,

They are "NEC Row 13" characters, which are NOT part of JIS X 0208 but are supported by different vendors for "compatibility" reasons. See man eucJP on Solaris for details. Windows also has them mapped to its SJIS rows 89-92.

regards,
sherma

Steve Billings wrote:

>Ienup:
>
>[I think i18n-prog may be more appropriate for this discussion than
>www-international; can we move this discussion to i18n-prog?]
>
>>Many Roman numerals and circled numbers are a part of JIS X 0208
>
>I don't see them in the Unicode 4.0 JIS mapping tables (that's the latest
>version I happen to have at my fingertips). Do you know their Unicode or JIS
>codepoints?
>
>When I enter circle-1 from my Windows 2000 Japanese IME (choosing the
>circled-1 character from the list of choices presented for "ichi") into a
>text file (notepad), and save it as Unicode, it saves the Unicode character
>U+2460. This Unicode character does not appear in any of the Unicode 4.0 JIS
>mappings: JIS0201.txt, JIS0208.txt, JIS0212.txt, or SHIFTJIS.txt (Unicode
>4.0 CD: \Mappings\EASTASIA\JIS). (To find a mapping for it, you need to go
>to \Mappings\VENDORS\Microsoft\WINDOWS\CP932.txt.)
>
>So when at least some software such as Oracle, for example, tries to convert
>that character for storing in a Shift-JIS or EUC database, it fails to find
>a mapping, and replaces it with the substitution character.
>
>It's certainly conceivable that some software (like, apparently, Sourav's
>telnet client if he was running it on Windows) does some round-trip mapping
>other than what's shown in the Unicode 4.0 tables. I'd be very interested to
>learn which JIS characters are being mapped to. Sourav: can you supply the
>hex value of the EUC character you find in the text file when you enter
>circle-1?
>
>Steve
>
>Steve Billings
>Global 360
>Software Internationalization & Localization
>http://www.global360.com/
>Office: 978-266-1604
>Cell: 978-697-8201
>
>-----Original Message-----
>From: www-international-request@w3.org
>[mailto:www-international-request@w3.org]On Behalf Of Ienup Sung
>Sent: Tuesday, November 11, 2003 12:41 PM
>To: www-international@w3c.org
>Subject: Re: [Fwd: Solaris box with ja as locale supports Roman numbers,
>Circled numbers in Japanese strings]
>
>
>Hello,
>
>Many Roman numerals and circled numbers are a part of JIS X 0208
>and also a part of SJIS and so any Japanese EUC and Shift_JIS/PCK locales
>will support the characters and that includes Japanese locales in Solaris.
>And ISO-2022-JP also has JIS X 0208.
>
>With regards,
>
>Ienup
>
>
>] Subject: Solaris box with ja as locale supports Roman numbers, Circled
>] numbers in Japanese strings
>] Resent-Date: Mon, 10 Nov 2003 06:42:58 -0500 (EST)
>] Resent-From: www-international@w3.org
>] Date: Mon, 10 Nov 2003 02:42:41 -0500
>] From: souravm <souravm@infosys.com> (by way of Martin Duerst
>] <duerst@w3.org>)
>] To: www-international@w3.org
>]
>] Hi Steve (and all),
>]
>] I'm observing something funny in Solaris box related to the issue of
>] support for Roman numbers and Circled numbers in Japanese string by EUC-JP,
>] which we discussed previously.
>]
>] I'm having a solaris box 2.8. There I'm setting ja as locale (LANG=ja,
>] LC_ALL=ja) which is supposed to be EUC-Jp equivalent in Solaris. I'm
>] accessing the Solaris box from a telnet client - there also I'm setting the
>] encoding as EUC-JP.
>]
>] Now I'm trying to type those circled numbers and Roman numbers through the
>] telnet client in - a) Command Prompt, b) In a file opened in VI editor.
>]
>] The observation is - I'm successfully able to type (in both command prompt
>] and VI editor) and store those characters (in VI editor).
>]
>] Based on our previous understanding EUC-JP is not supposed to support these
>] characters. In that case I don't know how do we rationalize above
>] observation.
>]
>] Any clue ?
>]
>] Regards,
>] Sourav
>]
>] -----Original Message-----
>] From: Steve Billings [mailto:billings@global360.com]
>] Sent: Thursday, October 23, 2003 2:48 AM
>] To: souravm; www-international@w3.org
>] Subject: RE: Query on Encoding supporting Roman numbers, Circled numbers in
>] Japanese strings
>]
>] Those characters are non-JIS-standard characters (therefore not in
>] ISO-2022-JP or EUC-JP) that exist in Microsoft CP932 (the Japanese Windows
>] codepage). In other words: yes, you are correct.
>]
>] Steve
>]
>]
>] Steve Billings
>] Global 360
>] Software Internationalization & Localization
>] http://www.global360.com/
>] Office: 978-266-1604
>] Cell: 978-697-8201
>]
>] -----Original Message-----
>] From: www-international-request@w3.org
>] [mailto:www-international-request@w3.org]On Behalf Of souravm (by way of
>] Martin Duerst <duerst@w3.org>)
>] Sent: Wednesday, October 22, 2003 12:17 PM
>] To: www-international@w3.org
>] Subject: Query on Encoding supporting Roman numbers, Circled numbers in
>] Japanese strings
>]
>] Hi All,
>]
>] I've a simple application which accepts Japanese string from a HTML form
>] and then show the same string in the response page.
>]
>] Now if I enter Roman characters like I, II, etc and Circled numbers like
>] ①、② etc as a part of Japanese string, the string is properly shown back
>] in response page when the encoding used is UTF-8. However, the same thing
>] does not work in case of EUC_JP, Shift_JIS and ISO-2022-JP as encoding.
>]
>] I believe these characters are not supported in EUC_JP, Shift_JIS and
>] ISO-2022_jp. Can anyone please confirm it ?
>]
>] Regards,
>] Sourav

Received on Thursday, 13 November 2003 07:35:45 UTC