RE: [i18n-prog] RE: [Fwd: Solaris box with ja as locale supports Roman numbers, Circled numbers in Japanese strings] from souravm on 2003-11-13 (www-international@w3.org from October to December 2003)

From: souravm <souravm@infosys.com>
Date: Thu, 13 Nov 2003 07:27:54 -0500
To: www-international@w3.org
Message-Id: <4.2.0.58.J.20031113072744.05ad8220@localhost>
Hi Sherma/Steve/Ienup and all,

First of all, thanks a lot for your responses.

I checked the byte values of the characters I wrote to the .txt file
on the Solaris box. They are:
	ada1 for circled 1,
	adb6 for Roman 2 and
	adb9 for Roman 5.
After running man eucJP on Solaris I found that for JIS X 0208 the 13th
row (ada1 to adfe) is reserved for vendor-defined characters.
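
As a quick check on the row arithmetic (a minimal sketch, not taken from
any vendor documentation): in EUC-JP, a two-byte code from the main
double-byte set has 0xA0 added to both the row and the cell number, so
subtracting 0xA0 from each byte recovers them. The function name
euc_to_kuten is only illustrative.

```python
def euc_to_kuten(code: bytes) -> tuple[int, int]:
    """Return (row, cell) for a two-byte EUC-JP code in the main double-byte set."""
    if len(code) != 2 or not all(0xA1 <= b <= 0xFE for b in code):
        raise ValueError("not a two-byte EUC-JP sequence in the A1..FE range")
    # Each byte is the row/cell number plus 0xA0.
    return code[0] - 0xA0, code[1] - 0xA0


# The three byte values reported above.
for name, hex_code in [("circled 1", "ada1"), ("Roman 2", "adb6"), ("Roman 5", "adb9")]:
    row, cell = euc_to_kuten(bytes.fromhex(hex_code))
    print(f"{name}: {hex_code} -> row {row}, cell {cell}")
```

All three values land in row 13 (cells 1, 22 and 25), which matches the
ada1 to adfe range given in the man page.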

I also checked the link
http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V40F_HTML/SUPPDOCS/JAPANDOC/JAPANCH1.HTM.
There I found that rows 9 to 15 of JIS X 0208 are reserved.

One more related observation: I found that Java's EUC-JP implementation
does not support the above-mentioned characters. When I type these
characters in the browser and read them as a string in my JSP code, the
bytes come as f3 f3 (?, ?). This happens even after setting the charset
to EUC-JP at the JSP level through the page directive and setting the
character encoding of the request to EUC_JP through the API
request.setCharacterEncoding. All other Japanese characters work
perfectly fine in the same JSPs.
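
One way to see which converters accept these characters is to probe a
few of them directly. The sketch below is only an illustration: it uses
Python's codec names (euc_jp, cp932, euc_jis_2004) rather than the Java
converters discussed here, so the results need not match what Java does,
and the helper name encodable is hypothetical.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Circled digit one, Roman numeral two, Roman numeral five.
CHARS = "\u2460\u2161\u2164"

for codec in ("utf-8", "cp932", "euc_jp", "euc_jis_2004"):
    print(codec, [encodable(ch, codec) for ch in CHARS])
```

A False entry for a codec means that converter has no mapping for the
character, which is the situation that ends in a substitution character.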

Now, based on your responses and the above observations, what I conclude
is:
1. The set of characters supported by the EUC-JP encoding varies by
vendor, because of the reserved areas in the coded character sets that
EUC-JP carries (i.e. rows 9 to 15 and 85 to 94 for JIS X 0208, and rows
3 to 5, 12 to 15 and 78 to 94 for JIS X 0212).
2. All other rows of the coded character sets for EUC-JP contain the
same set of characters irrespective of the vendor.
3. The above two points can be extended to any other encoding.
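
Point 1 can be sketched as a simple lookup against the row ranges listed
in that point. The ranges are copied from this message as stated and are
not verified against the standards; the names RESERVED_ROWS and
is_reserved_row are only illustrative.

```python
# Row ranges quoted in this message as reserved; not verified against the standards.
RESERVED_ROWS = {
    "JIS X 0208": [(9, 15), (85, 94)],
    "JIS X 0212": [(3, 5), (12, 15), (78, 94)],
}


def is_reserved_row(charset: str, row: int) -> bool:
    """Return True if row falls in one of the quoted reserved ranges for charset."""
    return any(low <= row <= high for low, high in RESERVED_ROWS[charset])


# Lead byte 0xAD of the characters in question corresponds to row 13.
print(is_reserved_row("JIS X 0208", 0xAD - 0xA0))
```

A character whose row tests as reserved is one whose support would have
to be checked per vendor rather than assumed.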

It will be really helpful for me if you can verify the above points.

However, the problem is that since vendors don't explicitly specify
which characters they support in those reserved areas, it becomes very
difficult to handle these types of special characters when the
architecture involves multiple platforms/tools. In my case, the
distributed application we are developing involves a J2EE application
server (Weblogic 8.1) on Solaris 8, Sybase 12.5 on Solaris 8 and the
Namazu search engine on Solaris 2.8. It seems the EUC-JP implementations
in Solaris and Sybase support those special characters (and Namazu works
based on the ja locale of Solaris), but Java does not. Any suggestion
on how to handle such characters in such heterogeneous environments
will be of great help.

Regards,
Sourav


-----Original Message-----
From: Xueming Shen [mailto:Xueming.Shen@Sun.COM]
Sent: Wednesday, November 12, 2003 4:11 AM
To: www-international@w3.org
Subject: Re: [i18n-prog] RE: [Fwd: Solaris box with ja as locale
supports Roman numbers, Circled numbers in Japanese strings]





Steve,

They are "NEC Row 13" characters, which are NOT part of JIS X 0208 but
are supported by different vendors for "compatibility" reasons. See
man eucJP on Solaris for details. Windows also has them mapped to its
SJIS rows 89-92.

regards,

sherma


Steve Billings wrote:

 >Ienup:
 >
 >[I think i18n-prog may be more appropriate for this discussion than
 >www-international; can we move this discussion to i18n-prog?]
 >
 >
 >
 >>Many Roman numerals and circled numbers are a part of JIS X 0208
 >>
 >I don't see them in the Unicode 4.0 JIS mapping tables (that's the latest
 >version I happen to have at my fingertips). Do you know their Unicode or JIS
 >codepoints?
 >
 >When I enter circle-1 from my Windows 2000 Japanese IME (choosing the
 >circled-1 character from the list of choices presented for "ichi") into a
 >text file (notepad), and save it as Unicode, it saves the Unicode character
 >U+2460. This Unicode character does not appear in any of the Unicode 4.0 JIS
 >mappings: JIS0201.txt, JIS0208.txt, JIS0212.txt, or SHIFTJIS.txt (Unicode
 >4.0 CD: \Mappings\EASTASIA\JIS). (To find a mapping for it, you need to go
 >to \Mappings\VENDORS\Microsoft\WINDOWS\CP932.txt.)
 >
 >So when at least some software such as Oracle, for example, tries to convert
 >that character for storing in a Shift-JIS or EUC database, it fails to find
 >a mapping, and replaces it with the substitution character.
 >
 >It's certainly conceivable that some software (like, apparently, Sourav's
 >telnet client if he was running it on Windows) does some round-trip mapping
 >other than what's shown in the Unicode 4.0 tables. I'd be very interested to
 >learn which JIS characters are being mapped to. Sourav: can you supply the
 >hex value of the EUC character you find in the text file when you enter
 >circle-1?
 >
 >Steve
 >
 >Steve Billings
 >Global 360
 >Software Internationalization & Localization
 >http://www.global360.com/
 >Office: 978-266-1604
 >Cell:    978-697-8201
 >
 >-----Original Message-----
 >From: www-international-request@w3.org
 >[mailto:www-international-request@w3.org]On Behalf Of Ienup Sung
 >Sent: Tuesday, November 11, 2003 12:41 PM
 >To: www-international@w3c.org
 >Subject: Re: [Fwd: Solaris box with ja as locale supports Roman numbers,
 >Circled numbers in Japanese strings]
 >
 >
 >Hello,
 >
 >Many Roman numerals and circled numbers are a part of JIS X 0208
 >and also a part of SJIS and so any Japanese EUC and Shift_JIS/PCK locales
 >will support the characters and that includes Japanese locales in Solaris.
 >And ISO-2022-JP also has JIS X 0208.
 >
 >With regards,
 >
 >Ienup
 >
 >
 >] Subject: Solaris box with ja as locale supports Roman numbers, Circled
 >] numbers in Japanese strings
 >] Resent-Date: Mon, 10 Nov 2003 06:42:58 -0500 (EST)
 >] Resent-From: www-international@w3.org
 >] Date: Mon, 10 Nov 2003 02:42:41 -0500
 >] From: souravm <souravm@infosys.com> (by way of Martin Duerst
 >] <duerst@w3.org>)
 >] To: www-international@w3.org
 >]
 >]
 >]
 >]
 >]
 >] Hi Steve (and all),
 >]
 >] I'm observing something odd on a Solaris box related to the issue of
 >] EUC-JP support for Roman numerals and circled numbers in Japanese
 >] strings, which we discussed previously.
 >]
 >] I have a Solaris 2.8 box. There I'm setting ja as the locale (LANG=ja,
 >] LC_ALL=ja), which is supposed to be the EUC-JP equivalent in Solaris. I'm
 >] accessing the Solaris box from a telnet client, where I'm also setting
 >] the encoding to EUC-JP.
 >]
 >] Now I'm trying to type those circled numbers and Roman numerals through
 >] the telnet client a) at the command prompt and b) in a file opened in
 >] the vi editor.
 >]
 >] The observation is that I can successfully type those characters (both
 >] at the command prompt and in the vi editor) and store them (in the vi
 >] editor).
 >]
 >] Based on our previous understanding, EUC-JP is not supposed to support
 >] these characters. In that case I don't know how to explain the above
 >] observation.
 >]
 >] Any clue ?
 >]
 >] Regards,
 >] Sourav
 >]
 >] -----Original Message-----
 >] From: Steve Billings [mailto:billings@global360.com]
 >] Sent: Thursday, October 23, 2003 2:48 AM
 >] To: souravm; www-international@w3.org
 >] Subject: RE: Query on Encoding supporting Roman numbers, Circled numbers
 >] in Japanese strings
 >]
 >] Those characters are non-JIS-standard characters (therefore not in
 >] ISO-2022-JP or EUC-JP) that exist in Microsoft CP932 (the Japanese Windows
 >] codepage). In other words: yes, you are correct.
 >]
 >] Steve
 >]
 >]
 >] Steve Billings
 >] Global 360
 >] Software Internationalization & Localization
 >] http://www.global360.com/
 >] Office: 978-266-1604
 >] Cell:    978-697-8201
 >]
 >] -----Original Message-----
 >] From: www-international-request@w3.org
 >] [mailto:www-international-request@w3.org]On Behalf Of souravm (by way of
 >] Martin Duerst <duerst@w3.org>)
 >] Sent: Wednesday, October 22, 2003 12:17 PM
 >] To: www-international@w3.org
 >] Subject: Query on Encoding supporting Roman numbers, Circled numbers in
 >] Japanese strings
 >]
 >]
 >]
 >]
 >] Hi All,
 >]
 >] I have a simple application which accepts a Japanese string from an HTML
 >] form and then shows the same string in the response page.
 >]
 >] Now if I enter Roman numerals like I, II, etc. and circled numbers like
 >] ①、② etc. as part of a Japanese string, the string is shown back properly
 >] in the response page when the encoding used is UTF-8. However, the same
 >] thing does not work with EUC_JP, Shift_JIS or ISO-2022-JP as the encoding.
 >]
 >] I believe these characters are not supported in EUC_JP, Shift_JIS and
 >] ISO-2022-JP. Can anyone please confirm this?
 >]
 >] Regards,
 >] Sourav
 >]
 >]
 >]
 >
 >
Received on Thursday, 13 November 2003 07:35:45 UTC