RE: Problem in showing Japanese Wave dash from Suzanne M. Topping on 2002-11-08 (www-international@w3.org from October to December 2002)

From: Suzanne M. Topping <stopping@bizwonk.com>
Date: Fri, 8 Nov 2002 13:16:10 -0500
To: <www-international@w3.org>
Message-ID: <427F53DA8F48E9498ADF0F868763F88C0F6A9A@wonkserver1.bizwonk.com>
> -----Original Message-----
> From: souravm [mailto:souravm@infosys.com] 

> 
> I'm facing a problem displaying a Japanese character, 
> WAVE DASH (〜)

The wave character was discussed earlier this year on the Unicode and NELOCSIG lists. Here is the first of two notes from the Unicode mail archive which you may find useful:


From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Feb 20 2002 - 14:23:16 EST 

Yep, that's right. This is one of a notorious small set of 
inconsistencies between the various mappings of JIS X 0208: 


Microsoft Code Page 932 mapping: 

0x8160 0xFF5E # FULLWIDTH TILDE 

Alternative JIS X 0208 Shift-JIS mapping (e.g. for the Mac), with columns Shift-JIS, JIS X 0208, Unicode: 

0x8160 0x2141 0x301C # WAVE DASH 
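
The split is easy to reproduce. As an illustration (an assumption about tooling, not part of the mapping tables themselves), Python's standard `shift_jis` codec follows the JIS X 0208 table and its `cp932` codec follows Microsoft's, at least for 0x8160. This minimal sketch decodes the same two bytes with both; `decode_both` is a hypothetical helper name.

```python
import unicodedata


def decode_both(raw: bytes) -> dict[str, str]:
    """Decode one Shift-JIS byte sequence with a JIS-based and a Microsoft-based codec."""
    return {codec: raw.decode(codec) for codec in ("shift_jis", "cp932")}


for codec, ch in decode_both(b"\x81\x60").items():
    utf8 = ch.encode("utf-8").hex(" ")
    print(f"{codec:<9}  U+{ord(ch):04X}  {unicodedata.name(ch)}  UTF-8 {utf8}")

# prints:
# shift_jis  U+301C  WAVE DASH  UTF-8 e3 80 9c
# cp932      U+FF5E  FULLWIDTH TILDE  UTF-8 ef bd 9e
```

The same input bytes therefore produce two different UTF-8 byte sequences, which is how a database written through one table and read through the other ends up with mismatched data.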

Actually, the Unicode Consortium does not take (as yet) a formal 
position on which of these conversions is correct. Mapping tables 
are simply supplied by various vendors, and there may be 
inconsistencies in their interpretations of mappings. 
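
The "notorious small list" can be enumerated mechanically by comparing two tables code by code. A minimal sketch, assuming Python's `shift_jis` codec as a stand-in for the JIS X 0208 table and `cp932` for Microsoft's (Python's tables are not necessarily identical to every vendor's): it decodes every double-byte Shift-JIS code with both and reports codes that both tables assign, but to different characters. `mapping_differences` is a hypothetical helper name.

```python
import unicodedata


def mapping_differences(codec_a: str = "shift_jis",
                        codec_b: str = "cp932") -> dict[int, tuple[str, str]]:
    """Return {sjis_code: (char_a, char_b)} for every double-byte Shift-JIS
    code that both codecs decode, but to different Unicode characters."""
    lead_bytes = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))
    trail_bytes = [t for t in range(0x40, 0xFD) if t != 0x7F]
    diffs = {}
    for lead in lead_bytes:
        for trail in trail_bytes:
            raw = bytes((lead, trail))
            try:
                ch_a = raw.decode(codec_a)
                ch_b = raw.decode(codec_b)
            except UnicodeDecodeError:
                continue  # unassigned in at least one of the two tables
            # Keep only single-character results that disagree.
            if len(ch_a) == 1 and len(ch_b) == 1 and ch_a != ch_b:
                diffs[(lead << 8) | trail] = (ch_a, ch_b)
    return diffs


if __name__ == "__main__":
    for code, (a, b) in sorted(mapping_differences().items()):
        print(f"0x{code:04X}: U+{ord(a):04X} {unicodedata.name(a, '?')} vs "
              f"U+{ord(b):04X} {unicodedata.name(b, '?')}")
```

Codes that only one table assigns (for example Microsoft's vendor extensions) are skipped rather than reported, since they are extensions, not conflicting interpretations of the same character.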

My *personal* opinion is that Microsoft has it right, as SJIS 
0x8160 is treated as a fullwidth tilde in Japan, and is 
generally shown that way in widely available commercial fonts. 

When databases are doing roundtrip conversions through Unicode, 
they need to be aware of these exceptional cases in the conversions, 
precisely to avoid the kind of data corruption you are encountering. 
There is no simple, universal "fix" for this, since platforms 
do the conversions that they do, and other applications need to 
take into account the edge cases. 
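
One way to take the edge cases into account is to fold the divergent code points into a single form at each boundary, before comparing, storing, or encoding. The sketch below is an assumption about how an application might do this, not a standard fix: `fold`, `JIS_TO_CP932` and the `"jis"`/`"cp932"` target names are hypothetical, and the table holds only the wave dash pair discussed here.

```python
# Hypothetical fold table: JIS X 0208 / Java / Mac form -> Microsoft CP932 form.
# Only the wave dash pair from this thread is listed; other divergent
# code points would be added here in the same way.
JIS_TO_CP932 = {"\u301c": "\uff5e"}  # WAVE DASH -> FULLWIDTH TILDE
CP932_TO_JIS = {cp932: jis for jis, cp932 in JIS_TO_CP932.items()}


def fold(text: str, target: str) -> str:
    """Rewrite known divergent characters into the form `target` expects.

    target: "cp932" for Windows consumers, "jis" for consumers that use the
    JIS X 0208 / Unicode Consortium mapping (e.g. Java's SJIS, the Mac).
    """
    if target == "cp932":
        table = JIS_TO_CP932
    elif target == "jis":
        table = CP932_TO_JIS
    else:
        raise ValueError(f"unknown target: {target!r}")
    return text.translate(str.maketrans(table))


if __name__ == "__main__":
    stored = "10\u301c20"  # written by a program using the JIS mapping
    print(fold(stored, "cp932").encode("cp932"))  # b'10\x81`20'
```

Folding to `"cp932"` just before handing text to a Windows program means U+301C arrives as U+FF5E, which CP932 encodes back to 0x8160; folding to `"jis"` does the reverse for the other side. Which form to store canonically remains a policy choice.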

The UTC has suggested an approach of documenting all the known 
issues, particularly for Shift-JIS mappings, the most problematical 
of the lot, but as yet no particular progress has been made on 
this suggestion. 

--Ken 

> The note below came through the NELOCSIG list, but I'm assuming someone 
> on this list may be able to give Laura some suggestions. 
> 
> -----Original Message----- 
> From: Nelson, Laura [mailto:lnelson@kenan.com] 
> Sent: Wednesday, February 20, 2002 1:04 PM 
> To: 'nelocsig@yahoogroups.com' 
> Subject: [nelocsig] Japanese wave character issue 
> 
> 
> 
> We have a situation where an important character, the Japanese "wave 
> character", is lost during transfers from various parts of our software. 
> The root cause is that Windows maps this character to a different Unicode 
> code point than the rest of the world does. 
> 
> Data is entered into our database by one program which uses the more 
> standard conversion to UTF-8, and then read by another program using the 
> Windows version. It displays as garbage, because the wave character gets 
> lost in the conversion. 
> 
> There are other potential conversion issues with the same character, 
> because its mapping is not standardized. 
> Does anyone have any suggestions? 
> The encodings in question are: 
> U+FF5E used by Windows 
> U+301C used by JIS X 0221, Unicode Consortium, Java (SJIS, EUCJIS, and 
> JIS), and Mac. 
> The Shift-JIS code is 0x8160.
Received on Friday, 8 November 2002 13:16:12 UTC