- From: KUROSAKA Teruhiko <kuro@bhlab.com>
- Date: Sat, 03 Apr 2004 20:44:30 -0800
- To: Daniel Goldschmidt <pepper@012.net.il>
- Cc: www-international@w3.org
> Entering as input the character Unicode u9b75, the browser (IE6) encodes
> it to EUC-JP as FCE4 (%FC%E4 in the URL). Hmpff.. Here I have a problem:
> The server transcodes it to uFFFD ("REPLACEMENT CHARACTER"). I checked
> it manually and got the same results: Java (JDK 1.4.2) does not
> recognize this character, but IE6 does. I manually wrote an XML file
> encoded in EUC-JP with those characters: IE6 transcoded it to u9b75 and
> Java transcoded it to uFFFD.

\u9b75 is not a character that can be found in EUC-JP proper.
0xFC 0xE4 is not assigned in EUC-JP, and many implementations use
this and other unassigned code points as a user-defined character
(UDC) area. I guess IE (or Windows in general?) somehow tries to
preserve this non-EUC character by using the UDC area.

--
KUROSAKA ("Kuro") Teruhiko
San Francisco, California, USA
http://www.sonic.net/~kuro/
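For anyone who wants to reproduce the decoder side of this, here is a minimal sketch. It assumes Python's standard `euc_jp` codec as a stand-in for a JIS-based EUC-JP decoder; it is not the Java 1.4.2 converter or the IE6 code path discussed above, and the names `RAW`, `decode_euc_jp` and `is_strictly_decodable` are made up for illustration.

```python
# Bytes that IE6 reportedly sent for U+9B75 (%FC%E4 in the URL).
RAW = bytes([0xFC, 0xE4])


def decode_euc_jp(raw: bytes) -> str:
    """Decode with the 'euc_jp' codec, substituting U+FFFD for bytes
    the codec cannot map (similar in spirit to what the server did)."""
    return raw.decode("euc_jp", errors="replace")


def is_strictly_decodable(raw: bytes) -> bool:
    """Return True only if every byte sequence is assigned in the codec."""
    try:
        raw.decode("euc_jp")  # strict mode raises on unassigned sequences
        return True
    except UnicodeDecodeError:
        return False


decoded = decode_euc_jp(RAW)

# Show the resulting code points rather than the glyphs.
print([f"U+{ord(ch):04X}" for ch in decoded])
print("strictly decodable:", is_strictly_decodable(RAW))
```

If the codec treats 0xFC 0xE4 as unassigned, as described above, the decoded text should contain U+FFFD rather than U+9B75, and the strict check should report that the bytes are not decodable.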
Received on Saturday, 3 April 2004 23:49:35 UTC