- From: KUROSAKA Teruhiko <kuro@bhlab.com>
- Date: Sat, 03 Apr 2004 20:44:30 -0800
- To: Daniel Goldschmidt <pepper@012.net.il>
- Cc: www-international@w3.org
> Entering as input the Unicode character u9b75, the browser (IE6) encodes
> it to EUC-JP as FCE4 (%FC%E4 in the URL). Hmpff.. Here I have a problem:
> The server transcodes it to uFFFD ("REPLACEMENT CHARACTER"). I checked
> it manually and got the same results: Java (JDK 1.4.2) does not
> recognize this character, but IE6 does. I manually wrote an XML file
> encoded in EUC-JP with those characters: IE6 transcoded it to u9b75 and
> Java transcoded it to uFFFD.
\u9b75 is not a character that can be found in EUC-JP proper.
0xFC 0xE4 is not assigned in EUC-JP, and many implementations use this
and other unassigned code points as a user-defined character (UDC) area.
I guess IE (or Windows in general?) somehow tries to preserve this
non-EUC character by using the UDC area.
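
To make the byte arithmetic concrete, here is a minimal Python sketch (not
the Java or IE code path discussed above). It converts the two EUC-JP bytes
to a JIS row/cell pair and checks whether the row lies past the last row
assigned by JIS X 0208. The helper names and the LAST_ASSIGNED_ROW constant
are made up for illustration; the constant assumes that JIS X 0208 assigns
nothing beyond row 84. The lenient decode at the end uses Python's own
euc_jp codec, which may differ from what JDK 1.4.2 does.

```python
from urllib.parse import quote

# Assumption: JIS X 0208 assigns characters only up to row 84 of its
# 94x94 grid, leaving rows 85-94 open for vendor or user-defined use.
LAST_ASSIGNED_ROW = 84


def euc_jp_row_cell(lead, trail):
    """Convert a two-byte EUC-JP sequence to a JIS (row, cell) pair."""
    if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
        raise ValueError("not a two-byte EUC-JP sequence")
    return lead - 0xA0, trail - 0xA0


def is_past_assigned_rows(lead, trail):
    """True if the sequence falls in the rows above LAST_ASSIGNED_ROW."""
    row, _cell = euc_jp_row_cell(lead, trail)
    return row > LAST_ASSIGNED_ROW


def decode_lenient(raw):
    """Decode with Python's euc_jp codec, substituting U+FFFD on errors."""
    return raw.decode("euc_jp", errors="replace")


if __name__ == "__main__":
    raw = bytes([0xFC, 0xE4])
    print(quote(raw))                    # percent-encoded form of the bytes
    print(euc_jp_row_cell(*raw))         # JIS row and cell
    print(is_past_assigned_rows(*raw))   # whether the row is unassigned
    print(ascii(decode_lenient(raw)))    # escaped view of the decoded text
```

A decoder that only knows the standard tables has nothing to map such a
sequence to, which is consistent with the U+FFFD result reported for Java.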
--
KUROSAKA ("Kuro") Teruhiko
San Francisco, California, USA
http://www.sonic.net/~kuro/
Received on Saturday, 3 April 2004 23:49:35 UTC