Re: EUC-JP encoding - Bug in IE6?

From: KUROSAKA Teruhiko <kuro@bhlab.com>
Date: Sat, 03 Apr 2004 20:44:30 -0800
Message-ID: <406F92AE.8070600@bhlab.com>
To: Daniel Goldschmidt <pepper@012.net.il>
Cc: www-international@w3.org


> Entering as input the Unicode character u9b75, the browser (IE6) encodes 
> it to EUC-JP as FCE4 (%FC%E4 in the URL). Hmpff.. Here I have a problem: 
> The server transcodes it to uFFFD ("REPLACEMENT CHARACTER"). I checked 
> it manually and got the same results: Java (JDK 1.4.2) does not 
> recognize this character, but IE6 does. I manually wrote an XML file 
> encoded in EUC-JP with those characters: IE6 transcoded it to u9b75 and 
> Java transcoded it to uFFFD.

\u9b75 is not a character that can be found in EUC-JP proper.
0xFC 0xE4 is not assigned in EUC-JP, and many implementations use this
and other unassigned code points as a user-defined character (UDC) area.
I guess IE (or Windows in general?) somehow tries to preserve this
non-EUC character using the UDC area.
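
To illustrate the point about unassigned code points, here is a minimal
Python sketch that maps the two bytes onto the JIS X 0208 row/cell grid and
then attempts a strict decode. It assumes CPython's built-in `euc_jp` codec;
the helper names `kuten` and `try_decode` and the `ASSIGNED_ROWS` set are
made up for this example and are not taken from IE6 or the JDK. The set only
records which rows JIS X 0208 uses at all, so individual cells inside an
assigned row may still be empty.

```python
from urllib.parse import unquote_to_bytes

# JIS X 0208 assigns characters only in rows 1-8 and 16-84;
# rows 9-15 and 85-94 are left unassigned by the standard.
ASSIGNED_ROWS = set(range(1, 9)) | set(range(16, 85))


def kuten(raw: bytes) -> tuple:
    """Return the (row, cell) pair of a two-byte EUC-JP sequence."""
    if len(raw) != 2 or not all(0xA1 <= b <= 0xFE for b in raw):
        raise ValueError("not a two-byte EUC-JP sequence")
    # EUC-JP stores row and cell with an offset of 0xA0 on each byte.
    return raw[0] - 0xA0, raw[1] - 0xA0


def try_decode(raw: bytes, encoding: str = "euc_jp"):
    """Strictly decode raw; return None if the codec has no mapping."""
    try:
        return raw.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    raw = unquote_to_bytes("%FC%E4")  # the bytes seen in the URL
    row, cell = kuten(raw)
    print(f"bytes={raw.hex()} row={row} cell={cell}")
    print("row assigned in JIS X 0208:", row in ASSIGNED_ROWS)
    print("strict euc_jp decode:", repr(try_decode(raw)))
```

The strict decode result depends on the codec's mapping table, so it is
printed rather than assumed. A decoder that extends EUC-JP with vendor or
user-defined rows may return a character where a standard-only decoder
reports an error or substitutes U+FFFD.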

-- 
KUROSAKA ("Kuro") Teruhiko
San Francisco, California, USA
http://www.sonic.net/~kuro/
Received on Saturday, 3 April 2004 23:49:35 GMT
