RE: Servlet question from Yves Arrouye on 2001-10-22 (www-international@w3.org from October to December 2001)

From: Yves Arrouye <yves@realnames.com>
Date: Mon, 22 Oct 2001 00:11:19 -0700
To: "'Shigemichi Yazawa'" <yazawa@globalsight.com>, www-international@w3.org
Message-ID: <7FC3066C236FD511BC5900508BAC86FE4D7DB3@trestles.internal.realnames.com>

> Yes, two wrong conversions make a right result. However, Cp1252
> doesn't always work this way. The Cp1252 <-> Unicode mapping table
> includes 5 undefined entries. If you pass 0x81, for example, to the
> byte-to-char converter, it is converted to U+FFFD (REPLACEMENT CHARACTER)
> and the round trip is not possible. Only ISO-8859-1 is the safe,
> round-trippable encoding as far as I know.
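
The "two wrong conversions" trick can be sketched in Java as follows. This is a minimal sketch under assumptions not stated in the thread: the browser sent UTF-8, the container decoded the request bytes with a single-byte charset, and the names `TwoWrongConversions` and `recover` are hypothetical. The sample string "Á" (U+00C1) is chosen because its UTF-8 encoding, 0xC3 0x81, contains the 0x81 byte mentioned above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: undo a container's wrong decoding by re-encoding with the
// same wrong charset, then decode the recovered bytes as UTF-8 (assumed to be the
// charset the client actually used).
final class TwoWrongConversions {

    static String recover(byte[] utf8Bytes, Charset wrongCharset) {
        String misdecoded = new String(utf8Bytes, wrongCharset); // wrong conversion #1 (container)
        byte[] restored = misdecoded.getBytes(wrongCharset);     // wrong conversion #2 (undo it)
        return new String(restored, StandardCharsets.UTF_8);     // the decoding that was wanted
    }

    public static void main(String[] args) {
        String original = "\u00C1";                              // "Á", UTF-8 bytes C3 81
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(recover(wire, StandardCharsets.ISO_8859_1).equals(original));
        System.out.println(recover(wire, Charset.forName("windows-1252")).equals(original));
    }
}
```

With ISO-8859-1 the first line prints true, since every byte maps to exactly one char and back. Under the JDK's windows-1252 table, which as far as I know leaves 0x81 undefined, the second line should print false: 0x81 decodes to U+FFFD, re-encodes as '?', and the UTF-8 decoder then sees a broken sequence.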

Isn't ISO-8859-1 actually the one that has "holes" in C0/C1 that exhibit
this very behavior? I thought that was the case, and that windows-1252 was
the one that used C1 for platform-specific characters (see
http://www-124.ibm.com/cvs/icu/charset/data/xml/windows-1252-2000.xml?rev=1.1&content-type=text/x-cvsweb-markup,
where U+0081 apparently maps to 0x81 in windows-1252).
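
Whether a given charset has such "holes" can be checked directly by decoding and re-encoding each of the 256 byte values. The sketch below does that with the JDK's built-in converters; the class name `RoundTripHoles` is hypothetical, and the result reflects Java's own mapping tables, not the encodings in the abstract or ICU's tables.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: list every single-byte value that does not survive a
// bytes -> chars -> bytes round trip through the given charset.
final class RoundTripHoles {

    static List<Integer> holes(Charset cs) {
        List<Integer> bad = new ArrayList<>();
        for (int b = 0; b < 256; b++) {
            byte[] in = { (byte) b };
            byte[] out = new String(in, cs).getBytes(cs); // decode, then re-encode
            if (!Arrays.equals(in, out)) {
                bad.add(b);                               // this byte value was lost or altered
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Charset[] charsets = { StandardCharsets.ISO_8859_1, Charset.forName("windows-1252") };
        for (Charset cs : charsets) {
            StringBuilder sb = new StringBuilder(cs.name() + ":");
            for (int b : holes(cs)) {
                sb.append(String.format(" 0x%02X", b));
            }
            System.out.println(sb);
        }
    }
}
```

In the JDK, ISO-8859-1 maps all 256 bytes one-to-one onto U+0000..U+00FF, so its list comes out empty, while windows-1252 should list 0x81 among its holes. A runtime using ICU's windows-1252-2000 table, which maps 0x81 to U+0081, would presumably report no hole at that position.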

YA

Received on Monday, 22 October 2001 03:15:37 UTC