Re: Servlet question from Shigemichi Yazawa on 2001-10-19 (www-international@w3.org from October to December 2001)

From: Shigemichi Yazawa <yazawa@globalsight.com>
Date: Fri, 19 Oct 2001 11:01:45 -0600
To: www-international@w3.org
Message-ID: <5eadynef7q.wl@globalsight.com>

At Fri, 19 Oct 2001 15:29:24 +0200,
Thierry Sourbier <webmaster@i18ngurus.com> wrote:
> Well it is a case where 2 mistakes compensate one another :). You are
> relying on the default encoding for both the input and output when your data
> obviously is using a different encoding. This works fine only as your
> default encoding is likely a single byte with no invalid values (e.g.
> CP1252).

Yes, two wrong conversions make a right result. However, Cp1252
doesn't always work this way. The Cp1252 <-> Unicode mapping table
includes 5 undefined entries (0x81, 0x8D, 0x8F, 0x90 and 0x9D). If you
pass 0x81, for example, to the byte-to-char converter, it is converted
to U+FFFD (REPLACEMENT CHARACTER) and the round trip is not possible.
As far as I know, only ISO-8859-1 is a safe, round-trippable encoding.
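
Here is a minimal sketch of the round trip described above. The class
and method names (RoundTripDemo, roundTrips) are made up for
illustration, and it assumes a JDK where the charset name
"windows-1252" is available and where undefined bytes decode to
U+FFFD, as described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
    // Decode the bytes to a String, encode the String back with the same
    // charset, and report whether the original bytes were recovered.
    static boolean roundTrips(byte[] input, Charset charset) {
        String decoded = new String(input, charset);
        byte[] encoded = decoded.getBytes(charset);
        return Arrays.equals(input, encoded);
    }

    public static void main(String[] args) {
        // Every possible byte value, 0x00 through 0xFF.
        byte[] allBytes = new byte[256];
        for (int i = 0; i < 256; i++) {
            allBytes[i] = (byte) i;
        }

        // Assumption: this charset name is supported by the running JDK.
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.println("ISO-8859-1, all bytes: "
                + roundTrips(allBytes, StandardCharsets.ISO_8859_1));
        System.out.println("Cp1252, byte 0x81:     "
                + roundTrips(new byte[] { (byte) 0x81 }, cp1252));
        System.out.println("Cp1252, byte 0x80:     "
                + roundTrips(new byte[] { (byte) 0x80 }, cp1252));
    }
}
```

ISO-8859-1 maps each byte value to the Unicode code point with the
same number, so nothing is lost in either direction. With Cp1252 a
defined byte such as 0x80 (EURO SIGN) survives, but an undefined one
such as 0x81 does not come back as the same byte.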

-------------------
Shigemichi Yazawa
yazawa@globalsight.com

Received on Friday, 19 October 2001 12:46:21 UTC