RE: Servlet question from Shigemichi Yazawa on 2001-10-22 (www-international@w3.org from October to December 2001)

From: Shigemichi Yazawa <yazawa@globalsight.com>
Date: Mon, 22 Oct 2001 09:58:36 -0600
To: yves@realnames.com
Cc: www-international@w3.org
Message-ID: <5esncbacpf.wl@globalsight.com>

At Mon, 22 Oct 2001 00:11:19 -0700,
Yves Arrouye <yves@realnames.com> wrote:
> Isn't ISO-8859-1 actually the one that has "holes" in C0/C1 that exhibit
> this very behavior?

There is no hole in the ISO-8859-1 <-> Unicode mapping table provided by
unicode.org (see
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT). The C0/C1
bytes are mapped to the corresponding C0/C1 control characters, and the
table contains no undefined entries. I believe that Java (at least
Sun's implementation) uses the same table.
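
One way to check the "no holes" claim for a given Java runtime is to
decode all 256 byte values as ISO-8859-1 and count how many do not come
back as the code point with the same numeric value. This is only a
minimal sketch: the class and method names are made up for
illustration, and it assumes a JDK that provides
java.nio.charset.StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class Latin1IdentityCheck
{
    // Hypothetical helper: counts the byte values whose decoded code
    // point differs from the byte value itself.
    static int countMismatches()
    {
        // Every possible byte value, 0x00..0xFF.
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; i++)
        {
            bytes[i] = (byte) i;
        }

        // ISO-8859-1 is a single-byte encoding, so character i of the
        // result corresponds to byte i.
        String decoded = new String(bytes, StandardCharsets.ISO_8859_1);

        int mismatches = 0;
        for (int i = 0; i < decoded.length(); i++)
        {
            if (decoded.charAt(i) != i)
            {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args)
    {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

If the runtime follows the unicode.org table, the count should be zero,
C0 and C1 ranges included.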

> I thought that was the case, and windows-1252 was the
> one that used C1 for platform-specific character (see
> http://www-124.ibm.com/cvs/icu/charset/data/xml/windows-1252-2000.xml?rev=1.
> 1&content-type=text/x-cvsweb-markup where apparently U+0081 is mapped to
> 0x81 in windows-1252).

Is that data for ICU4C? It is interesting that it doesn't agree with the
table by unicode.org (see
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT).
Again, Sun's Java seems to use the unicode.org table. You can check this
by running the program below.

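public class CharConversionTest
{
    public static void main(String[] args)
        throws Exception
    {
        // Build a byte array holding every possible byte value, 0x00..0xFF.
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; i++)
        {
            bytes[i] = (byte) i;
        }

        // Decode as windows-1252. It is a single-byte encoding, so
        // character i of the result corresponds to byte i.
        String converted = new String(bytes, "Cp1252");
        for (int i = 0; i < converted.length(); i++)
        {
            System.out.println("0x" + Integer.toHexString(i) + " -> U+"
                               + Integer.toHexString(converted.charAt(i)));
        }
    }
}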
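
To make the difference from the ICU data easier to spot, the same idea
can be narrowed to the bytes that the decoder treats as undefined. The
unicode.org CP1252.TXT table leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
undefined, so a runtime that follows it would be expected to report
those bytes as U+FFFD instead of mapping them to C1 controls. The
following is a hedged sketch, not a definitive test: the class and
method names are hypothetical, and it assumes a JDK where
Charset.forName("windows-1252") is available.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class Cp1252UndefinedBytes
{
    // Hypothetical helper: returns the byte values that decode to
    // U+FFFD (the replacement character) under windows-1252.
    static List<Integer> undefinedBytes()
    {
        // Every possible byte value, 0x00..0xFF.
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; i++)
        {
            bytes[i] = (byte) i;
        }

        // Single-byte encoding: character i corresponds to byte i.
        String decoded = new String(bytes, Charset.forName("windows-1252"));

        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < decoded.length(); i++)
        {
            if (decoded.charAt(i) == '\uFFFD')
            {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        for (int b : undefinedBytes())
        {
            System.out.println("0x" + Integer.toHexString(b) + " is undefined");
        }
    }
}
```

A converter built from the ICU data quoted above would presumably
report none of these bytes, since that data maps 0x81 to U+0081.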

-------------------
Shigemichi Yazawa
yazawa@globalsight.com

Received on Monday, 22 October 2001 11:43:12 UTC