- From: Shigemichi Yazawa <yazawa@globalsight.com>
- Date: Mon, 22 Oct 2001 09:58:36 -0600
- To: yves@realnames.com
- Cc: www-international@w3.org
At Mon, 22 Oct 2001 00:11:19 -0700,
Yves Arrouye <yves@realnames.com> wrote:

> Isn't ISO-8859-1 actually the one that has "holes" in C0/C1 that exhibit
> this very behavior?

There is no hole in the ISO-8859-1 <-> Unicode mapping table provided by
unicode.org (see
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT).
C0/C1 characters are mapped to C0/C1 characters, and the table contains
no undefined characters. I believe that Java (at least Sun's
implementation) uses the same table.

> I thought that was the case, and windows-1252 was the
> one that used C1 for platform-specific character (see
> http://www-124.ibm.com/cvs/icu/charset/data/xml/windows-1252-2000.xml?rev=1.
> 1&content-type=text/x-cvsweb-markup where apparently U+0081 is mapped to
> 0x81 in windows-1252).

Is this data for ICU4C? It is interesting that it doesn't agree with the
table provided by unicode.org (see
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT).
Again, Sun's Java seems to use the unicode.org table. You can see this by
running the program below.

    public class CharConversionTest {
        public static void main(String[] args) throws Exception {
            // Build a byte array holding every byte value 0x00..0xFF.
            byte[] str = new byte[256];
            for (int i = 0; i < str.length; i++) {
                str[i] = (byte) i;
            }

            // Decode the bytes as Cp1252 (windows-1252).
            String converted = new String(str, "Cp1252");

            // Print the Unicode code point each byte was mapped to.
            for (int i = 0; i < converted.length(); i++) {
                System.out.println("0x" + Integer.toHexString(i) + " -> U+"
                        + Integer.toHexString(converted.charAt(i)));
            }
        }
    }

-------------------
Shigemichi Yazawa
yazawa@globalsight.com
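The ISO-8859-1 claim above (every byte, including C0/C1, maps to the code point of the same value) can be checked the same way. The following is only a minimal sketch: the class name Latin1HoleCheck and the helper countNonIdentity are made up for illustration, and it assumes a Java runtime that provides java.nio.charset.StandardCharsets (Java 7 or later).

```java
import java.nio.charset.StandardCharsets;

public class Latin1HoleCheck {
    // Hypothetical helper: counts byte values whose ISO-8859-1 decoding
    // is NOT the Unicode code point with the same numeric value.
    static int countNonIdentity() {
        // Every byte value 0x00..0xFF, as in the Cp1252 test program.
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }

        String decoded = new String(bytes, StandardCharsets.ISO_8859_1);

        int mismatches = 0;
        for (int i = 0; i < decoded.length(); i++) {
            if (decoded.charAt(i) != i) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("non-identity mappings: " + countNonIdentity());
    }
}
```

A count of zero means the decoder has no holes in C0/C1 for ISO-8859-1. Swapping in "Cp1252" as the charset name would instead count the bytes that the runtime maps away from their own value, which is where the two tables discussed above would differ.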
Received on Monday, 22 October 2001 11:43:12 UTC