- From: Chris Wendt <christw@microsoft.com>
- Date: Fri, 28 Aug 1998 10:05:57 -0700
- To: "Deke Smith" <deke@tallent.com>, <www-international@w3.org>
The difference between Windows-1252 and iso-8859-1 is that in iso-8859-1 the code points 0x80 to 0x9F are reserved. In Windows-1252 most of the 0x80 to 0x9F code points map to characters, among them the Euro currency sign at code point 0x80. All code points outside 0x80 to 0x9F are shared between iso-8859-1 and Windows-1252.

Best practice is to label the document as iso-8859-1 unless it contains characters at code points 0x80 to 0x9F.

IANA registration of Windows-1252 has been requested from the charset registrar. The newer versions of the two leading browsers and their associated email programs recognize the "Windows-1252" label. It was an oversight on my part not to register Windows-1252 originally with the other Windows-125x registrations :-(

Here is a table of the code points that differ between iso-8859-1 and Windows-1252 and the Unicode characters they map to:

0x80  0x20AC  #EURO SIGN
0x82  0x201A  #SINGLE LOW-9 QUOTATION MARK
0x83  0x0192  #LATIN SMALL LETTER F WITH HOOK
0x84  0x201E  #DOUBLE LOW-9 QUOTATION MARK
0x85  0x2026  #HORIZONTAL ELLIPSIS
0x86  0x2020  #DAGGER
0x87  0x2021  #DOUBLE DAGGER
0x88  0x02C6  #MODIFIER LETTER CIRCUMFLEX ACCENT
0x89  0x2030  #PER MILLE SIGN
0x8A  0x0160  #LATIN CAPITAL LETTER S WITH CARON
0x8B  0x2039  #SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C  0x0152  #LATIN CAPITAL LIGATURE OE
0x8E  0x017D  #LATIN CAPITAL LETTER Z WITH CARON
0x91  0x2018  #LEFT SINGLE QUOTATION MARK
0x92  0x2019  #RIGHT SINGLE QUOTATION MARK
0x93  0x201C  #LEFT DOUBLE QUOTATION MARK
0x94  0x201D  #RIGHT DOUBLE QUOTATION MARK
0x95  0x2022  #BULLET
0x96  0x2013  #EN DASH
0x97  0x2014  #EM DASH
0x98  0x02DC  #SMALL TILDE
0x99  0x2122  #TRADE MARK SIGN
0x9A  0x0161  #LATIN SMALL LETTER S WITH CARON
0x9B  0x203A  #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C  0x0153  #LATIN SMALL LIGATURE OE
0x9E  0x017E  #LATIN SMALL LETTER Z WITH CARON
0x9F  0x0178  #LATIN CAPITAL LETTER Y WITH DIAERESIS

You can find the complete table at
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
as well as
definitions of the other code pages.

>Does Windows 3.x/DOS use the same encoding as Windows 95/98?

Roughly: yes. Note that DOS and the Win9x MS-DOS prompt use the so-called "OEM" code page. For European charsets these are 3-digit numbers like 437, 850, 863 and so on. This is not the same as the misnamed "ANSI" code page that is exposed to Windows applications. On Asian versions the OEM and "ANSI" code pages are the same.

That said, the 1252 code page of Windows 98 has a few more characters than the one in the original Windows 95 and Windows 3.1. Most importantly, the Windows 98 version has a place for the Euro currency sign and positions for upper- and lowercase Z with caron.

Both Windows 9x and Windows NT handle Unicode and multibyte code pages in parallel and offer a number of conversion functions. However, most of the system APIs on Win9x take only multibyte parameters, whereas NT offers both versions for all system APIs. The article "Yes, Virginia, Windows 95 does Unicode" on the Microsoft Developer Network CD gives a good overview of Win9x Unicode capabilities.

-----Original Message-----
From: Deke Smith <deke@tallent.com>
To: www-international@w3.org <www-international@w3.org>
Date: Friday, August 28, 1998 8:25 AM
Subject: Windows and Mac character encoding questions

>I have seen some contradictory information about the character encoding
>for Windows text.
>
>One source said that Windows uses ISO-8859-1 for its English-language
>system, then I saw a thread about the Windows-1252 encoding and how it
>differs from ISO-8859-1.
>
>Does Windows 3.x/DOS use the same encoding as Windows 95/98? I have read
>that WinNT uses Unicode, but is the default encoding under the English
>language system different than the other flavors of Win/DOS? IANA lists
>"Windows-1250", "Windows-1254", etc. but does not list our friend,
>"Windows-1252".
>
>On the Mac, the English encoding is called "MacRoman" by the browsers,
>news clients and email clients. IANA does not list "MacRoman" as an
>encoding scheme, instead it lists, "Macintosh". Which is the acceptable
>usage?
>
>I'm using as my IANA reference
>ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
>
>
>
>Just a little confused....
>
>-----------------------------------------------------------------
>Deke Smith
>Tallent Communications Group, Brentwood TN
>deke@tallent.com, 615-661-9878
>-----------------------------------------------------------------
>" The best way to predict the future is to invent it. "
> - Alan Kay
>
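
The labeling rule from the reply (use iso-8859-1 unless the text contains bytes in 0x80 to 0x9F) can be sketched in Python roughly as follows. The function names choose_label and differing_code_points are hypothetical, and the second function relies on Python's own "cp1252" codec as a stand-in for the CP1252.TXT mapping, which is an assumption rather than something the message states.

```python
from typing import Dict

# The byte range where iso-8859-1 and Windows-1252 disagree (0x80..0x9F).
DIFFERING_RANGE = range(0x80, 0xA0)


def choose_label(data: bytes) -> str:
    """Hypothetical helper: pick a charset label for single-byte Western text.

    Follows the rule of thumb from the reply: prefer iso-8859-1 and fall
    back to windows-1252 only when a byte in 0x80..0x9F is present.
    """
    if any(b in DIFFERING_RANGE for b in data):
        return "windows-1252"
    return "iso-8859-1"


def differing_code_points() -> Dict[int, str]:
    """Hypothetical helper: bytes in 0x80..0x9F that cp1252 maps to a character.

    Assumes Python's "cp1252" codec; bytes it leaves unassigned raise
    UnicodeDecodeError and are skipped.
    """
    table = {}
    for b in DIFFERING_RANGE:
        try:
            table[b] = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # unassigned in this codec
    return table


if __name__ == "__main__":
    print(choose_label(b"caf\xe9"))        # 0xE9 is shared by both encodings
    print(choose_label(b"\x80 100"))       # 0x80 is the Euro sign in 1252
    for byte, char in sorted(differing_code_points().items()):
        print(f"0x{byte:02X}  0x{ord(char):04X}")
```

This only inspects byte values, so it assumes the input is already known to be single-byte Western text; it does not detect other encodings.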
Received on Friday, 28 August 1998 13:05:44 UTC
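
The distinction drawn in the reply between the "OEM" code page used by DOS and the "ANSI" code page seen by Windows applications can be illustrated with a small Python sketch. It assumes Python's "cp437", "cp850" and "cp1252" codecs as stand-ins for those code pages, and the helper name oem_to_ansi is hypothetical.

```python
def oem_to_ansi(data: bytes, oem: str = "cp850", ansi: str = "cp1252") -> bytes:
    """Hypothetical helper: re-encode OEM code page bytes as "ANSI" bytes.

    Decodes with the OEM codec, then encodes with the "ANSI" codec.
    Raises UnicodeEncodeError for characters the target lacks
    (for example, DOS box-drawing characters).
    """
    return data.decode(oem).encode(ansi)


if __name__ == "__main__":
    # The same letter has different byte values in the two families.
    print("é".encode("cp437"))    # OEM byte for e-acute
    print("é".encode("cp1252"))   # "ANSI" byte for e-acute

    # Reading "ANSI" bytes as OEM text yields a different character.
    print(b"\xe9".decode("cp437"))

    # Converting DOS text for use in a Windows application.
    print(oem_to_ansi(b"caf\x82"))
```

Decoding to Unicode and re-encoding is used here instead of a hand-written byte table because it makes unmappable characters fail loudly rather than silently turning into the wrong glyph.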