
RE: Windows 1252 Mapping Tables (was: Servlet question)

From: Murray Sargent <murrays@microsoft.com>
Date: Tue, 23 Oct 2001 14:15:53 -0700
Message-ID: <3BAF945D6708914BA399162E1D7AD1D10253ABB4@red-msg-02.redmond.corp.microsoft.com>
To: "Merle Tenney" <Merle.Tenney@corp.palm.com>
Cc: <unicore@unicode.org>, <www-international@w3.org>, "Mark Davis" <mark@macchiato.com>, "Yves Arrouye" <yves@realnames.com>, "Shigemichi Yazawa" <yazawa@globalsight.com>
Yes. The reasoning is that if Microsoft published mappings for the
undefined 125x characters, one might infer that those characters are
actually defined. A better way to document it might be to say that an
undefined codepoint maps to itself for the purpose of round-tripping
through Unicode. Conceivably, undefined codepoints could be defined in
the future for exceedingly important special characters, such as the
Euro, which prompted the last additions to the 125x code pages in June
1998. My guess is that this is highly unlikely, though, since current
Microsoft OSs and products are firmly based on Unicode. Note that Mark
Davis and colleagues have done extensive testing on Windows code-page
mappings as well as on those of other systems. You might want to
contact him privately for more info. You can see Microsoft's defined
mappings at http://www.microsoft.com/globaldev/reference/WinCP.asp.

Thanks
Murray

> -----Original Message-----
> From:	Merle Tenney [SMTP:Merle.Tenney@corp.palm.com]
> Sent:	Tuesday, October 23, 2001 1:16 PM
> To:	Murray Sargent; Merle Tenney; Yves Arrouye; Shigemichi Yazawa
> Cc:	unicore@unicode.org; www-international@w3.org
> Subject:	RE: Windows 1252 Mapping Tables (was: Servlet question)
> 
> Murray,
> 
> > Windows round trips undefined 125x characters in the range 
> > 0x80 - 0x9F by leaving their values unchanged. So in 1252, 
> > the undefined codepoint 0x81 maps to 0x81 and back. On the 
> > other hand, 0x80 is defined to be the EURO in 1252, so it 
> > maps to the corresponding Unicode value 0x20AC.
> 
> And so I presume that the round-trip mappings of the undefined 1252
> characters in the C1 range are the only differences between the tables
> published by Unicode and those used in Windows itself?
> 
> Merle
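The round-trip behavior described in this thread can be sketched in Python. This is a minimal sketch, not Microsoft's implementation. It assumes that Python's built-in "cp1252" codec follows the Unicode consortium's table, which leaves a few bytes in 0x80 - 0x9F undefined, so that codec raises an error on them. The helper names are hypothetical. Bytes that the table leaves undefined in that range map to the same code point and back, as described above for Windows (0x81 <-> U+0081). Every other byte uses the standard table, so 0x80 still becomes U+20AC (EURO SIGN).

```python
def _undefined_c1_bytes() -> frozenset:
    """Bytes in 0x80-0x9F that Python's strict cp1252 codec cannot decode."""
    undefined = set()
    for b in range(0x80, 0xA0):
        try:
            bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            undefined.add(b)
    return frozenset(undefined)

# Assumption: these are the positions marked undefined in the Unicode table.
UNDEFINED_1252 = _undefined_c1_bytes()

def decode_cp1252_roundtrip(data: bytes) -> str:
    """Decode cp1252, mapping each undefined C1-range byte to the same code point."""
    chars = []
    for b in data:
        if b in UNDEFINED_1252:
            chars.append(chr(b))  # e.g. byte 0x81 -> U+0081
        else:
            chars.append(bytes([b]).decode("cp1252"))  # e.g. 0x80 -> U+20AC
    return "".join(chars)

def encode_cp1252_roundtrip(text: str) -> bytes:
    """Inverse of decode_cp1252_roundtrip; raises for characters outside cp1252."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in UNDEFINED_1252:
            out.append(cp)  # U+0081 -> byte 0x81
        else:
            out += ch.encode("cp1252")  # strict: raises UnicodeEncodeError if unmapped
    return bytes(out)
```

With these helpers, every byte value survives a trip through Unicode. For example, encode_cp1252_roundtrip(decode_cp1252_roundtrip(bytes(range(256)))) returns the original 256 bytes. In contrast, the plain b"\x81".decode("cp1252") raises UnicodeDecodeError. That difference in the undefined C1 positions is the one the thread asks about.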
Received on Tuesday, 23 October 2001 17:16:55 GMT
