Re: Windows 1252 Mapping Tables (was: Servlet question

A much better policy, as outlined in CharMapML, is either to mark the
undefined code points explicitly as unmapped or to map them to the Private
Use Area. Undefined bytes mapped to "themselves" cannot be distinguished
from real mappings.
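
To make the difference concrete, here is a minimal Python sketch of the three
policies. It assumes the five bytes that Windows-1252 leaves undefined (0x81,
0x8D, 0x8F, 0x90, 0x9D) and relies on Python's own "cp1252" codec, which treats
those bytes as unmapped. The function names and the PUA_BASE offset are
hypothetical, chosen only for illustration; they are not taken from CharMapML
or from any Windows API.

```python
# Bytes that Windows-1252 leaves undefined.
UNDEFINED_1252 = (0x81, 0x8D, 0x8F, 0x90, 0x9D)

# Hypothetical offset into the Private Use Area (U+E000..U+F8FF).
PUA_BASE = 0xF700


def decode_strict(data: bytes) -> str:
    """Policy 1: undefined bytes are unmapped, so decoding fails loudly."""
    return data.decode("cp1252")


def decode_identity(data: bytes) -> str:
    """Policy 2: undefined byte 0xNN becomes U+00NN (maps to 'itself')."""
    return "".join(
        chr(b) if b in UNDEFINED_1252 else bytes([b]).decode("cp1252")
        for b in data
    )


def decode_pua(data: bytes) -> str:
    """Policy 3: undefined byte 0xNN becomes a private use code point."""
    return "".join(
        chr(PUA_BASE + b) if b in UNDEFINED_1252 else bytes([b]).decode("cp1252")
        for b in data
    )


if __name__ == "__main__":
    sample = b"\x80\x81"  # EURO SIGN followed by an undefined byte

    print([hex(ord(c)) for c in decode_identity(sample)])
    print([hex(ord(c)) for c in decode_pua(sample)])

    # The identity result for 0x81 is the same string as a genuine C1
    # control U+0081, so a consumer cannot tell the two cases apart.
    print(decode_identity(b"\x81") == "\u0081")

    try:
        decode_strict(sample)
    except UnicodeDecodeError as exc:
        print("strict decoding rejected the input:", exc.reason)
```

With the identity policy the output for 0x81 is indistinguishable from a real
U+0081, whereas the strict and private use policies both leave a visible trace
that the byte had no defined mapping.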

Mark
—————

"Give me a place to stand, and I will move the earth" (Archimedes)
[http://www.macchiato.com]

----- Original Message -----
From: "Murray Sargent" <murrays@microsoft.com>
To: "Merle Tenney" <Merle.Tenney@corp.palm.com>
Cc: <unicore@unicode.org>; <www-international@w3.org>; "Mark Davis"
<mark@macchiato.com>; "Yves Arrouye" <yves@realnames.com>; "Shigemichi
Yazawa" <yazawa@globalsight.com>
Sent: Tuesday, October 23, 2001 14:15
Subject: RE: Windows 1252 Mapping Tables (was: Servlet question


> Yes. The idea is that if Microsoft published the undefined 125x
> character mappings, one might infer that such undefined characters are
> actually defined. Maybe a better way to document it would be to say that
> an undefined codepoint maps to itself for the purpose of roundtripping
> through Unicode. Conceivably, undefined codepoints may be defined in the
> future for exceedingly important special characters, e.g., the Euro,
> which prompted the last additions to the 125x code pages in June, 1998.
> My guess is that this is highly unlikely though, since current Microsoft
> OSs and products are firmly based on Unicode. Note that Mark Davis and
> colleagues have done extensive testing on Windows code-page mappings as
> well as on those on other systems. You might want to contact him
> privately for more info. You can see Microsoft's defined mappings at
> http://www.microsoft.com/globaldev/reference/WinCP.asp.
>
> Thanks
> Murray
>
> > -----Original Message-----
> > From: Merle Tenney [SMTP:Merle.Tenney@corp.palm.com]
> > Sent: Tuesday, October 23, 2001 1:16 PM
> > To: Murray Sargent; Merle Tenney; Yves Arrouye; Shigemichi Yazawa
> > Cc: unicore@unicode.org; www-international@w3.org
> > Subject: RE: Windows 1252 Mapping Tables  (was: Servlet question
> >
> > Murray,
> >
> > > Windows round trips undefined 125x characters in the range
> > > 0x80 - 0x9F by leaving their values unchanged. So in 1252,
> > > the undefined codepoint 0x81 maps to 0x81 and back. On the
> > > other hand, 0x80 is defined to be the EURO in 1252, so it
> > > maps to the corresponding Unicode value 0x20AC.
> >
> > And so I presume that the round-trip mappings of the undefined 1252
> > characters in the C1 range are the only differences between the tables
> > published by Unicode and those used in Windows itself?
> >
> > Merle
>
>

Received on Tuesday, 23 October 2001 21:59:43 UTC