- From: Chris Lilley <chris@w3.org>
- Date: Tue, 30 Dec 1997 02:18:49 +0100
- To: "Ian B. Jacobs" <ij@w3.org>
- CC: "Adam M. Costello" <amc@cs.berkeley.edu>, www-html-editor@w3.org, w3c-html-wg@w3.org
Ian B. Jacobs wrote:
> Adam M. Costello wrote:
> >
> > 24.4 Character entity references for markup-significant and
> > internationalization characters
> >
> > Entities have also been added for the remaining characters
> > occurring in CP-1252 which do not occur in the HTMLlat1 or
> > HTMLsymbol entity sets. These all occur in the 128 to 159
> > range within the cp-1252 charset.
> >
> > What is CP-1252? It doesn't seem to be defined or referenced
> > anywhere.
>
> Good question. I hadn't noticed this before.

CP-1252 is the "Windows Latin" character set, which contains all of
Latin-1 and is typically used by non-Unicode Windows programs to
display Latin-1 HTML documents. There should be a reference in the
IANA charsets registry (it is, as I recall, a registered character
set).

It also contains some additional characters, which unlike the Latin-1
ones do not map 1:1 from their code positions in CP-1252 to the code
positions in Unicode. These characters have crept into HTML documents
which were authored on Windows platforms.

The characters all correspond to some Unicode characters, and the
HTML 4.0 entity list explicitly defines which Unicode character is
used for each of these CP-1252 characters.
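
To make the mapping point concrete, here is a minimal Python sketch. It is an illustration only, not part of the HTML 4.0 specification. It assumes Python's built-in "cp1252" codec, which reflects the current definition of the code page and so may include assignments made after this message was written. The helper name cp1252_to_unicode is hypothetical.

```python
def cp1252_to_unicode(byte_value):
    """Return the Unicode code point for one CP-1252 byte.

    Returns None when the codec leaves that byte unassigned.
    Hypothetical helper, built on Python's bundled "cp1252" codec.
    """
    try:
        return ord(bytes([byte_value]).decode("cp1252"))
    except UnicodeDecodeError:
        return None


# The Latin-1 part (0xA0-0xFF) keeps the same code position in Unicode.
latin1_identical = all(cp1252_to_unicode(b) == b for b in range(0xA0, 0x100))

# The additional characters live in 128-159 (0x80-0x9F); collect the
# assigned ones together with the Unicode code point each maps to.
extras = {}
for b in range(0x80, 0xA0):
    cp = cp1252_to_unicode(b)
    if cp is not None:
        extras[b] = cp

if __name__ == "__main__":
    print("Latin-1 range maps 1:1:", latin1_identical)
    for b, cp in sorted(extras.items()):
        print(f"CP-1252 0x{b:02X} -> U+{cp:04X}")
```

Under these assumptions, byte 0x93 (a left double quotation mark in Windows-authored text) comes out as U+201C rather than U+0093, which is the kind of non-1:1 mapping the entity list pins down.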
Received on Monday, 29 December 1997 20:20:11 UTC