Re: HTML 4.0 comments

Ian B. Jacobs wrote:

> Adam M. Costello wrote:
>
> > 24.4 Character entity references for markup-significant and
> >      internationalization characters
> >
> >         Entities have also been added for the remaining characters
> >         occurring in CP-1252 which do not occur in the HTMLlat1 or
> >         HTMLsymbol entity sets. These all occur in the 128 to 159
> >         range within the cp-1252 charset.
> >
> >     What is CP-1252?  It doesn't seem to be defined or referenced
> >     anywhere.

> Good question. I hadn't noticed this before.

CP-1252 is the "Windows Latin" character set (Windows code page 1252).
It contains all of the printable Latin-1 characters and is typically
used by non-Unicode Windows programs to display Latin-1 HTML documents.

There should be a reference in the IANA charsets registry (it is, as I recall,
a registered character set).

Beyond Latin-1, CP-1252 also contains some additional characters which,
unlike the Latin-1 ones, do not map 1:1 from their code positions in
CP-1252 to code positions in Unicode. These characters have crept into
HTML documents that were authored on Windows platforms. Each of them
corresponds to a Unicode character, and the HTML 4.0 entity list
explicitly defines which Unicode character is used for each of these
CP-1252 characters.
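
To illustrate the point about code positions, here is a minimal Python
sketch. It assumes that the "cp1252" codec and the html.entities table
in Python's standard library reflect the mapping described above; the
helper name cp1252_to_unicode and the SAMPLES table are hypothetical,
chosen only for this example, and cover just a handful of the
characters in question.

```python
from html.entities import name2codepoint


def cp1252_to_unicode(byte_value):
    """Return the Unicode code point for one CP-1252 byte, or None.

    None is returned for the few byte values that Python's cp1252
    codec treats as unassigned.
    """
    try:
        return ord(bytes([byte_value]).decode("cp1252"))
    except UnicodeDecodeError:
        return None


# Hypothetical sample: a few CP-1252 bytes in the 128 to 159 range,
# paired with the HTML entity name expected to denote the same character.
SAMPLES = {
    0x86: "dagger",
    0x8A: "Scaron",
    0x93: "ldquo",
    0x94: "rdquo",
    0x96: "ndash",
    0x97: "mdash",
}

if __name__ == "__main__":
    # A Latin-1 character keeps its code position (1:1 mapping).
    print(f"0xE9 -> U+{cp1252_to_unicode(0xE9):04X}")

    # The additional characters land elsewhere in Unicode.
    for byte_value, entity in SAMPLES.items():
        code_point = cp1252_to_unicode(byte_value)
        same = code_point == name2codepoint[entity]
        print(f"0x{byte_value:02X} -> U+{code_point:04X}  &{entity};  match={same}")
```

An author who finds one of these bytes in a document labelled as
Latin-1 can replace it with the named entity, or with a numeric
reference to the Unicode code point, rather than the CP-1252 byte value.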

Received on Monday, 29 December 1997 20:20:11 UTC