Re: HTML - i18n / NCR & charsets

David Perrell (davidp@earthlink.net)
Tue, 3 Dec 1996 00:51:51 -0800


Message-Id: <199612030858.AAA28045@austria.it.earthlink.net>
From: "David Perrell" <davidp@earthlink.net>
To: "Walter Torres" <walter-t@msn.com>, <www-html@w3.org>
Subject: Re: HTML - i18n / NCR & charsets
Date: Tue, 3 Dec 1996 00:51:51 -0800

Walter Torres wrote:
> Values 128-159 are not assigned to displayable characters in the
> ISO8859-1 code and should not be used for displayable characters in HTML.
> 
> It looks like there are 2 in this range defined as above.
> 
> What am I missing here?

The discussion began with character sets that map ISO 8859 and Unicode
characters into #128-159 (a range that, as you note, holds no
displayable characters in those standards) for display on systems with
8-bit character sets. Authors then specify those characters in HTML
with 'illegal' numeric references. For example, both Mac and Windows
map characters into 128-159 for internal use, but not the same
characters. An HTML author on Windows might write &#147; for a left
double quote, and if a browser on the Mac knows that the HTML uses the
Windows character mapping, it can substitute the Mac code for left
double quote. In the new internationalized HTML, numeric character
references refer to Unicode code points, so authors are going to have
to change those system-specific references (the Unicode reference for
a left double quote is &#8220;). IMO, this is a good thing.
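A minimal sketch of that conversion, assuming Python 3 and its built-in "cp1252" and "mac_roman" codecs. The function names are hypothetical. The first function reads each &#128;..&#159; reference as a CP1252 code and rewrites it as a reference to the matching Unicode code point. The second shows the Windows-to-Mac substitution a browser could make.

```python
import re
from typing import Optional

# Matches decimal numeric character references such as "&#147;".
NCR_PATTERN = re.compile(r"&#([0-9]+);")


def cp1252_ncrs_to_unicode(html: str) -> str:
    """Rewrite &#128;..&#159; references, read as Windows CP1252 codes,
    as references to the corresponding Unicode code points."""

    def replace(match: "re.Match[str]") -> str:
        code = int(match.group(1))
        if not 128 <= code <= 159:
            return match.group(0)  # outside the CP1252-specific range; keep as-is
        try:
            char = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)  # 0x81, 0x8D, 0x8F, 0x90, 0x9D are unassigned
        return f"&#{ord(char)};"

    return NCR_PATTERN.sub(replace, html)


def cp1252_code_to_mac_roman(code: int) -> Optional[int]:
    """Return the Mac Roman byte for a CP1252 code, or None if there is none."""
    try:
        return bytes([code]).decode("cp1252").encode("mac_roman")[0]
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None


print(cp1252_ncrs_to_unicode("&#147;Hello&#148; &#151; caf&#233;"))
# &#8220;Hello&#8221; &#8212; caf&#233;
print(cp1252_code_to_mac_roman(147))  # 210 (0xD2, the Mac left double quote)
```

References outside 128-159 (such as &#233;) already mean the same thing in Latin-1 and Unicode, so the sketch leaves them alone.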

Entity names are a system-independent alternative to numeric codes. I
listed ISO 8879 entity names for some of the characters that Windows
Code Page 1252* (Latin 1) maps into #128-159. Unfortunately, most of
those names are not recognized by browsers.
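A rough way to build such a table, assuming Python 3. It looks names up in `html.entities.codepoint2name`, which holds the HTML 4 entity names. Many of those names come from the ISO 8879 sets, but this list is an assumption and may not match the one posted earlier. The function name is hypothetical.

```python
from html.entities import codepoint2name
from typing import Dict


def cp1252_entity_names() -> Dict[int, str]:
    """Map each CP1252 code in 128-159 to an HTML entity name, if one exists."""
    names = {}
    for code in range(128, 160):
        try:
            char = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # unassigned in CP1252
        name = codepoint2name.get(ord(char))
        if name is not None:
            names[code] = name
    return names


for code, name in sorted(cp1252_entity_names().items()):
    print(f"&#{code}; -> &{name};")
```

Codes with no entry in the result have no entity name in this particular list, so an author would still need a Unicode numeric reference for them.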

BTW, the Windows code page tables* give each character's code as a
16-bit Unicode value, so I don't quite see how CP1252 relates to the
problem. Win95 supports Unicode to some degree and WinNT actually uses
it internally.
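To illustrate how each 8-bit CP1252 code corresponds to a single 16-bit Unicode value, here is a small sketch. It assumes Python 3 and uses UTF-16 as the 16-bit form. The function name is hypothetical.

```python
from typing import List


def cp1252_to_utf16_units(data: bytes) -> List[int]:
    """Decode CP1252 bytes and return the text as 16-bit UTF-16 code units."""
    encoded = data.decode("cp1252").encode("utf-16-le")
    # Each little-endian byte pair is one 16-bit code unit.
    return [int.from_bytes(encoded[i:i + 2], "little") for i in range(0, len(encoded), 2)]


print([hex(unit) for unit in cp1252_to_utf16_units(b"\x93Hi\x94")])
# ['0x201c', '0x48', '0x69', '0x201d']
```

Every assigned CP1252 character lies in the Basic Multilingual Plane, so each input byte yields exactly one 16-bit unit.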

David Perrell

* <http://www.microsoft.com/truetype/unicode/1252.htm>