Re: HTML - i18n / NCR & charsets
On Tue, 26 Nov 1996, Misha Wolf wrote:
> Indeed, ISO 8859-1 is a *strict* subset of Unicode, hence there are *no*
> differences between the two.
> Microsoft's Windows Code Page 1252 (often called Windows Latin 1) has
> characters in the range 80-9F (decimal 128-159), unlike either the ISO
> 8859-X family of standards or Unicode.
> The NBSP is at A0 (decimal 160) and so presents no problems. WCP 1252
> has a bullet at 95 (decimal 149), not (as far as I can see) at decimal
> 143. The numeric character reference &#149; is illegal.
> Chris Wendt, from Microsoft, agreed at Seville that the use of illegal
> numeric character references was unfortunate and asked for suggestions.
> The consensus was that entity names should be used instead. As entity
> names do not (appear to) exist for most of Microsoft's extra chars, it
> was suggested that some enterprising person write them up in an RFC.
> I believe there was at least one volunteer: Chris Lilley of W3C.
I was there, but don't remember this part of the discussion.
Defining entity names for things such as "..." may not be that
bad an idea.
However, one has to be aware of a few related facts before
actually doing this:
- Using 8-bit data directly and correctly labeling the page as
being in Windows Code Page 1252 encoding is an existing
solution (to the extent that browsers support CP 1252, and
bearing in mind that starting to use all kinds of proprietary
encodings is not really ideal).
- Using the correct numeric character reference is also a
solution. As this uses decimal values beyond 255,
and I have not yet heard of any pages using such values
for anything other than Unicode, it should not cause
compatibility problems. It works in all browsers
that support this part of the i18n spec.
- When we developed the i18n draft, we were repeatedly asked
by various parties to include more entities, covering
all kinds of areas. We decided to complete
Latin-1, but not to go beyond it, so as not to delay our work
further. I guess that anybody who starts to work on additional
character entities won't be able to stop at
the few characters that are in CP 1252. The list may
also quickly become too long to be feasible as a
single list.
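
To make the second point above concrete, here is a minimal sketch in
Python of how the illegal references in the range 128-159 could be
rewritten as the corresponding Unicode ones. It assumes that Python's
"cp1252" codec matches Microsoft's table closely enough for this
purpose; the names fix_ncr and repair are made up for illustration
and are not part of any spec or tool.

```python
import re

def fix_ncr(match):
    """Rewrite one decimal NCR if it falls in the CP 1252 range 128-159."""
    n = int(match.group(1))
    if 128 <= n <= 159:
        try:
            # Interpret the number as a CP 1252 byte and look up its
            # Unicode character (assumption: the codec mirrors CP 1252).
            ch = bytes([n]).decode("cp1252")
        except UnicodeDecodeError:
            # Position not assigned in CP 1252 (e.g. 143): leave as-is.
            return match.group(0)
        return "&#%d;" % ord(ch)
    # Everything else (e.g. 160 for NBSP) is already a valid reference.
    return match.group(0)

def repair(html):
    """Replace CP 1252-style decimal NCRs in a string with Unicode ones."""
    return re.sub(r"&#(\d+);", fix_ncr, html)
```

With this sketch, repair("&#149;") gives "&#8226;" (the bullet) and
repair("&#133;") gives "&#8230;" (the "..." character), while
"&#160;" and "&#143;" pass through unchanged.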