Re: HTML - i18n / NCR & charsets
To: www-html <firstname.lastname@example.org>, www-international <email@example.com>, Unicode <firstname.lastname@example.org>
Subject: Re: HTML - i18n / NCR & charsets
From: Misha Wolf <MISHA.WOLF@reuters.com>
Date: Tue, 26 Nov 1996 21:35:33 -0500 (EST)
If we are considering Web pages using Windows Code Pages, in which
illegal numeric character references have been used for characters
in the range 80-9F hex (decimal 128-159), then there will be no clash
with anything in Unicode, as these values do not represent printable
characters in Unicode or, for that matter, in ISO 8859-X. A permissive
browser will simply map these to the expected characters.
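
The permissive mapping described above could look roughly like the sketch
below. It assumes the pages were authored under Windows code page 1252 (the
text only says "Windows Code Pages"), and the names NCR_PATTERN, resolve_ncr
and permissive_decode are hypothetical, chosen for illustration only.

```python
import re

# Matches decimal numeric character references such as &#146;
NCR_PATTERN = re.compile(r"&#([0-9]+);")


def resolve_ncr(match):
    """Resolve one NCR, treating 128-159 as code page 1252 bytes (assumption)."""
    code = int(match.group(1))
    if 0x80 <= code <= 0x9F:
        try:
            # Permissive step: read the number as a Windows-1252 byte value.
            return bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            # A few values in this range are unassigned in cp1252; keep them.
            return match.group(0)
    try:
        # Legal case: the number is a Unicode code point, as in RFC 1866.
        return chr(code)
    except ValueError:
        # Out of range for Unicode; leave the reference untouched.
        return match.group(0)


def permissive_decode(text):
    """Replace every decimal NCR in text with the character it stands for."""
    return NCR_PATTERN.sub(resolve_ncr, text)
```

For example, permissive_decode("It&#146;s") yields the word with a
typographic apostrophe (U+2019), while "&#233;" is resolved as an ordinary
Unicode code point.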
On Tue, 26 Nov 1996, Misha Wolf wrote:
> The following extract from RFC 1866, "Hypertext Markup Language - 2.0" shows
> that legal numeric character references have been based on Unicode for quite
> some time and certainly prior to the I18N draft.
I quite agree here, and I do acknowledge this; but I do insist that current
practice is the problem. A quick scan last night over all pages reachable
from the webdirectory (www.webdirectory.com) found a substantial number of
pages which would be broken: about 7%/4K pages. Of these, about a fifth
date from before RFC 1866.
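
A scan of this kind might be sketched as follows. This is not the script
that was actually used; it only shows one way to flag a page that relies on
decimal NCRs in the range 128-159. The names C1_NCR_PATTERN,
find_suspect_ncrs and is_affected are hypothetical.

```python
import re

# Decimal numeric character references, e.g. &#147;
C1_NCR_PATTERN = re.compile(r"&#([0-9]+);")


def find_suspect_ncrs(html):
    """Return the NCR values in html that fall in the range 128-159."""
    values = (int(number) for number in C1_NCR_PATTERN.findall(html))
    return [value for value in values if 128 <= value <= 159]


def is_affected(html):
    """True if the page uses at least one NCR in the range 128-159."""
    return bool(find_suspect_ncrs(html))
```

Running is_affected over the text of each fetched page and counting the
True results would give a percentage comparable to the figure above.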
But *AGAIN* I acknowledge that there _should_ be no problems; people
should not have relied on NCRs in the low end of the top-bit-set range
(128-159), but they have done so. And if you have easy ways of marking
your pages such that you do not break existing practice, you should do so.