Date: Tue, 26 Nov 1996 21:35:33 -0500 (EST)
From: Misha Wolf <MISHA.WOLF@reuters.com>
Subject: Re: HTML - i18n / NCR & charsets
In-Reply-To: <Pine.SOL.3.91.961126211119.8528Afirstname.lastname@example.org>
To: www-html <email@example.com>, www-international <firstname.lastname@example.org>, Unicode <email@example.com>
Message-Id: <8033352126111996/A24914/RE6/11ABD5632100*@MHS>

If we are considering Web pages using Windows Code Pages, in which illegal
numeric character references have been used for characters in the range
80-9F (decimal 128-159), then there will be no clash with anything in
Unicode, as these values do not represent characters in Unicode or, for
that matter, in ISO 8859-X. A permissive browser will simply map these to
the expected characters.

Misha

---
On Tue, 26 Nov 1996, Misha Wolf wrote:

> The following extract from RFC 1866, "Hypertext Markup Language - 2.0" shows
> that legal numeric character references have been based on Unicode for quite
> some time and certainly prior to the I18N draft.
>

I quite agree here, and I do acknowledge this; but I do insist that current
practice is the problem. Doing a quick scan last night over all reachable
pages linked from the webdirectory (www.webdirectory.com), I found a
substantial number of pages which would be broken. About 7%/4K pages. Of
these, about a fifth date from before RFC 1866.

But *AGAIN* I acknowledge that there _should_ be no problems; people should
not have relied on NCRs in the low top bit range, but they have done so. And
if you have easy ways of marking your pages such that you do not break
existing practice, you should do so.

Dw.
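
As a rough illustration of the "permissive browser" mapping described in the reply above, here is a minimal Python sketch. It assumes the Windows code page in question is 1252 (the thread does not name one), and the names `resolve_ncr` and `expand_ncrs` are hypothetical, chosen only for this example. Decimal references in the range 128-159 are reinterpreted as code page bytes; everything else is treated as a Unicode code point.

```python
import re

# Matches decimal numeric character references such as &#147;
_NCR = re.compile(r"&#([0-9]+);")


def resolve_ncr(code: int) -> str:
    """Resolve one decimal NCR value the way a permissive browser might."""
    if 0x80 <= code <= 0x9F:
        try:
            # Assumption: the page author was thinking in Windows code page 1252.
            return bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            # A few positions in this range are unassigned in cp1252;
            # fall through and treat the value as a plain code point.
            pass
    return chr(code)


def expand_ncrs(text: str) -> str:
    """Replace every decimal NCR in text with the character it resolves to."""
    def _sub(match: re.Match) -> str:
        code = int(match.group(1))
        if code > 0x10FFFF:
            # Not a valid code point; leave the reference untouched.
            return match.group(0)
        return resolve_ncr(code)

    return _NCR.sub(_sub, text)
```

For example, `expand_ncrs("&#147;quoted&#148;")` yields the text wrapped in curly double quotes (U+201C and U+201D), while a reference outside the 128-159 range, such as `&#233;`, resolves to the ordinary Unicode character with that number.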