Re: HTML - i18n / NCR & charsets from Misha Wolf on 1996-11-27 (www-international@w3.org from October to December 1996)

From: Misha Wolf <MISHA.WOLF@reuters.com>
Date: Tue, 26 Nov 1996 21:35:33 -0500 (EST)
To: www-html <www-html@w3.org>, www-international <www-international@w3.org>, Unicode <unicode@unicode.org>
Message-Id: <8033352126111996/A24914/RE6/11ABD5632100*@MHS>

If we are considering Web pages using Windows Code Pages, in which 
illegal numeric character references have been used for characters 
in the range 0x80-0x9F (decimal 128-159), then there will be no clash 
with anything in Unicode, as these values do not represent graphic 
characters in Unicode (they are C1 control codes) or, for that matter, 
in ISO 8859-X.  A permissive browser will simply map these to the 
expected characters.
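
As a rough illustration of that permissive mapping, here is a minimal
Python sketch. It assumes the page author meant Windows-1252 (other
Windows Code Pages would need a different codec name), and the names
resolve_ncr and expand_ncrs are hypothetical, not taken from any browser.

```python
import re

# Matches decimal numeric character references such as &#147;
NCR_PATTERN = re.compile(r"&#([0-9]+);")

REPLACEMENT = "\ufffd"  # U+FFFD, used when no sensible mapping exists


def resolve_ncr(code_point: int) -> str:
    """Resolve one numeric character reference permissively.

    Values 128-159 are treated as Windows-1252 bytes (an assumption);
    everything else is taken as a Unicode code point.
    """
    if 0x80 <= code_point <= 0x9F:
        try:
            return bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            # Windows-1252 leaves a few bytes (e.g. 0x81) unassigned.
            return REPLACEMENT
    if code_point > 0x10FFFF:
        return REPLACEMENT
    return chr(code_point)


def expand_ncrs(text: str) -> str:
    """Replace every decimal NCR in text with its resolved character."""
    return NCR_PATTERN.sub(lambda m: resolve_ncr(int(m.group(1))), text)
```

For example, expand_ncrs("&#147;quoted&#148;") yields the text wrapped
in curly quotation marks (U+201C and U+201D), while &#233; still
resolves to U+00E9 as a legal reference would.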

Misha

---

On Tue, 26 Nov 1996, Misha Wolf wrote:

> The following extract from RFC 1866, "Hypertext Markup Language - 2.0" shows 
> that legal numeric character references have been based on Unicode for quite 
> some time and certainly prior to the I18N draft.
> 
I quite agree here, and I do acknowledge this; but I do insist on current
practice being the problem. Doing a quick scan last night over all reachable
pages linked in from the webdirectory (www.webdirectory.com), I found a
substantial number of pages which would be broken. About 7%/4K pages. Of
these, about a fifth date from before RFC 1866.
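
A scan of this kind could be sketched roughly as follows. This is not
the script actually used; the function name find_c1_ncrs is
hypothetical, and the sketch only looks at decimal references in a page
that has already been fetched as text.

```python
import re

# Matches decimal numeric character references such as &#146;
NCR_PATTERN = re.compile(r"&#([0-9]+);")


def find_c1_ncrs(html: str) -> list[int]:
    """Return the values of all decimal NCRs that fall in 128-159."""
    values = (int(digits) for digits in NCR_PATTERN.findall(html))
    return [value for value in values if 128 <= value <= 159]


def page_is_affected(html: str) -> bool:
    """True if the page uses at least one NCR in the 128-159 range."""
    return bool(find_c1_ncrs(html))
```

Counting the pages for which page_is_affected returns True, against
the total number scanned, gives a percentage of the kind quoted above.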

But *AGAIN* I acknowledge that there _should_ be no problems, people
should not have relied on NCRs in the low top bit range; but they have 
done so. And if you have easy ways of marking your pages such that you do
not break existing practice, you should do so.

Dw.

Received on Tuesday, 26 November 1996 16:37:44 UTC