Re: HTML - i18n / NCR & charsets

Jonathan Rosenne (rosenne@NetVision.net.il)
Tue, 26 Nov 1996 23:14:05 +0200


Message-Id: <1.5.4.32.19961126211405.006763c4@mail.netvision.net.il>
Date: Tue, 26 Nov 1996 23:14:05 +0200
To: Dirk.vanGulik@jrc.it
From: Jonathan Rosenne <rosenne@NetVision.net.il>
Subject: Re: HTML - i18n / NCR & charsets
Cc: Misha Wolf <MISHA.WOLF@reuters.com>, www-html <www-html@w3.org>

At 21:16 26/11/96 +0100, Dirk.vanGulik@jrc.it wrote:
>
>On Tue, 26 Nov 1996, Misha Wolf wrote:
>
>> The following extract from RFC 1866, "Hypertext Markup Language - 2.0" shows 
>> that legal numeric character references have been based on Unicode for quite 
>> some time and certainly prior to the I18N draft.
>> 
>I quite agree here, and I do acknowledge this; but I do insist that current
>practice is the problem. A quick scan last night over all reachable pages
>linked from the webdirectory (www.webdirectory.com) found a substantial
>number of pages which would be broken: about 7%/4K pages. Of these, about
>a fifth date from before RFC1866.
>
>But *AGAIN* I acknowledge that there _should_ be no problems; people
>should not have relied on NCRs in the low top bit range, but they have
>done so. And if you have easy ways of marking your pages such that you do
>not break existing practice, you should do so.

As with all "illegal" HTML, the browser writers will have to make some
sensible guess according to their individual inclinations. The nature of the
NCR, the HTTP charset (if available in the HTTP header or in a META element),
or the country in the domain name may provide useful hints.
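
To make the idea concrete, here is a rough sketch in Python of the kind of
guess a browser might make. It is only one possible heuristic, not something
any specification or browser prescribes. The function name guess_ncr, the
TLD_CHARSET_HINTS table and its entries, and the choice to try Windows-1252
for values 128-159 are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical table: country-code TLD -> a legacy charset an author there
# might have had in mind. Entries are illustrative, not authoritative.
TLD_CHARSET_HINTS = {"il": "iso-8859-8", "ru": "koi8-r", "gr": "iso-8859-7"}


def guess_ncr(code: int, charset: Optional[str] = None,
              host: Optional[str] = None) -> str:
    """Guess the character meant by the reference &#code; (a heuristic)."""
    # Outside 128..255 there is no single-byte reading to consider,
    # so take the number as a Unicode code point.
    if not 128 <= code <= 255:
        return chr(code)

    candidates = []
    if charset:                      # hint 1: HTTP header or META charset
        candidates.append(charset)
    if host and "." in host:         # hint 2: country in the domain name
        tld = host.rsplit(".", 1)[-1].lower()
        if tld in TLD_CHARSET_HINTS:
            candidates.append(TLD_CHARSET_HINTS[tld])
    if code <= 159:                  # hint 3: the nature of the NCR itself;
        candidates.append("cp1252")  # 128..159 are control codes in Unicode

    for name in candidates:
        try:
            ch = bytes([code]).decode(name)
        except (UnicodeDecodeError, LookupError):
            continue                 # byte undefined there, or unknown charset
        # Accept only a single printable result, not a C1 control character.
        if len(ch) == 1 and not 0x80 <= ord(ch) <= 0x9F:
            return ch

    return chr(code)                 # fall back to the Unicode reading
```

With no hints at all, guess_ncr(233) simply takes the Unicode reading, while
guess_ncr(225, charset="koi8-r") prefers the byte interpretation. Whether the
declared charset should override the Unicode reading in this way is exactly
the sort of choice left to the browser writer's inclination.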

In any case, it isn't a good idea for the HTML specification to say more
about how to handle invalid HTML than it does today.

--

Jonathan Rosenne
JR Consulting
P O Box 33641, Tel Aviv, Israel
Phone: +972 50 246 522 Fax: +972 9 956 7353
http://ourworld.compuserve.com/homepages/Jonathan_Rosenne/