Re: HTML - i18n / NCR & charsets

Martin J. Duerst (mduerst@ifi.unizh.ch)
Thu, 28 Nov 1996 10:49:46 +0100 (MET)


To: Keld Jørn Simonsen <keld@dkuug.dk>
cc: Misha Wolf <MISHA.WOLF@reuters.com>, www-html <www-html@w3.org>,
In-Reply-To: <199611270107.CAA29978@dkuug.dk>
Message-ID: <Pine.SUN.3.95.961128104417.1006A-100000@enoshima>

On Wed, 27 Nov 1996, Keld Jørn Simonsen wrote:

> Misha Wolf writes:
> 
> > If we are considering Web pages using Windows Code Pages, in which 
> > illegal numeric character references have been used for characters 
> > in the range 80-9F (decimal 128-159), then there will be no clash 
> > with anything in Unicode as these values do not represent characters 
> > in Unicode or, for that matter, in ISO 8859-X.  A permissive browser 
> > will simply map these to the expected characters.
> 
> I just checked: AMD 3 to ISO 10646 says that C1 is reserved
> for control characters, and thus it cannot be used for graphic
> characters as in CP1251.

The HTML DTDs, at least since HTML 2.0, officially disallow characters
in this range. So, e.g., &#128; is illegal in HTML even if it is defined
in AMD 3. It is therefore possible for a *permissive* browser to use
some guessing to cope with these values, which are illegal in HTML.
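To make the kind of "guessing" concrete, here is a minimal Python sketch of how a permissive decoder might treat numeric character references. The helper name `decode_ncrs`, its `permissive` flag, and the default guess of Windows code page 1252 are assumptions for illustration only; the thread does not say which code page a browser should assume, and real browsers may behave differently. References outside 128-159 are decoded as ordinary code points, while those inside the range are either rejected (strict, as the HTML DTDs require) or reinterpreted as a byte in the guessed code page.

```python
import re

# Decimal (&#128;) or hexadecimal (&#x80;) numeric character references.
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_ncrs(text: str, permissive: bool = True, codepage: str = "cp1252") -> str:
    """Replace NCRs in `text` with characters (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if 0x80 <= code <= 0x9F:
            # C1 range: illegal in HTML, reserved for control characters.
            if not permissive:
                raise ValueError(f"illegal numeric character reference &#{code};")
            # Guess: the author meant the byte with this value in a
            # Windows code page (the code page itself is an assumption).
            try:
                return bytes([code]).decode(codepage)
            except UnicodeDecodeError:
                # Position left undefined by the guessed code page.
                return "\ufffd"
        if code > 0x10FFFF:
            return "\ufffd"  # beyond the Unicode code space
        return chr(code)

    return NCR_PATTERN.sub(replace, text)
```

With the default guess, `decode_ncrs("&#128;")` yields "€", but with `codepage="cp1251"` the same reference yields the Cyrillic "Ђ", which is why the result can only ever be a guess; in strict mode the reference is simply rejected.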

Regards,	Martin.