Re: HTML - i18n / NCR & charsets

Dirk.vanGulik@jrc.it
Tue, 26 Nov 1996 19:52:38 +0100 (MET)


Date: Tue, 26 Nov 1996 19:52:38 +0100 (MET)
From: Dirk.vanGulik@jrc.it
To: Benjamin Franz <snowhare@netimages.com>
Cc: "Dirk.vanGulik" <Dirk.vanGulik@jrc.it>, www-html@w3.org
Subject: Re: HTML - i18n / NCR & charsets
In-Reply-To: <Pine.LNX.3.95.961126103157.7652B-100000@ns.viet.net>
Message-Id: <Pine.SOL.3.91.961126194926.8458A-100000@elect6.jrc.it>


On Tue, 26 Nov 1996, Benjamin Franz wrote:

> On Tue, 26 Nov 1996, Dirk.vanGulik wrote:
> > 
> > Some possible solutions are proposed:
> > 
> > 1. An extended Content-type header is used.
> > 	Content-type: text/html.i18n
> > 	Content-type: text/html-i18n
> > 
> > 2. An additional attribute to the charset is used
> > 	Content-type: text/html; charset=iso-8859-1; ncr=iso-104..
> > 
> > 3. An additional (level) attribute to the text/html is used.
> > 	Content-type: text/html; level=2; charset=iso8859-1
> > 	Content-type: text/html; version=2.0/i; charset=iso8859-1
> > 
> > 4. An additional DTD specifier in the HTML is insisted upon.
> > 	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 2.0i//EN">
> > 
> > 5. An additional header is added to signal that the site 
> >    is internationalised.
> > 	Content-Quality: i18n/v1.02
> > 
> > Please note that the effects accomplished by each of the above techniques 
> > are similar; they serve to inform the receiving end about the way any
> > in-line numerical character references are to be treated.
> > 
> > Option number 1 is by far the easiest to implement; and some of
> > the deployed server and browser code is able to treat this as
> > an 'html' resource with an 'i18n' flavouring.
> 
> No. Option 1 is by far the *most difficult* to actually deploy because
> *most* existing browsers will attempt to download the now unknown file
> type.  This is why Roy's 'Proposed Transition Strategy for the Deployment
> of Tables' never worked in practice. The other options at least don't
> break (very many) existing browsers.

You might be right here; I tried the five big ones in their last
two versions, beta and shipping. They seemed to cope. But I agree
that a 'level' type of addition or a header one is *much* safer
in that respect, and I honestly do not know what is out there
in the browser world.
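
To make the difference between the two signalling styles concrete, here is a
rough sketch in Python. It assumes a hypothetical older browser that renders a
resource only when the media type is exactly text/html and offers anything else
for download. The function names legacy_dispatch and ncr_signal are made up for
illustration; only the header values come from the list of options above.

```python
from email.message import Message


def _parse(header_value):
    """Wrap a Content-type value so the stdlib can split type and parameters."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg


def legacy_dispatch(header_value):
    """Hypothetical model of an older browser: render exact text/html only."""
    if _parse(header_value).get_content_type() == "text/html":
        return "render"
    # Unknown media type: this model falls back to a download prompt.
    return "download"


def ncr_signal(header_value):
    """What an i18n-aware receiver could read from the parameters (option 3)."""
    msg = _parse(header_value)
    return {
        "level": msg.get_param("level"),        # None when absent
        "charset": msg.get_content_charset(),   # lower-cased, None when absent
    }


if __name__ == "__main__":
    for value in (
        "text/html-i18n",                          # option 1
        "text/html; level=2; charset=iso8859-1",   # option 3
    ):
        print(value, "->", legacy_dispatch(value), ncr_signal(value))
```

In this model the new media type of option 1 falls through to the download
branch, while the extra parameter of option 3 is simply ignored by the old
code and still visible to a receiver that looks for it.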
 
> > If HTML-i18n is to go ahead, without any signaling about the NCRs
> > target charset change (i.e. in Unicode rather than the announced
> > charset); then IMHO this should at least be mentioned in the draft
> > as it breaks existing, widespread, practice, which prior to this
> > i18n draft could not be signalled as 'wrong' or 'illegal'.
> 
> Hmmm...Is there actually a difference in the first 256 codes of Unicode
> and ISO8859-1? I thought they were identical over that range?
> 
There are just a few differences; mainly in the empty block which
has the funny chars such as the bullet (143) and non-breaking-space
(160) to name the popular offenders.
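
One way to check which numeric references change meaning is to resolve each
number twice: once as a byte in the authoring code page and once as a Unicode
code point. The sketch below assumes the pages in question were written in
Windows code page 1252 and labelled iso-8859-1; the thread does not name the
authoring code page, so that choice is an assumption, and the helper names are
made up for illustration.

```python
def ncr_as_unicode(number):
    """Resolve &#number; as a Unicode code point (the i18n draft reading)."""
    return chr(number)


def ncr_in_charset(number, charset):
    """Resolve &#number; as a single byte in a legacy charset.

    Only meaningful for single-byte charsets and numbers 0..255.
    Returns None when the charset leaves that byte undefined.
    """
    try:
        return bytes([number]).decode(charset)
    except UnicodeDecodeError:
        return None


def differing_ncrs(charset, start=128, stop=256):
    """Map each number whose two readings disagree to its legacy reading."""
    diffs = {}
    for number in range(start, stop):
        legacy = ncr_in_charset(number, charset)
        if legacy != ncr_as_unicode(number):
            diffs[number] = legacy
    return diffs


if __name__ == "__main__":
    # cp1252 is the assumed authoring code page, not something the thread states.
    for number, legacy in sorted(differing_ncrs("cp1252").items()):
        print(number, repr(legacy))
```

Under that assumption the affected references are those numbered 128 to 159.
For instance &#149; is a bullet in cp1252 but a control character when read as
Unicode, whereas references from 160 upward resolve to the same character
either way.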

DW.