- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Wed, 5 Feb 1997 17:14:38 +0100 (MET)
- To: Misha Wolf <misha.wolf@reuters.com>
- cc: Unicore <unicore@unicode.org>, Unicode <unicode@unicode.org>, www-international <www-international@w3.org>, Search <search@mccmedia.com>, ISO10646 <iso10646@listproc.hcf.jhu.edu>, http-wg@cuckoo.hpl.hp.com
On Wed, 5 Feb 1997, Misha Wolf wrote:

> I think it very unlikely that plain 16-bit Unicode will be adopted by
> browsers in the next year or two.

Why not? It is more compact for East Asia (apart from the fact that
compression can be used anyway). I might understand if you said that it
might not be adopted by content providers. But for browsers, supporting
UCS-2/UTF-16 in addition to UTF-8 is an extremely small addition, so I
don't even see why there is any discussion about it.

> The two encoding schemes which will be widely used to encode Unicode
> Web pages are:
>
> 1. UTF-8 (see <http://www.reuters.com/unicode/iuc10/x-utf8.html>).
> 2. Numeric Character References (see <http://www.reuters.com/unicode/iuc10/x-ncr.html>).
>
> The second scheme is intriguing as it does not require the use of any
> octets over 127 decimal (7F hex). Accordingly, it is legal to label
> such a file as, e.g., US-ASCII, ISO-8859-1, X-SJIS, or any other
> "charset" which has ASCII as a subset.

It is not very harmful to label such pages ISO-8859-1 or whatever, but
strictly speaking, it is not legal! If there are alternatives for
labeling, the most restrictive label should be used. If a page is
labeled us-ascii, you know that it will pass through 7-bit mail;
otherwise, you don't.

I don't see much future popularity for purely NCR-coded documents. They
are more valuable for cases where you want to add a character or two
from a script not supported by the local encoding in use, e.g. a Kanji
or two in a German document.

Regards,   Martin.
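[Editorial note: a small illustrative sketch in Python, not part of the original message, of the two points discussed above: the octet counts behind the "more compact for East Asia" remark, and how a Numeric Character Reference rendering uses only octets below 0x80, which is why an ASCII-superset "charset" label is at least not harmful. The sample string is chosen purely for illustration.]

```python
# Hypothetical sample text: three Kanji (BMP characters).
text = "\u65E5\u672C\u8A9E"

# UTF-16 stores each of these characters in 2 octets; UTF-8 needs 3.
print(len(text.encode("utf-16-be")))   # 6 octets
print(len(text.encode("utf-8")))       # 9 octets

# Numeric Character References: the resulting file contains only
# ASCII octets, so a label such as us-ascii or ISO-8859-1 will not
# break transport, even if it is not the most accurate description.
ncr = "".join("&#%d;" % ord(c) for c in text)
print(ncr)                              # &#26085;&#26412;&#35486;
print(all(ord(c) < 0x80 for c in ncr))  # True
```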
Received on Wednesday, 5 February 1997 11:15:32 UTC