
Re: Translated IUC10 Web pages: Experimental Results

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Wed, 5 Feb 1997 17:14:38 +0100 (MET)
To: Misha Wolf <misha.wolf@reuters.com>
Cc: Unicore <unicore@unicode.org>, Unicode <unicode@unicode.org>, www-international <www-international@w3.org>, Search <search@mccmedia.com>, ISO10646 <iso10646@listproc.hcf.jhu.edu>, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <Pine.SUN.3.95q.970205170834.432E-100000@enoshima>
X-Mailing-List: <http-wg@cuckoo.hpl.hp.com> archive/latest/2332
On Wed, 5 Feb 1997, Misha Wolf wrote:

> I think it very unlikely that plain 16-bit Unicode will be adopted by 
> browsers in the next year or two.

Why not? 16-bit Unicode is more compact than UTF-8 for East Asian text
(apart from the fact that compression can be used anyway). I could
understand it if you said that content providers might not adopt it.
But for browsers, supporting UCS-2/UTF-16 in addition to UTF-8 is an
extremely small addition, so I don't even see why it is being discussed.
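
To illustrate the size argument, here is a minimal sketch in Python. The
sample strings and the helper name encoded_sizes are assumptions chosen for
illustration; they are not taken from the IUC10 pages. It counts the octets
each encoding needs, without a byte order mark.

```python
# Sketch: compare the octet counts of UTF-8 and UTF-16 for two samples.
# The sample strings and the helper name are illustrative assumptions.

def encoded_sizes(text):
    """Return the number of octets needed in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-be" is used so that no byte order mark is counted.
        "utf-16": len(text.encode("utf-16-be")),
    }

japanese = "日本語のテキスト"  # 8 characters, 3 octets each in UTF-8
english = "plain text"         # 10 ASCII characters

print(encoded_sizes(japanese))  # UTF-8 is larger here
print(encoded_sizes(english))   # UTF-16 is larger here
```

For the Japanese sample UTF-8 needs 24 octets against 16 for UTF-16; for the
ASCII sample the relation is reversed (10 against 20).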


> The two encoding schemes which will 
> be widely used to encode Unicode Web pages are:
> 
>    1.  UTF-8 (see <http://www.reuters.com/unicode/iuc10/x-utf8.html>).
>    2.  Numeric Character References (see <http://www.reuters.com/unicode/iuc10/x-ncr.html>).
> 
> The second scheme is intriguing as it does not require the use of any 
> octets over 127 decimal (7F hex).  Accordingly, it is legal to label 
> such a file as, eg, US-ASCII, ISO-8859-1, X-SJIS, or any other "charset" 
> which has ASCII as a subset.

It is not very harmful to label such pages ISO-8859-1 or whatever.
But strictly speaking, it is not legal! If there are alternatives
for labeling, the most restrictive label should be used. If a page is
labeled us-ascii, you know that it is going to pass through 7-bit
mail. Otherwise, you don't.
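
A rough sketch of the labeling rule in Python follows. The function names
to_ncr and most_restrictive_label are hypothetical helpers, not part of any
standard; "xmlcharrefreplace" is Python's built-in error handler that writes
decimal numeric character references.

```python
# Sketch: a purely NCR-coded page contains only octets below 0x80, so the
# most restrictive label that still applies to it is us-ascii.
# Both helper names are hypothetical.

def to_ncr(text):
    """Encode text as ASCII, replacing other characters with &#N; references."""
    return text.encode("ascii", errors="xmlcharrefreplace")

def most_restrictive_label(data, declared):
    """Return 'us-ascii' if every octet is 7-bit, else the declared label."""
    if all(octet < 0x80 for octet in data):
        return "us-ascii"
    return declared

page = to_ncr("Grüße")
print(page)
print(most_restrictive_label(page, "iso-8859-1"))
print(most_restrictive_label("Grüße".encode("iso-8859-1"), "iso-8859-1"))
```

The NCR-coded page decodes to the same text under us-ascii and iso-8859-1,
but only the us-ascii label tells the recipient that it is 7-bit clean.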

I don't see much future popularity for purely NCR-coded
documents. NCRs are more valuable for cases where you want to
add a character or two from a script not supported by the
local encoding in use, e.g. a Kanji or two in a German document.
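
The mixed case can be sketched the same way in Python. The sample sentence
and the helper name encode_with_ncr_fallback are assumptions for
illustration: the text is kept in the local encoding, and only the
characters that encoding cannot represent become NCRs.

```python
# Sketch: keep a German text in ISO-8859-1 and fall back to numeric
# character references only for characters outside that encoding.
# The helper name and the sample text are illustrative assumptions.

def encode_with_ncr_fallback(text, charset="iso-8859-1"):
    """Encode text in charset; unsupported characters become &#N; references."""
    return text.encode(charset, errors="xmlcharrefreplace")

german_with_kanji = "Größe: 漢"
encoded = encode_with_ncr_fallback(german_with_kanji)
print(encoded)
```

Here the umlaut and the sharp s stay as single ISO-8859-1 octets, and only
the Kanji (U+6F22) is written as the reference &#28450;.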

Regards,	Martin.
Received on Wednesday, 5 February 1997 08:28:31 UTC
