Re: Accept-Charset support from Martin J. Duerst on 1996-12-17 (www-international@w3.org from October to December 1996)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Tue, 17 Dec 1996 11:09:40 +0100 (MET)
To: Klaus Weide <kweide@tezcat.com>
cc: Larry Masinter <masinter@parc.xerox.com>, www-international@w3.org, http-wg@cuckoo.hpl.hp.com
Message-ID: <Pine.SUN.3.95.961217105311.245D-100000@enoshima>

On Mon, 16 Dec 1996, Klaus Weide wrote:

> Currently accept-charset is de facto used as an expression of _two_
> capabilities: (1) to decode a character encoding, (2) to be able to
> display (or take responsibility for) a certain character repertoire.

This is just because current clients don't actually do much of a
decoding, they just switch to a font they have available.

> The second aspect is not new, is has been around as long as MIME had
> a charset parameter.  It will be lost when/if everything moves to UTF-8
> (and starts using accept-charset: utf-8 / charset=utf-8).
> 
> Example of a site where documents are provided in several charsets
> (all for the same language):
> see <URL: http://www.fee.vutbr.cz/htbin/codepage>.

The list is impressive. It becomes less impressive if you realize
that all (as far as I have checked) the English pages and some
of the Check pages (MS Cyrillic/MS Greek/MS Hebrew,...) are just
plain ASCII, and don't need a separate URL nor should be labeled
as such in the HTTP header. They could add a long list of other
encodings, and duplicate their documents, to e.g. serve them in
Shift-JIS or so :-). Only the US-ASCII version contains the comment
"bez diakritiky" (without diactritics), but that applies to many
more pages.

> It is certainly much easier to make a Web clients able to decode UTF-8
> to locally available character sets, than to upgrade all client
> machines so that they have fonts available to display all of the 10646
> characters.

The big problem is not fonts. A single font covering all current ISO
10646 characters can easily be bundeled with a browser. The main
problem is display logic, for Arabic and Indic languages in particular.

>I assume the former will be done much sooner, and that
> use of UTF-8 should be encouraged before all the fonts (or knowledge
> to choose culturally correct replacement representations) are
> available to everyone.

Definitely UTF-8 should be encouraged. But that's not done by
introducing new protocol complications and requiring the servers
to deal with unpredictable transliteration issues that can be
dealt with more easily on the client side.

Regards,	Martin.

Received on Tuesday, 17 December 1996 05:13:57 UTC