sub-repertoires (was: Accept-Charset support) from Klaus Weide on 1996-12-17 (www-international@w3.org from October to December 1996)

From: Klaus Weide <kweide@tezcat.com>
Date: Tue, 17 Dec 1996 13:03:16 -0600 (CST)
To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
cc: www-international@w3.org, http-wg@cuckoo.hpl.hp.com
Message-ID: <Pine.SUN.3.95.961217123422.29478L-100000@huitzilo.tezcat.com>

On Tue, 17 Dec 1996, Martin J. Duerst wrote:
> On Mon, 16 Dec 1996, Klaus Weide wrote:
> 
> > Currently accept-charset is de facto used as an expression of _two_
> > capabilities: (1) to decode a character encoding, (2) to be able to
> > display (or take responsibility for) a certain character repertoire.
> 
> This is just because current clients don't actually do much of a
> decoding, they just switch to a font they have available.

1) I don't think that is true for all current clients,
2) the effect is the same.

[...]
> > Example of a site where documents are provided in several charsets
> > (all for the same language):
> > see <URL: http://www.fee.vutbr.cz/htbin/codepage>.
> 
> The list is impressive. It becomes less impressive if you realize
> that all (as far as I have checked) the English pages and some
> of the Czech pages (MS Cyrillic/MS Greek/MS Hebrew,...) are just
> plain ASCII, and don't need a separate URL nor should be labeled
> as such in the HTTP header. They could add a long list of other
> encodings, and duplicate their documents, [...]

Right.  There are still at least 3 different (in a relevant way)
repertoires.

If those could be labelled (and negotiated) as charset=UTF-8 + 
repertoire indication,  less duplication would be needed.

> > It is certainly much easier to make Web clients able to decode UTF-8
> > to locally available character sets, than to upgrade all client
> > machines so that they have fonts available to display all of the 10646
> > characters.
> 
> The big problem is not fonts. A single font covering all current ISO
> 10646 characters can easily be bundled with a browser. The main
> problem is display logic, for Arabic and Indic languages in particular.

I am interested in having something that could work well with UTF-8
even on a vtxxx terminal or a Linux console (in cases where that makes
sense).
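
As a rough illustration of what "decode UTF-8 to a locally available
character set" could mean for such a terminal, here is a minimal sketch.
The function name utf8_to_local is made up for this example, and
replacing unrepresentable characters with "?" is only one possible
fallback, not something proposed in this thread.

```python
def utf8_to_local(data: bytes, local_charset: str) -> bytes:
    """Transcode UTF-8 bytes to a charset the local display understands.

    Hypothetical helper: characters outside the local charset's
    repertoire are replaced with '?' (Python's "replace" error handler).
    """
    text = data.decode("utf-8")
    return text.encode(local_charset, errors="replace")


if __name__ == "__main__":
    # Fits entirely in the Latin-1 repertoire: transcodes cleanly.
    print(utf8_to_local("Grüße".encode("utf-8"), "latin-1"))
    # U+010D (c with caron) is in Latin-2 but not in Latin-1.
    print(utf8_to_local("č".encode("utf-8"), "latin-1"))
    print(utf8_to_local("č".encode("utf-8"), "iso8859-2"))
```

A client that knew the document's repertoire in advance could tell
whether the fallback would ever be hit before fetching the body.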

> Definitely UTF-8 should be encouraged. But that's not done by
> introducing new protocol complications and requiring the servers
> to deal with unpredictable transliteration issues that can be
> dealt with more easily on the client side.

I am not thinking of requiring a server to do anything.  Just being
able to say 
 Content-type: text/plain;charset=utf-8; charrep="latin-1,latin-2,koi8"
(made-up syntax, may be fatally flawed) for those who wish to do so;
and something equivalent for the "accept-*" side.  Nothing mandatory, let
everybody who doesn't care default to the currently implied full 10646 
repertoire.  I think the examples show that people are doing the
equivalent now (whether accidentally or not).
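
To make the made-up header above concrete, here is a small sketch of
how a recipient might read it. The "charrep" parameter and the
repertoire names are the hypothetical syntax from this message, not
anything registered; the parsing itself uses only the standard MIME
parameter handling in Python's email package.

```python
from email.message import Message


def parse_charrep(content_type: str):
    """Return (charset, repertoires) from a Content-Type value.

    'charrep' is the hypothetical parameter sketched in this message.
    repertoires is None when the parameter is absent, which would mean
    the currently implied full 10646 repertoire.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()   # lowercased, or None
    raw = msg.get_param("charrep")        # unquoted value, or None
    if not isinstance(raw, str):
        return charset, None
    repertoires = [r.strip().lower() for r in raw.split(",") if r.strip()]
    return charset, repertoires


if __name__ == "__main__":
    header = 'text/plain;charset=utf-8; charrep="latin-1,latin-2,koi8"'
    print(parse_charrep(header))
    print(parse_charrep("text/plain; charset=utf-8"))
```

A recipient that does not know the parameter simply never asks for
it, which is the "nothing mandatory" behaviour described above.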

No client would be forced to do anything with that additional info (they
can ignore it, or treat it as advisory).  No server would be required to
send it, or react to an "accept-charrep/accept-features:..." (or whatever
the syntax might be).
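
For a server that did choose to react, the selection could be as
simple as the following sketch. Everything here is an assumption made
for illustration: the "accept-charrep" value format (a comma-separated
list), the pick_variant name, and the idea that each variant is
described by the repertoires it needs. A missing header leaves the
server's behaviour exactly as it is today.

```python
def pick_variant(accept_charrep, variants):
    """Pick the first variant whose repertoires the client accepts.

    accept_charrep: value of a hypothetical Accept-Charrep header,
                    or None if the client did not send one.
    variants:       dict mapping variant name -> list of repertoires
                    the variant needs, in server preference order.
    """
    if accept_charrep is None:
        # Header absent: fall back to the server's first choice.
        return next(iter(variants), None)
    accepted = {r.strip().lower() for r in accept_charrep.split(",") if r.strip()}
    for name, needed in variants.items():
        if set(needed) <= accepted:
            return name
    return None


if __name__ == "__main__":
    variants = {"page-latin2": ["latin-2"], "page-ascii": []}
    print(pick_variant("latin-1, latin-2", variants))
    print(pick_variant("latin-1", variants))
    print(pick_variant(None, variants))
```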

  Klaus

Received on Tuesday, 17 December 1996 14:03:32 UTC