Re: sub-repertoires (was: Accept-Charset support) from Martin J. Duerst on 1996-12-18 (www-international@w3.org from October to December 1996)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Wed, 18 Dec 1996 17:48:48 +0100 (MET)
To: Klaus Weide <kweide@tezcat.com>
cc: www-international@w3.org, http-wg@cuckoo.hpl.hp.com
Message-ID: <Pine.SUN.3.95.961218174236.245J-100000@enoshima>

On Tue, 17 Dec 1996, Klaus Weide wrote:

> On Tue, 17 Dec 1996, Martin J. Duerst wrote:
> > On Mon, 16 Dec 1996, Klaus Weide wrote:
> > 
> > > Currently accept-charset is de facto used as an expression of _two_
> > > capabilities: (1) to decode a character encoding, (2) to be able to
> > > display (or take responsibility for) a certain character repertoire.
> > 
> > This is just because current clients don't actually do much
> > decoding; they just switch to a font they have available.
> 
> 1) I don't think that is true for all current clients,
> 2) the effect is the same.
> 
> [...]
> > > Example of a site where documents are provided in several charsets
> > > (all for the same language):
> > > see <URL: http://www.fee.vutbr.cz/htbin/codepage>.
> > 
> > The list is impressive. It becomes less impressive if you realize
> > that all (as far as I have checked) the English pages and some
> > of the Czech pages (MS Cyrillic/MS Greek/MS Hebrew,...) are just
> > plain ASCII, and don't need a separate URL nor should be labeled
> > as such in the HTTP header. They could add a long list of other
> > encodings, and duplicate their documents, [...]
> 
> Right.  There are still at least 3 different (in a relevant way)
> repertoires.
> 
> If those could be labelled (and negotiated) as charset=UTF-8 + 
> repertoire indication,  less duplication would be needed.
> 
> > > It is certainly much easier to make a Web client able to decode UTF-8
> > > to locally available character sets, than to upgrade all client
> > > machines so that they have fonts available to display all of the 10646
> > > characters.
> > 
> > The big problem is not fonts. A single font covering all current ISO
> > 10646 characters can easily be bundled with a browser. The main
> > problem is display logic, for Arabic and Indic languages in particular.
> 
> I am interested in having something that could work well with UTF-8
> even on a vtxxx terminal or a Linux console (in cases where that makes
> sense).

If such a beast has the abilities to:

- Display a large set of glyphs
- Not assume one character == one display cell
- Have the possibility to insert some code between a character string
	and a glyph sequence for appropriate conversion

then it is mainly a matter of time and effort.
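
To illustrate the second point in the list above, here is a minimal sketch
of a cell-width estimate that does not assume one character == one display
cell. The function name cell_width is hypothetical, and the rule used
(combining marks take no cell, East Asian wide and fullwidth characters
take two) is a simplifying assumption. It says nothing about Arabic shaping
or Indic reordering, which would need real conversion code between the
character string and the glyph sequence.

```python
import unicodedata


def cell_width(text: str) -> int:
    """Estimate how many fixed-width terminal cells `text` occupies.

    Simplifying assumptions (not taken from the original message):
    - combining marks are drawn over the previous cell (width 0)
    - East Asian wide/fullwidth characters occupy two cells
    - everything else occupies one cell
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining mark: no cell of its own
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide or fullwidth character
        else:
            width += 1
    return width


if __name__ == "__main__":
    for sample in ("abc", "e\u0301", "\u65e5\u672c\u8a9e"):
        print(repr(sample), len(sample), cell_width(sample))
```

The point of the sketch is only that the character count and the cell
count can differ in both directions, so a terminal has to compute widths
rather than assume them.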


> > Definitely UTF-8 should be encouraged. But that's not done by
> > introducing new protocol complications and requiring the servers
> > to deal with unpredictable transliteration issues that can be
> > dealt with more easily on the client side.
> 
> I am not thinking of requiring a server to do anything.  Just being
> able to say 
>  Content-type: text/plain;charset=utf-8; charrep="latin-1,latin-2,koi8"
> (made-up syntax, may be fatally flawed) for those who wish to do so;
> and something equivalent for the "accept-*" side.  Nothing mandatory, let
> everybody who doesn't care default to the currently implied full 10646 
> repertoire.  I think the examples show that people are doing the
> equivalent now (whether accidentally or not).
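
As a rough illustration of the header proposed above, here is a minimal
sketch of how a receiver might read it. The charrep parameter is the
made-up syntax from the quoted text, not part of any standard, and the
function name parse_content_type is hypothetical. A missing charrep is
reported as None, standing for the implied full 10646 repertoire.

```python
from email.message import Message


def parse_content_type(value):
    """Return (charset, repertoires) from a Content-Type header value.

    `charrep` is the hypothetical parameter from the discussion above.
    repertoires is None when no charrep parameter is present, which
    stands for the implied full ISO 10646 repertoire.
    """
    msg = Message()
    msg["Content-Type"] = value
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    charrep = msg.get_param("charrep")   # surrounding quotes are removed
    if not charrep:
        return charset, None
    repertoires = [name.strip().lower() for name in charrep.split(",")]
    return charset, repertoires


if __name__ == "__main__":
    header = 'text/plain;charset=utf-8; charrep="latin-1,latin-2,koi8"'
    print(parse_content_type(header))
```

Because the parameter is optional, a receiver that does not know about it
can ignore it and treat the body as plain UTF-8.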

I showed that simple transliteration is much better handled on the
client than on the server. For accept-*, your main justification
seems to be transliteration. I think it is futile to invest in protocol
mechanisms that will never catch on because no server will be ready
to do something that the client can do much more easily.
As for charrep itself, the document says it all. No need for anything
else.
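
To make the client-side argument concrete, here is a minimal sketch of a
client decoding UTF-8 and falling back to a locally available charset. The
function name to_local and the fallback rule (strip diacritics via
compatibility decomposition, otherwise emit a placeholder) are assumptions
made for illustration only. Real transliteration tables would be
language-specific and considerably larger.

```python
import unicodedata


def to_local(data: bytes, local_charset: str, placeholder: str = "?") -> bytes:
    """Convert UTF-8 bytes to `local_charset`, transliterating on the client.

    Fallback order for each character (an illustrative assumption):
    1. encode it directly if the local charset has it
    2. otherwise drop its diacritics (NFKD, minus combining marks)
    3. otherwise emit `placeholder`
    """
    out = []
    for ch in data.decode("utf-8"):
        try:
            out.append(ch.encode(local_charset))
            continue
        except UnicodeEncodeError:
            pass
        # Decompose and keep only the non-combining base characters.
        base = "".join(
            c for c in unicodedata.normalize("NFKD", ch)
            if not unicodedata.combining(c)
        )
        try:
            out.append(base.encode(local_charset))
        except UnicodeEncodeError:
            out.append(placeholder.encode(local_charset))
    return b"".join(out)


if __name__ == "__main__":
    page = "Dvo\u0159\u00e1k".encode("utf-8")
    print(to_local(page, "latin-1"))
    print(to_local(page, "ascii"))
```

The server sends one UTF-8 document, and each client applies whatever
fallback suits its own display, with no negotiation needed for this case.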

Regards,	Martin.

Received on Wednesday, 18 December 1996 11:49:28 UTC