Re: Accept-Charset support from Martin J. Duerst on 1996-12-18 (ietf-http-wg@w3.org from October to December 1996)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Wed, 18 Dec 1996 17:36:26 +0100 (MET)
To: Keld Jørn Simonsen <keld@dkuug.dk>
Cc: Klaus Weide <kweide@tezcat.com>, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com, www-international@w3.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <Pine.SUN.3.95.961218172404.245H-100000@enoshima>

On Wed, 18 Dec 1996, Keld Jørn Simonsen wrote:

> I also saw Martin's remark here, and I am not so sure about it.
> My main purpose for proposing a repertoire identifier/header
> was from an architectural point of view, to align with the
> concept from ISO on character sets. Of course it complicates matters
> with yet another parameter, but it could help in choosing an
> appropriate font, and then it is the right concept.

Choosing the right font (or in many cases the right fontS)
is a matter that can be highly complex. For some clues to
what issues are involved, please see
	ftp://ftp.ifi.unizh.ch/pub/multilingual/FontComposition.ps.Z
A repertoire is not sufficient for these cases, while having
a look at the document itself will help a lot (and will also
help in the simple cases, as has been pointed out before).

> I note that a MIME charset identifies a repertoire, and you could then
> also use the MIME charsets as parameters here. From a practical
> view there is only a limited set of repertoires (I don't think
> the N*N or maybe N! sets are feasible).

I agree that it's not the worst case of 2**N (much smaller than N!,
but bigger than N*N). 

> Included could also be
> the subrepertoires identified in 10646.

How many are there? And how many "charset"s do we have? My guess
is that altogether, there might be around 500. Because repertoires
can be combined, you end up with 2**500, which is around 1E150.
As you can see, we get into pretty large numbers rather quickly.
We can start to try to separate useful and less useful combinations,
but that will be more work than ensuring that all of ISO 10646 can
be displayed somehow.
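
As a rough check of the magnitudes discussed above, here is a minimal
Python sketch. N = 500 is only the guess made in the text, not a
measured count, and the names (N, digits, pairs, subsets, orderings)
are illustrative assumptions.

```python
import math

N = 500  # the guess from the text for repertoires plus "charset"s


def digits(n: int) -> int:
    """Return the number of decimal digits of a positive integer."""
    return len(str(n))


pairs = N * N                  # N*N: ordered pairs of repertoires
subsets = 2 ** N               # 2**N: every possible combination (subset)
orderings = math.factorial(N)  # N!: every possible ordering

# 2**N lies between N*N and N! for this N.
print(pairs < subsets < orderings)

# 2**500 has 151 decimal digits (about 3.3E150), i.e. the same order
# of magnitude as the 1E150 quoted in the text.
print(digits(subsets))
```

Even at this size, enumerating the combinations is plainly out of the
question, which is the point of the estimate.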

> Repertoires are also a
> key concept for transliteration, which is needed especially when
> people want to make something out of a text, but do not understand
> the script, e.g. Cyrillic or Indic.

If you move transliteration to the client side, repertoires are
much less of an issue. If you are going for "graphic transliteration",
such as replacing u-umlaut by u", they are not needed. For
language-dependent transliteration (using ue in the above case),
language information is needed, but again no repertoire.
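
The two kinds of transliteration described above can be sketched in
Python roughly as follows. The tables and function names are
hypothetical and cover only a few German letters; this is an
illustration of the distinction, not a complete scheme.

```python
# Graphic transliteration: replace a character by a look-alike ASCII
# sequence, independent of language (u-umlaut becomes u").
GRAPHIC = {
    "\u00fc": 'u"',  # u-umlaut
    "\u00f6": 'o"',  # o-umlaut
    "\u00e4": 'a"',  # a-umlaut
}

# Language-dependent transliteration: the replacement depends on the
# language of the text (German writes u-umlaut as "ue").
BY_LANGUAGE = {
    "de": {"\u00fc": "ue", "\u00f6": "oe", "\u00e4": "ae"},
}


def graphic_transliterate(text: str) -> str:
    """Apply the language-independent, look-alike replacements."""
    return "".join(GRAPHIC.get(ch, ch) for ch in text)


def language_transliterate(text: str, lang: str) -> str:
    """Apply the replacements for the given language tag, if known."""
    table = BY_LANGUAGE.get(lang, {})
    return "".join(table.get(ch, ch) for ch in text)


print(graphic_transliterate("Z\u00fcrich"))         # Zu"rich
print(language_transliterate("Z\u00fcrich", "de"))  # Zuerich
```

Neither function takes a repertoire as input: the first needs only
the text, the second the text and a language tag.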

Regards,	Martin.

Received on Wednesday, 18 December 1996 08:38:18 UTC