Re: Accept-Charset support from Martin J. Duerst on 1996-12-17 (www-international@w3.org from October to December 1996)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Tue, 17 Dec 1996 10:48:14 +0100 (MET)
To: Klaus Weide <kweide@tezcat.com>
cc: Keld Jørn Simonsen <keld@dkuug.dk>, http-wg@cuckoo.hpl.hp.com, www-international@w3.org
Message-ID: <Pine.SUN.3.95.961217103202.245B-100000@enoshima>

On Mon, 16 Dec 1996, Klaus Weide wrote:

> On Sat, 14 Dec 1996, Keld Jørn Simonsen wrote:
> > I have suggested that we introduce a repertoire identification
> > in IP protocols, to address that issue. 
> 
> Since this goes beyond HTTP or the www, I would be interested to know
> where (in which forum) you have made that proposal; and whether it is 
> more likely to be considered there. :)
> 
> This may be more useful for other protocols (MIME for mail, news etc.)
> than HTTP and other formats than HTML, because HTML already implies
> that all 10646 characters can occur (according to the i18n draft;
> numeric character references to all 10646 characters are valid,
> independent of the character encoding used for transfer of a
> document; as has recently been pointed out).  But text/plain is also
> still part of the Web, IMHO...
> 
> Having a way to externally identify character repertoire may lead to
> faster acceptance of UTF-8 as character encoding, since one can then
> use UTF-8 without losing the repertoire information implied by the
> currently used charsets.

I don't think so. Adding repertoire information will complicate
things, and slow down use of UTF-8. It also detracts from the basic
idea of Unicode/ISO 10646, which is to remove repertoire restrictions.

Having the client take care of how to represent all the characters
in ISO 10646 is much easier than having the server take care of
how to present a document in an arbitrary repertoire. With N
characters in total, the client only needs a list of length
N, giving the representation of each of these characters. In many
implementations, large parts of this list will be trivial.
There are also many very creative solutions available, such
as making each "undisplayable" character a little box with a link
to a page describing that character.
If you want to deal with things on the server, you need in principle
2**N different solutions (one for each possible subset of the N
characters that a client might support), each covering all N
characters. In practice it's still a lot of different solutions
you have to care for.
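
As a rough illustration of the counting argument above, here is a
minimal Python sketch. It is not taken from any real browser or
server; the names render and repertoire_count, and the "[U+XXXX]"
placeholder format, are assumptions made up for this example. The
client side needs one fallback rule per character, while the number
of possible repertoires a server could be asked to target grows
as 2**N.

```python
def render(text, displayable):
    """Client-side fallback: one rule per character.

    `displayable` is the set of characters this (hypothetical) client
    can show directly. Everything else is replaced by a placeholder
    naming the code point, standing in for the "little box" idea.
    """
    out = []
    for ch in text:
        if ch in displayable:
            out.append(ch)
        else:
            # Hypothetical placeholder format, chosen for this sketch.
            out.append("[U+%04X]" % ord(ch))
    return "".join(out)


def repertoire_count(n):
    """Number of distinct subsets (repertoires) of n characters."""
    return 2 ** n


if __name__ == "__main__":
    # A client that can only display three ASCII letters.
    print(render("a\u00f1o", set("ano")))
    # Even a tiny character set yields many possible repertoires.
    print(repertoire_count(8))
```

The client code does not change as N grows; only the lookup set
does. A server trying to tailor output would have to handle each
client's subset separately.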

So we really don't need protocol extensions for something that
nobody will implement because it's easier to deal with on the
client side. Those few cases where it's really relevant,
such as Japanese transliterated to romaji, can be dealt with
using the existing mechanisms.

Regards,	Martin.

Received on Tuesday, 17 December 1996 04:57:31 UTC