Re: Accept-Charset support from Klaus Weide on 1996-12-17 (www-international@w3.org from October to December 1996)

From: Klaus Weide <kweide@tezcat.com>
Date: Mon, 16 Dec 1996 21:53:51 -0600 (CST)
To: Keld Jørn Simonsen <keld@dkuug.dk>
cc: http-wg@cuckoo.hpl.hp.com, www-international@w3.org
Message-ID: <Pine.SUN.3.95.961216210957.29478B-100000@huitzilo.tezcat.com>

On Sat, 14 Dec 1996, Keld Jørn Simonsen wrote:
> Klaus Weide writes:
> >[...] 
> > It seems predictable that e.g. "Accept-Charset: koi8-r,iso-8859-1,utf-8"
> > will be used to indicate "documents containing characters which are 
> > also in koi8-r and latin-1 characters are acceptable in utf-8 encoding", 
> > because there is currently no better way to express that (other than
> > maybe with language tags, which has other problems already mentioned:
> > e.g. transliteration/transcription, languages that do not imply exactly
> > one character repertoire).
> 
> I have suggested that we introduce a repertoire identification
> in IP protocols, to address that issue. 

Since this goes beyond HTTP or the www, I would be interested to know
where (in which forum) you have made that proposal; and whether it is 
more likely to be considered there. :)

This may be more useful for other protocols (MIME for mail, news etc.)
than for HTTP, and for formats other than HTML, because HTML already
implies that all 10646 characters can occur: according to the i18n
draft, numeric character references to all 10646 characters are valid,
independent of the character encoding used to transfer a document (as
has recently been pointed out).  But text/plain is also still part of
the Web, IMHO...
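
To illustrate the point about numeric character references, here is a
minimal Python sketch. It assumes a document sent as iso-8859-1 that
also contains Cyrillic text; the helper names encode_with_ncrs and
decode_with_ncrs are made up for this example and come from no draft
or library.

```python
import html


def encode_with_ncrs(text: str, charset: str) -> bytes:
    """Encode text for transfer in the given charset.

    Characters the charset cannot represent are written as decimal
    numeric character references (&#NNNN;).
    """
    return text.encode(charset, errors="xmlcharrefreplace")


def decode_with_ncrs(data: bytes, charset: str) -> str:
    """Decode transferred bytes, then resolve character references."""
    return html.unescape(data.decode(charset))


# Hypothetical sample: Latin-1 'é' plus the Cyrillic word 'мир'.
sample = "caf\u00e9 \u043c\u0438\u0440"
wire = encode_with_ncrs(sample, "iso-8859-1")
restored = decode_with_ncrs(wire, "iso-8859-1")
```

The transfer encoding is latin-1 throughout, yet the recipient ends up
with the Cyrillic characters as well, so the charset label alone says
little about which characters an HTML document may contain.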

Having a way to externally identify the character repertoire may lead
to faster acceptance of UTF-8 as a character encoding, since one can
then use UTF-8 without losing the repertoire information implied by
the currently used charsets.
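
As a rough sketch of what "repertoire information implied by a
charset" could mean in practice, the following Python fragment checks
whether every character of a text is representable in at least one of
a list of charsets. The function name in_repertoire is hypothetical,
and treating a repertoire as "whatever these codecs can encode" is an
assumption made only for illustration, not a proposed definition.

```python
def in_repertoire(text: str, charsets: list[str]) -> bool:
    """Return True if each character of text can be encoded in at
    least one of the given charsets (their combined repertoire)."""

    def encodable(ch: str) -> bool:
        for charset in charsets:
            try:
                ch.encode(charset)
                return True
            except UnicodeEncodeError:
                continue
        return False

    return all(encodable(ch) for ch in text)


# Repertoire implied by "koi8-r,iso-8859-1" in the example header.
implied = ["koi8-r", "iso-8859-1"]
fits = in_repertoire("caf\u00e9 \u043c\u0438\u0440", implied)
does_not_fit = in_repertoire("\u65e5\u672c", implied)
```

A text that passes such a check could be sent as UTF-8 to a client
that can only display those two repertoires, which is the case an
external repertoire label would let one express directly.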

  Klaus

Received on Monday, 16 December 1996 22:54:04 UTC