Re: Accept-Charset support from Keld Jørn Simonsen on 1996-12-14 (ietf-http-wg@w3.org from October to December 1996)

From: Keld Jørn Simonsen <keld@dkuug.dk>
Date: Sat, 14 Dec 1996 20:32:59 +0100
To: Klaus Weide <kweide@tezcat.com>, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Cc: www-international@w3.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <199612141933.UAA28681@dkuug.dk>

Klaus Weide writes:

> On Sun, 8 Dec 1996, Keld Jørn Simonsen wrote:
> > Koen Holtman writes:
> > 
> > > But skimming the UTF-8 specification, I gather that UTF-8 is an encoding
> > > mechanism, not a character set.
> > 
> > Well, no. UTF8 is an encoding of characters. It implies the character
>                                                ^^^^^^^^^^^^^^^^^^^^^^^^
> > repertoire of ISO 10646. So it is a charset in MIME sense, including
>   ^^^^^^^^^^^^^^^^^^^^^^^
> > the specific character definitions of 10646.
> 
> If that is taken seriously, then "Accept-Charset: utf-8" cannot be used
> to just send information about what character encoding a client can
> decode.  It implies that (at least when sent in the encoding of utf-8)
> all characters from the 10646 repertoire are acceptable.
> 
> It seems predictable that e.g. "Accept-Charset: koi8-r,iso-8859-1,utf-8"
> will be used to indicate "documents containing characters which are 
> also in koi8-r and latin-1 characters are acceptable in utf-8 encoding", 
> because there is currently no better way to express that (other than
> maybe with language tags, which has other problems already mentioned:
> e.g. transliteration/transcription, languages that do not imply exactly
> one character repertoire).

I have suggested that we introduce a repertoire identifier
in Internet protocols to address that issue.
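
To make the reading Klaus describes concrete, here is a minimal Python
sketch. It is not taken from any HTTP specification: the function names
are hypothetical, and it assumes that the non-UTF-8 entries of an
Accept-Charset value are meant to name the acceptable repertoire, with
utf-8 naming only the encoding. It uses Python's built-in codecs as a
stand-in for the repertoires of koi8-r and iso-8859-1.

```python
def parse_accept_charset(value):
    """Split an Accept-Charset value into lowercase charset names.

    Any parameters such as ";q=0.8" are ignored in this sketch.
    """
    names = []
    for item in value.split(","):
        name = item.split(";")[0].strip().lower()
        if name:
            names.append(name)
    return names


def encodable(ch, charset):
    """Return True if the single character ch can be encoded in charset."""
    try:
        ch.encode(charset)
        return True
    except (UnicodeEncodeError, LookupError):
        # LookupError: Python does not know this charset name.
        return False


def within_implied_repertoire(text, accept_charset):
    """Check text against the repertoire implied by the legacy charsets.

    Assumption (not from any spec): every character must be present in
    at least one listed charset other than utf-8. If utf-8 is the only
    entry, the whole ISO 10646 repertoire is treated as acceptable.
    """
    legacy = [c for c in parse_accept_charset(accept_charset) if c != "utf-8"]
    if not legacy:
        return True
    return all(any(encodable(ch, cs) for cs in legacy) for ch in text)


header = "koi8-r,iso-8859-1,utf-8"
cyrillic_and_latin = within_implied_repertoire("Привет, café", header)
japanese = within_implied_repertoire("日本語", header)
```

Under that assumption the Cyrillic and Latin-1 text is acceptable and
the Japanese text is not, although both could be sent as UTF-8. The
header alone cannot tell a server which reading the client intended,
which is the gap an explicit repertoire identifier would close.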

keld

Received on Saturday, 14 December 1996 11:36:22 UTC