- From: Keld Jørn Simonsen <keld@dkuug.dk>
- Date: Sat, 14 Dec 1996 20:32:59 +0100
- To: Klaus Weide <kweide@tezcat.com>, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
- Cc: www-international@w3.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Klaus Weide writes:

> On Sun, 8 Dec 1996, Keld Jørn Simonsen wrote:
> > Koen Holtman writes:
> >
> > > But skimming the UTF-8 specification, I gather that UTF-8 is an encoding
> > > mechanism, not a character set.
> >
> > Well, no. UTF8 is an encoding of characters. It implies the character
>                                                 ^^^^^^^^^^^^^^^^^^^^^^^^
> > repertoire of ISO 10646. So it is a charset in MIME sense, including
>                                      ^^^^^^^^^^^^^^^^^^^^^^^
> > the specific character definitions of 10646.
>
> If that is taken seriously, then "Accept-Charset: utf-8" cannot be used
> to just send information about what character encoding a client can
> decode. It implies that (at least when sent in the encoding of utf-8)
> all characters from the 10646 repertoire are acceptable.
>
> It seems predictable that e.g. "Accept-Charset: koi8-r,iso-8859-1,utf-8"
> will be used to indicate "documents containing characters which are
> also in koi8-r and latin-1 characters are acceptable in utf-8 encoding",
> because there is currently no better way to express that (other than
> maybe with language tags, which has other problems already mentioned:
> e.g. transliteration/transcription, languages that do not imply exactly
> one character repertoire).

I have suggested that we introduce a repertoire identification in IP
protocols, to address that issue.

keld
Received on Saturday, 14 December 1996 11:36:22 UTC