- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Tue, 10 Dec 1996 16:36:10 +0100 (MET)
- To: Klaus Weide <kweide@tezcat.com>
- cc: Keld Jørn Simonsen <keld@dkuug.dk>, www-international@w3.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
On Mon, 9 Dec 1996, Klaus Weide wrote:

> On Sun, 8 Dec 1996, Keld J&o/rn Simonsen wrote:
> > Koen Holtman writes:
> >
> > > But skimming the UTF-8 specification, I gather that UTF-8 is an encoding
> > > mechanism, not a character set.
> >
> > Well, no. UTF8 is an encoding of characters. It implies the character
>                                                ^^^^^^^^^^^^^^^^^^^^^^^^
> > repertoire of ISO 10646. So it is a charset in MIME sense, including
>   ^^^^^^^^^^^^^^^^^^^^^^^
> > the specific character definitions of 10646.
>
> If that is taken seriously, then "Accept-Charset: utf-8" cannot be used
> to just send information about what character encoding a client can
> decode. It implies that (at least when sent in the encoding of utf-8)
> all characters from the 10646 repertoire are acceptable.

Yes, all characters are acceptable up to the level of acceptability that the HTML I18N spec requires. Which is not very much.

> It seems predictable that e.g. "Accept-Charset: koi8-r,iso-8859-1,utf-8"
> will be used to indicate "documents containing characters which are
> also in koi8-r and latin-1 characters are acceptable in utf-8 encoding",
> because there is currently no better way to express that (other than
> maybe with language tags, which has other problems already mentioned:
> e.g. transliteration/transcription, languages that do not imply exactly
> one character repertoire).

There is no real need to express subrepertoires.

> This is of course not specific to HTTP or the Web, protocols without
> negotiation like mail need charset labelling. A simple MIME compliant
> MUA should have sufficient information from message headers to dispatch
> to the appropriate viewer. In the pre-UTF era this was reliably possible
> e.g. with metamail (given the correct charset parameter and availability of
> appropriate codepage). With messages labelled "utf-8", heuristics have to
> be involved.

The concept of having a different viewer for every "charset" is still widespread, but rather outdated and doomed.
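To make the two readings of a header like "Accept-Charset: koi8-r,iso-8859-1,utf-8" concrete, here is a minimal Python sketch. All function names are hypothetical illustrations, not part of any HTTP library. The parser simply drops parameters such as ";q=0.5" and does not treat the "*" wildcard specially, and the "subrepertoire" reading assumes the acceptable repertoire is the union of the other listed charsets, which is only one guess at what such a header would be meant to say.

```python
# Hypothetical helpers for illustration only; not part of any HTTP library.

def parse_accept_charset(value):
    """Return the charset names in an Accept-Charset value, lowercased.

    Parameters such as ";q=0.5" are dropped; "*" is not treated specially.
    """
    names = []
    for item in value.split(","):
        name = item.split(";")[0].strip().lower()
        if name:
            names.append(name)
    return names


def representable_in(text, charset):
    """True if every character of text can be encoded in the given charset."""
    try:
        text.encode(charset)
        return True
    except (UnicodeEncodeError, LookupError):
        # LookupError: Python has no codec for this charset name,
        # so this sketch treats the text as not representable.
        return False


def acceptable_under_full_repertoire_reading(text):
    """Keld Simonsen's reading (endorsed by Martin): "utf-8" implies the
    whole ISO 10646 repertoire, so any encodable text is acceptable."""
    return representable_in(text, "utf-8")


def acceptable_under_subrepertoire_reading(text, header):
    """The usage Klaus Weide predicts: utf-8 is acceptable only for characters
    that also exist in at least one of the other listed charsets."""
    others = [cs for cs in parse_accept_charset(header) if cs != "utf-8"]
    if not others:
        # Only utf-8 was listed, so nothing narrows the repertoire.
        return acceptable_under_full_repertoire_reading(text)
    return all(any(representable_in(ch, cs) for cs in others) for ch in text)


header = "koi8-r,iso-8859-1,utf-8"
print(parse_accept_charset(header))  # ['koi8-r', 'iso-8859-1', 'utf-8']
print(acceptable_under_subrepertoire_reading("Привет, café", header))  # True
print(acceptable_under_subrepertoire_reading("αβγ", header))  # False
print(acceptable_under_full_repertoire_reading("αβγ"))  # True
```

The Greek text fits neither KOI8-R nor Latin-1, so only the full-repertoire reading accepts it in UTF-8; the "utf-8" label by itself carries no information about which subrepertoire a document actually uses.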
For a network computer running Java, being able to display all of Unicode/ISO 10646 is just a must. It might be a pain for older and more expensive technology to follow, but that's their problem, not ours.

Regards,
Martin.
Received on Tuesday, 10 December 1996 10:36:38 UTC