Re: Accept-Charset support
On Mon, 9 Dec 1996, Klaus Weide wrote:
> On Sun, 8 Dec 1996, Keld Jørn Simonsen wrote:
> > Koen Holtman writes:
> >
> > > But skimming the UTF-8 specification, I gather that UTF-8 is an encoding
> > > mechanism, not a character set.
> >
> > Well, no. UTF8 is an encoding of characters. It implies the character
> ^^^^^^^^^^^^^^^^^^^^^^^^
> > repertoire of ISO 10646. So it is a charset in MIME sense, including
> ^^^^^^^^^^^^^^^^^^^^^^^
> > the specific character definitions of 10646.
>
> If that is taken seriously, then "Accept-Charset: utf-8" cannot be used
> to just send information about what character encoding a client can
> decode. It implies that (at least when sent in the encoding of utf-8)
> all characters from the 10646 repertoire are acceptable.
Yes, all characters are acceptable up to the level of acceptability
that the HTML I18N spec requires, which is not very much.
> It seems predictable that e.g. "Accept-Charset: koi8-r,iso-8859-1,utf-8"
> will be used to indicate "documents containing characters which are
> also in koi8-r and latin-1 characters are acceptable in utf-8 encoding",
> because there is currently no better way to express that (other than
> maybe with language tags, which has other problems already mentioned:
> e.g. transliteration/transcription, languages that do not imply exactly
> one character repertoire).
There is no real need to express subrepertoires.
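To make the quoted header example concrete, here is a minimal sketch of how a
server might treat "Accept-Charset: koi8-r,iso-8859-1,utf-8" purely as a list
of encodings: it picks the first listed charset that can represent the whole
document, so utf-8 ends up as the fallback for anything outside the other two
repertoires. The function names parse_accept_charset and choose_charset are
hypothetical, the header parsing is simplified (it ignores "*" and malformed
parameters), and using Python's built-in codecs as a stand-in for each
charset's repertoire is an assumption of this sketch, not something any spec
prescribes.

```python
def parse_accept_charset(header):
    """Split an Accept-Charset value into (charset, q) pairs, highest q first.

    Simplified, hypothetical parser: a missing q defaults to 1.0 and an
    unparsable q is treated as 0.
    """
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((name, q))
    # sorted() is stable, so charsets with equal q keep their header order.
    return sorted(prefs, key=lambda pair: -pair[1])


def choose_charset(text, header):
    """Return the first acceptable charset that can encode all of `text`."""
    for name, q in parse_accept_charset(header):
        if q <= 0 or name == "*":
            continue
        try:
            text.encode(name)
        except (UnicodeEncodeError, LookupError):
            # Character outside this charset's repertoire, or unknown codec.
            continue
        return name
    return None


header = "koi8-r,iso-8859-1,utf-8"
print(choose_charset("Привет", header))
print(choose_charset("café", header))
print(choose_charset("日本語", header))
```

Under this reading the header says nothing about which 10646 characters the
client can actually display; it only says which byte encodings it can decode.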
> This is of course not specific to HTTP or the Web, protocols without
> negotiation like mail need charset labelling. A simple MIME compliant
> MUA should have sufficient information from message headers to dispatch
> to the appropriate viewer. In the pre-UTF era this was reliably possible
> e.g. with metamail (given the correct charset parameter and availability of
> appropriate codepage). With messages labelled "utf-8", heuristics have to
> be involved.
The concept of having a different viewer for every "charset" is still
widespread, but rather outdated and doomed. For a network computer
running Java, being able to display all of Unicode/ISO 10646 is
just a must. It might be a pain for older and more expensive
technology to follow, but that's their problem, not ours.
Regards, Martin.