Re: Accept-Charset support

Klaus Weide writes:

> On Sat, 14 Dec 1996, Keld Jørn Simonsen wrote:
> > I have suggested that we introduce a repertoire identification
> > in IP protocols, to address that issue.
> Since this goes beyond HTTP or the www, I would be interested to know
> where (in which forum) you have made that proposal; and whether it is
> more likely to be considered there. :)

It was for the IAB character set report - the one recommending 10646
as the character set for all internet protocols.

> This may be more useful for other protocols (MIME for mail, news etc.)
> than HTTP and other formats than HTML, because HTML already implies
> that all 10646 characters can occur (according to the i18n draft,
> numeric character references to all 10646 characters are valid,
> independent of the character encoding used for transfer of a
> document, as has recently been pointed out).  But text/plain is also
> still part of the Web, IMHO...
> Having a way to externally identify the character repertoire may lead
> to faster acceptance of UTF-8 as a character encoding, since one can
> then use UTF-8 without losing the repertoire information implied by
> the currently used charsets.
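To make the point about numeric character references concrete, here is a minimal Python sketch (the helper name `decode_html_fragment` is hypothetical) showing that a document transferred as ISO-8859-1 can still carry Cyrillic letters as references to their 10646 code points. The transfer charset therefore says little about the repertoire of the resolved text. It relies only on the standard library's `html.unescape`.

```python
import html


def decode_html_fragment(raw: bytes, charset: str) -> str:
    """Decode transferred bytes with their charset, then resolve character references.

    Hypothetical helper for illustration: html.unescape turns numeric
    references such as &#1052; into the ISO 10646 / Unicode character
    with that code point, whatever charset the bytes arrived in.
    """
    return html.unescape(raw.decode(charset))


# "ø" (U+00F8) exists in ISO-8859-1 and is sent as a raw byte; the Cyrillic
# letters do not, so they are sent as numeric character references.
raw = "K\u00f8benhavn / &#1052;&#1086;&#1089;&#1082;&#1074;&#1072;".encode("iso-8859-1")
text = decode_html_fragment(raw, "iso-8859-1")
print(text)  # København / Москва

# The resolved text is no longer within the ISO-8859-1 repertoire.
print(all(ord(c) < 256 for c in text))  # False
```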

I also saw Martin's remark here, and I am not so sure about it.
My main purpose in proposing a repertoire identifier/header
was architectural: to align with the ISO concept of character
sets. Of course it complicates matters with yet another
parameter, but it could help in choosing an appropriate font,
and it is the right concept.

I note that a MIME charset identifies a repertoire, so you could
also use the MIME charset names as values of this parameter. From a
practical point of view there is only a limited set of repertoires
(I don't think the N*N, or maybe N!, possible combinations are
feasible). The subrepertoires identified in 10646 could also be
included. Repertoires are also a key concept for transliteration,
which is needed especially when people want to make something out
of a text but do not understand the script, e.g. Cyrillic or Indic.
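The idea that a MIME charset name can double as a repertoire identifier can be sketched in a few lines of Python. This is a minimal illustration, assuming a single-byte charset known to Python's codecs: it derives the repertoire by decoding every byte value, then reports which characters of a text fall outside it (the information a receiver might use to pick a font or decide that transliteration is needed). The helper names `single_byte_repertoire` and `outside_repertoire` are hypothetical and not part of any proposal; multi-byte charsets and the 10646 subrepertoires would need a different approach.

```python
from __future__ import annotations


def single_byte_repertoire(charset: str) -> frozenset[str]:
    """Return the set of characters a single-byte charset can represent.

    Assumption for illustration: `charset` names a single-byte encoding
    known to Python's codecs (e.g. "iso-8859-1", "iso-8859-5", "koi8-r").
    """
    chars = set()
    for b in range(256):
        try:
            chars.add(bytes([b]).decode(charset))
        except UnicodeDecodeError:
            pass  # this byte value is unassigned in the charset
    return frozenset(chars)


def outside_repertoire(text: str, charset: str) -> list[str]:
    """Characters of `text` missing from the charset's repertoire, in code point order."""
    rep = single_byte_repertoire(charset)
    return sorted({c for c in text if c not in rep})


print(outside_repertoire("Москва", "iso-8859-5"))     # []
print(outside_repertoire("København", "iso-8859-5"))  # ['ø']
```

Treating the repertoire as a plain set keeps the check independent of the transfer encoding: the same text could arrive in UTF-8 and still be tested against the repertoire named by a charset label.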


Received on Wednesday, 18 December 1996 00:13:17 UTC