Re: Accept-Charset support from Martin J. Duerst on 1996-12-10 (www-international@w3.org from October to December 1996)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Tue, 10 Dec 1996 16:51:26 +0100 (MET)
To: Klaus Weide <kweide@tezcat.com>
cc: Larry Masinter <masinter@parc.xerox.com>, http-wg@cuckoo.hpl.hp.com, www-international@w3.org
Message-ID: <Pine.SUN.3.95.961210164302.245K-100000@enoshima>

On Sat, 7 Dec 1996, Klaus Weide wrote:

> Saying "I can understand 10646" or "I can understand UTF-8" practically
> just means that I can decode that character encoding.  That is on the
> same level as saying "I can understand 8-bit character sets" without
> specifying which.  If anything more detailed is too fine-grained to
> really matter then I don't see why anybody should currently bother 
> to use Accept-Charset: ISO-8859-2 etc.

Oh no, definitely not. Saying you understand UTF-8, or anything else,
means that you can decode it and that you know which characters you can
render and which you cannot. Just taking arbitrary glyphs for arbitrary
characters, as you would when, e.g., rendering EBCDIC with an
ISO-8859-2 font, is absolutely forbidden.
The HTML I18N spec does not specify exactly what you have to
do with characters you cannot render, but it is clear enough
in saying that this should be done in a way that allows the
reader to distinguish between correctly rendered characters
and characters that couldn't be rendered.
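To illustrate the two separate steps (decoding the encoding, then checking the repertoire), here is a minimal sketch. It assumes, purely for illustration, that the client's font covers exactly the ISO-8859-2 repertoire, approximated by asking Python's codec whether a character can be encoded in that charset. The function names and the `[U+XXXX]` placeholder are hypothetical; the HTML I18N spec does not prescribe any particular marker, only that unrenderable characters stay distinguishable.

```python
def can_render(ch: str, font_charset: str = "iso-8859-2") -> bool:
    """Hypothetical repertoire check: treat a character as renderable
    if the (assumed) font's charset can encode it."""
    try:
        ch.encode(font_charset)
        return True
    except UnicodeEncodeError:
        return False


def render_utf8(data: bytes, font_charset: str = "iso-8859-2") -> str:
    """Decode UTF-8 bytes, then replace every character outside the
    assumed font repertoire with a visible [U+XXXX] placeholder
    instead of substituting an arbitrary glyph."""
    text = data.decode("utf-8")  # step 1: decode the character encoding
    out = []
    for ch in text:  # step 2: check each character against the repertoire
        if can_render(ch, font_charset):
            out.append(ch)
        else:
            out.append(f"[U+{ord(ch):04X}]")  # visibly mark unrenderable chars
    return "".join(out)


if __name__ == "__main__":
    sample = "Łódź, Zürich, 東京".encode("utf-8")
    print(render_utf8(sample))
```

Run as a script, the Polish and German letters pass through (they are all in Latin-2), while the two Kanji come out as `[U+6771][U+4EAC]`. Decoding UTF-8 succeeded for every character; only the repertoire check differs, which is exactly the distinction an Accept-Charset value like "UTF-8" alone does not express.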

> With the given structure of the MIME "charset" parameter (and therefore
> the Accept-Charset header), the logical thing to at least preserve
> what currently can be expressed w.r.t. repertoire would be to register
> lots of additional charsets: we'd then have ISO-10646-Unicode-Latin2,
> ISO-10646-Unicode-Latin3, ISO-10646-Unicode-Latin4, and so on.  Well
> I can see why that isn't very inviting, looks like a big can of worms...    

It definitely is! Look at RFC 1815 (ignored by everybody) to
see how this can be brought to extremes.

Regards,	Martin.

Received on Tuesday, 10 December 1996 10:52:56 UTC