Date: Fri, 31 Aug 2001 09:02:40 +0200
From: Terje Bless <link@pobox.com>
To: John Middleton <jmiddlet@sedl.org>
cc: www-validator@w3.org
Message-ID: <20010831101251-r01010800-60fc08e1-0910-010c@localhost>
Subject: Re: Charset "iso-10646-1"
On 24.08.01 at 15:46, John Middleton <jmiddlet@sedl.org> wrote:
>I found charset=iso-10646-1 on W3C website
Since I see Martin has already dealt with this, let me just add a few short
and hopefully clarifying points.
1. ISO-10646-1, a.k.a. "Unicode", specifies a set of characters. It does
not specify how to encode them into bits and bytes in your document.
To actually use this character repertoire, you need to use one of
the specified encodings for it. Usually, this means UTF-8.
2. Character References in HTML documents (e.g. &#8228;) _always_
refer to Unicode characters irrespective of what "charset" you've
given for the HTML page in question.
3. The "charset" parameter (in the HTTP Content-Type header or embedded
in a META element in your document) specifies what character encoding
was used to encode this particular document and does not affect how
numeric character references are interpreted (cf. #2 above).
4. Browser support for all this is spotty at best. To get this
to actually work in practice (as opposed to the theory above) you may
have to engage in works of sympathetic magic; electronic voodoo. :-)
In particular, it's possible that the browser in question will not
understand particular character references (numeric or named) when
printing unless the "charset" is set to an encoding that supports
them. This can only be determined by experimenting with various
workarounds to see which ones work around the bugs in the browsers
you happen to care about.
One of "ISO-8859-1" or "UTF-8" should work, depending on the browsers
you are trying to support.
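
For the theory in points 1 to 3, here is a rough sketch in Python. It is
only an illustration: the sample character (U+00E9) and the helper name
decode_page are arbitrary choices, and html.unescape stands in for what a
conforming HTML parser would do with character references. It says nothing
about how any particular browser behaves.

```python
import html

# Point 1: a Unicode character is an abstract code point. An encoding
# (the "charset") decides which bytes represent it in the document.
ch = "\u00e9"                             # LATIN SMALL LETTER E WITH ACUTE
utf8_bytes = ch.encode("utf-8")           # two bytes in UTF-8
latin1_bytes = ch.encode("iso-8859-1")    # one byte in ISO-8859-1

# Points 2 and 3: the numeric reference &#233; names code point 233
# (U+00E9) no matter which encoding the surrounding bytes use.
source = "\u00e9 = &#233;"


def decode_page(raw: bytes, charset: str) -> str:
    """Hypothetical helper: decode the bytes using the declared charset,
    then resolve character references to Unicode characters."""
    return html.unescape(raw.decode(charset))


raw_utf8 = source.encode("utf-8")
raw_latin1 = source.encode("iso-8859-1")

from_utf8 = decode_page(raw_utf8, "utf-8")
from_latin1 = decode_page(raw_latin1, "iso-8859-1")

print(raw_utf8 != raw_latin1)      # the bytes on the wire differ
print(from_utf8 == from_latin1)    # the decoded text is the same
```

The literal character is stored as different bytes under each charset, but
the reference &#233; resolves to the same Unicode character in both cases,
which is the distinction points 2 and 3 draw.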