Re: Charset "iso-10646-1" from Terje Bless on 2001-08-31 (www-validator@w3.org from August 2001)

From: Terje Bless <link@pobox.com>
Date: Fri, 31 Aug 2001 09:02:40 +0200
To: John Middleton <jmiddlet@sedl.org>
cc: www-validator@w3.org
Message-ID: <20010831101251-r01010800-60fc08e1-0910-010c@localhost>

On 24.08.01 at 15:46, John Middleton <jmiddlet@sedl.org> wrote:

>I found charset=iso-10646-1  on W3C website 

Since I see that Martin has already dealt with this, let me just add a few
short and, hopefully, clarifying points.

1. ISO-10646-1, a.k.a. "Unicode", specifies a set of characters. It does
   not specify how to encode them into bits and bytes in your document.
   To actually use this character repertoire, you need to use one of
   the specified encodings for it. Usually, this means UTF-8.

2. Character References in HTML documents (e.g. &#8228;) _always_
   refer to Unicode characters irrespective of what "charset" you've
   given for the HTML page in question.

3. The "charset" parameter (in the HTTP Content-Type header or embedded
   in a META element in your document) specifies what character encoding
   was used to encode this particular document and does not affect how
   numeric character references are interpreted (cf. #2 above).

4. Browser support for all this is spotty at best. To get this to
   actually work in practice (as opposed to the theory above) you may
   have to engage in works of sympathetic magic; electronic voodoo. :-)

   In particular, it's possible that the browser in question will not
   understand particular character references (numeric or named) when
   printing unless the "charset" is set to an encoding that can
   represent the referenced character. This can only be determined by
   experimenting with various workarounds to see which ones work around
   the bugs in the browsers you happen to care about.

   One of "ISO-8859-1" or "UTF-8" should work, depending on the browsers
   you are trying to support.
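To make points 1 to 3 concrete, here is a small Python sketch. The
function names declared_charset, decode_page and can_encode are
hypothetical, and the ISO-8859-1 fallback assumes the HTTP/1.1 default
for text/* types when no charset parameter is given. The sketch reads
the charset from a Content-Type value, decodes the raw bytes with it,
and only then resolves character references:

```python
import html
from email.message import Message


def declared_charset(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type value, lowercased.

    Assumption: fall back to the HTTP/1.1 default for text/* types.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_page(body: bytes, content_type: str) -> str:
    """Decode raw bytes with the declared charset (point 3), then resolve
    character references, which always name Unicode characters (point 2)."""
    text = body.decode(declared_charset(content_type))
    return html.unescape(text)


def can_encode(text: str, charset: str) -> bool:
    """True if every character in text is representable in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# Point 1: one character (U+2024, the target of &#8228;), different bytes
# depending on which encoding of the repertoire is used.
assert "\u2024".encode("utf-8") == b"\xe2\x80\xa4"
assert "\u2024".encode("utf-16-be") == b"\x20\x24"

# Points 2 and 3: the same bytes under two different charset parameters.
# The raw bytes C3 A9 change meaning; the reference &#8228; does not.
body = b"&#8228; caf\xc3\xa9"
assert decode_page(body, "text/html; charset=UTF-8") == "\u2024 caf\u00e9"
assert decode_page(body, "text/html; charset=ISO-8859-1") == "\u2024 caf\u00c3\u00a9"

# Point 4: U+2024 cannot be stored as a byte in ISO-8859-1, but it can
# still be written into such a page as a numeric character reference.
assert not can_encode("\u2024", "iso-8859-1")
assert can_encode("\u2024", "utf-8")
assert "\u2024".encode("iso-8859-1", "xmlcharrefreplace") == b"&#8228;"
```

The last two lines mirror the situation in point 4: whether a browser
then renders such a reference correctly in an ISO-8859-1 page is exactly
the part that still needs testing.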

Received on Friday, 31 August 2001 04:15:25 UTC