- From: Terje Bless <link@pobox.com>
- Date: Fri, 31 Aug 2001 09:02:40 +0200
- To: John Middleton <jmiddlet@sedl.org>
- cc: www-validator@w3.org
On 24.08.01 at 15:46, John Middleton <jmiddlet@sedl.org> wrote:

>I found charset=iso-10646-1 on W3C website

Since I see Martin has already dealt with this, let me just add a few short and hopefully clarifying points.

1. ISO-10646-1, a.k.a. "Unicode", specifies a set of characters. It does not specify how to encode them into bits and bytes in your document. To actually use this character repertoire, you need to use one of the encodings specified for it. Usually, this means UTF-8.

2. Character references in HTML documents (e.g. &#8228;) _always_ refer to Unicode characters, irrespective of what "charset" you've given for the HTML page in question.

3. The "charset" parameter (in the HTTP Content-Type header or embedded in a META element in your document) specifies what character encoding was used to encode this particular document. It does not affect how numeric character references are interpreted (cf. #2 above).

4. Browser support for all this is spotty at best. To get this to actually work in practice (as opposed to the theory above) you may have to engage in works of sympathetic magic; electronic voodoo. :-)

   In particular, it's possible that the browser in question will not understand particular character references (numeric or named) when printing unless the "charset" is set to an encoding that supports them. Which workaround helps can only be determined by experimenting to see which ones work around the bugs in the browsers you happen to care about. One of "ISO-8859-1" or "UTF-8" should work, depending on the browsers you are trying to support.
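To make points 2 and 3 a bit more concrete, here is a minimal Python sketch of the theory (not of what any particular browser does). The helper names `encode_page` and `render`, the sample markup, and the choice of code point 8228 (U+2024) are my own illustrative assumptions; `html.unescape` from the standard library stands in for an HTML parser resolving character references.

```python
import html

def encode_page(markup: str, charset: str) -> bytes:
    """Hypothetical helper: serialise markup to bytes using the declared charset."""
    return markup.encode(charset)

def render(raw: bytes, charset: str) -> str:
    """Hypothetical helper: decode bytes with the charset, then resolve references."""
    return html.unescape(raw.decode(charset))

# The reference itself is plain ASCII, so it survives either encoding
# and always resolves to the same Unicode character.
MARKUP = "one dot leader: &#8228;"
rendered = {
    cs: render(encode_page(MARKUP, cs), cs)
    for cs in ("iso-8859-1", "utf-8")
}

# The literal character is a different matter: only an encoding that
# covers it can carry it directly.
LITERAL = "one dot leader: \u2024"
utf8_bytes = LITERAL.encode("utf-8")
try:
    LITERAL.encode("iso-8859-1")
    latin1_can_carry_literal = True
except UnicodeEncodeError:
    latin1_can_carry_literal = False
```

Under these assumptions both entries in `rendered` come out identical, while the literal character can only be written directly in the UTF-8 version of the page. That is why a numeric reference is the portable choice when the page itself is ISO-8859-1.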
Received on Friday, 31 August 2001 04:15:25 UTC