Re: Charset "iso-10646-1"

From: Terje Bless (link@pobox.com)
Date: Fri, Aug 31 2001

    Date: Fri, 31 Aug 2001 09:02:40 +0200
    From: Terje Bless <link@pobox.com>
    To: John Middleton <jmiddlet@sedl.org>
    cc: www-validator@w3.org
    Message-ID: <20010831101251-r01010800-60fc08e1-0910-010c@localhost>
    Subject: Re: Charset "iso-10646-1"
    
    On 24.08.01 at 15:46, John Middleton <jmiddlet@sedl.org> wrote:
    
    >I found charset=iso-10646-1  on W3C website 
    
    Since I see Martin has already dealt with this, let me just add a few short
    and hopefully clarifying points.
    
    1. ISO-10646-1, a.k.a. "Unicode", specifies a set of characters. It does
       not specify how to encode them into bits and bytes in your document.
       To actually use this character repertoire, you need to use one of
       the specified encodings for it. Usually, this means UTF-8.
    
    2. Character References in HTML documents (e.g. &#8228;) _always_
       refer to Unicode characters irrespective of what "charset" you've
       given for the HTML page in question.
    
    3. The "charset" parameter (in the HTTP Content-Type header or embedded
       in a META element in your document) specifies what character encoding
       was used to encode this particular document and does not affect how
       numeric character references are interpreted (cf. #2 above).
    
    4. Various browsers' support for all this is spotty at best. To get this
       to actually work in practice (as opposed to the theory above) you may
       have to engage in works of sympathetic magic; electronic voodoo. :-)
    
       In particular, it's possible that the browser in question will not
       understand particular character references (numeric or named) when
       printing unless the "charset" is set to an encoding that supports
       them. This can only be determined by experimenting with various
       workarounds to see which ones work around the bugs in the browsers
       you happen to care about.
    
       One of "ISO-8859-1" or "UTF-8" should work, depending on the browsers
       you are trying to support.
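
Points 1 to 3 above can be checked with a short Python sketch. It is only an illustration under assumptions: the sample page and the variable names are made up for the example, and Python's standard `html.unescape` stands in for what a conforming HTML parser would do with a numeric character reference. It says nothing about how any particular browser behaves (see point 4).

```python
import html

# &#8228; refers to Unicode code point 8228 (U+2024), whatever the charset.
ch = chr(8228)

# Point 1: the character set does not fix the bytes; an encoding does.
utf8_bytes = ch.encode("utf-8")
utf16_bytes = ch.encode("utf-16-be")

# ISO-8859-1 has no byte for this character, so it cannot be written
# directly in a page using that encoding.
try:
    ch.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Points 2 and 3: the reference itself is plain ASCII, so it survives
# either encoding and resolves to the same Unicode character both times.
page = "<p>&#8228;</p>"  # hypothetical sample document
resolved = {
    charset: html.unescape(page.encode(charset).decode(charset))
    for charset in ("iso-8859-1", "utf-8")
}

print(utf8_bytes, utf16_bytes, latin1_ok)
print(resolved["iso-8859-1"] == resolved["utf-8"])
```

The practical upshot is that a page labelled ISO-8859-1 can still use a character outside that encoding by writing it as a numeric character reference, at least in theory.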