RE: accented characters, etc.

From: François Yergeau <yergeau@alis.com>
Date: Thu, 02 Dec 1999 22:08:49 -0500
To: www-html@w3.org
Message-id: <00f801bf3d45$812f4620$7690dfa0@fyergeau2.intra.alis.com>

John Delacour wrote:
> Without going so far as using Unicode, you can also declare
> some other character set such as iso-8859-2 and use decimal
> or hexadecimal character entities, e.g. &#131;  &#x83;

This is wrong.  Numeric character references always refer to Unicode, never
to the character encoding of the document.  Code point 131 in Unicode is a
C1 control character, not a printable one, so &#131; is meaningless in HTML
(although it *may* work in some browsers).
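
To make the point concrete, here is a minimal sketch in Python.  It uses the
standard library's html.unescape, which follows the later HTML5 parsing
rules rather than any 1999 browser, so treat it as an illustration only.  It
shows that a numeric reference resolves to a Unicode code point with no
document encoding involved, and that 131 only "works" because the parser
quietly remaps it the way a Windows-1252 browser would.

```python
import html

# A numeric character reference names a Unicode code point.  No document
# encoding is passed in anywhere, because none is needed to resolve it.
han = html.unescape("&#23665;")
assert han == chr(23665) == "\u5c71"

# The decimal and hexadecimal forms of the same number are equivalent.
assert html.unescape("&#x5C71;") == han

# Code point 131 (U+0083) is a C1 control.  html.unescape applies the
# HTML5 compatibility table and returns U+0192 instead, which is what
# byte 0x83 means in Windows-1252.  That remapping is error recovery,
# not a lookup in the document's character encoding.
legacy = html.unescape("&#131;")
print(hex(ord(legacy)))
```

The last case is the "may work in some browsers" behaviour: the result is
not code point 131 at all.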

Russell O'Connor wrote:
>To be more precise, you don't need to declare UTF-8 as your character
>encoding (and probably shouldn't), to use these entities.  No matter what
>your character encoding is, &#xxxx; will refer to the Unicode character
>number xxxx.

That's the theory.  Unfortunately, most current browsers will refuse to
display &#xxxx; if character xxxx is not part of the repertoire of the
character encoding of the document.  For instance, putting &#23665; in an
ISO Latin-1 document will not result in a Han character being displayed in
most browsers, whereas putting the same in a Shift-JIS document will likely
work _in_the_same_browser_ (if it has CJK support).  Things are improving.
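
The repertoire restriction described above can be sketched in Python.  This
is not how any particular browser is implemented; fits_repertoire is a
hypothetical helper that only asks whether a character can be encoded in a
given charset, which is the test those browsers appear to apply before
agreeing to display a numeric character reference.

```python
def fits_repertoire(char: str, encoding: str) -> bool:
    """Return True if char can be represented in the named encoding."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


han = chr(23665)  # the character named by &#23665;

# ISO Latin-1 has no Han characters, so a repertoire-limited browser
# would decline to display the reference in a Latin-1 document.
in_latin1 = fits_repertoire(han, "latin-1")

# Shift-JIS does cover this character, so the same browser would
# likely display it in a Shift-JIS document.
in_shift_jis = fits_repertoire(han, "shift_jis")

print(in_latin1, in_shift_jis)
```

Under the specification neither check should matter, since the reference
names the same Unicode character in both documents.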

François Yergeau
Received on Thursday, 2 December 1999 23:32:51 UTC

