RE: accented characters, etc.

John Delacour wrote:
>
> Without going so far as using Unicode, you can also declare
> some other character set such as iso-8859-2 and use decimal
> or hexadecimal character entities, e.g. &#131;  &#x83;

This is wrong.  Numeric character references always refer to Unicode, never
to the character encoding of the document.  Since Unicode has no printable
character at number 131 (it is a C1 control code), &#131; is meaningless
(although it *may* work in some browsers).
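
To make the point concrete, here is a minimal Python sketch (not part of
the original exchange; the variable names are only illustrative).  It
assumes the character intended was the one at byte 0x83 in windows-1252,
looks up that character's Unicode number, and builds the reference from
that number instead of from the byte value.

```python
import html
import unicodedata

# Byte 0x83 (decimal 131) is the letter "ƒ" only in windows-1252
# (assumed here to be the encoding the byte value was taken from).
char = bytes([0x83]).decode("cp1252")
code_point = ord(char)  # Unicode number of that character

# Unicode position 131 itself is a control code, not a letter.
category_131 = unicodedata.category(chr(131))

# A numeric character reference must use the Unicode number.
reference = f"&#{code_point};"
decoded = html.unescape(reference)
```

Under these assumptions the reference to write is the one built from
code_point, whatever character encoding the document declares.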


Russell O'Connor wrote:
>To be more precise, you don't need to declare UTF-8 as your character
>encoding (and probably shouldn't), to use these entities.  No matter what
>your character encoding is, &#xxxx; will refer to the Unicode character
>number xxxx.

That's the theory.  Unfortunately, most current browsers will refuse to
display &#xxxx; if character xxxx is not part of the repertoire of the
character encoding of the document.  For instance, putting &#x5C71; (the Han
character 山) in an ISO Latin-1 document will not result in a Han character
being displayed in most browsers, whereas putting the same in a Shift-JIS
document will likely work _in_the_same_browser_ (if it has CJK support).
Things are improving, though.
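
The repertoire distinction can be checked with a short Python sketch.
This only tests whether a character is encodable in a given character
encoding; it does not model any browser, and in_repertoire is a
hypothetical helper name chosen for the illustration.

```python
import html

# Resolve the numeric character reference to its Unicode character.
han = html.unescape("&#x5C71;")


def in_repertoire(char: str, encoding: str) -> bool:
    """Return True if `char` can be encoded in `encoding`."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The same character checked against two document encodings.
latin1_ok = in_repertoire(han, "iso-8859-1")
sjis_ok = in_repertoire(han, "shift_jis")
```

The reference resolves to the same Unicode character in both cases; only
the repertoire of the declared encoding differs.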

--
François Yergeau

Received on Thursday, 2 December 1999 23:32:51 UTC