Re: non-sgml characters

> Does this mean that I only have to change the charset element on my meta
> tag to "UTF-8", with the *same* big5 text, and the browser will
> automatically show the correct characters? If this is what you meant,

No.  You must provide a charset parameter that describes the character
set in which the document was sent, either in the content attribute of
the meta element or in the real HTTP Content-Type header.  The browser
will then convert the big5 into UCS internally, and display that.  The
browser should already be interpreting HTML numeric character references
as Unicode code points, so that behaviour should not change.
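The charset lookup described above can be sketched in Python.  This is a
minimal sketch, not how any particular browser does it.  The names
decode_document, charset_from_content_type and MetaCharsetFinder are
hypothetical.  It assumes the HTML 4.01 precedence, where the real HTTP
header beats the meta element, and it falls back to the HTTP/1.1 default of
ISO-8859-1 for text/* when neither gives a charset.

```python
from email.message import Message
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: records <meta http-equiv="Content-Type" content="...">."""

    def __init__(self):
        super().__init__()
        self.content_type = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute names arrive lower-cased; valueless ones map to None
        if tag == "meta" and (a.get("http-equiv") or "").lower() == "content-type":
            self.content_type = self.content_type or a.get("content")


def charset_from_content_type(value):
    """Extract the charset parameter from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lower-cased, None if no charset given


def decode_document(body, http_content_type=None):
    """Turn the raw bytes of a page into Unicode text (UCS) using the declared charset."""
    # 1. A charset in the real HTTP header takes priority.
    charset = charset_from_content_type(http_content_type) if http_content_type else None
    if charset is None:
        # 2. Otherwise scan for the meta element.  Decoding as latin-1 only for
        #    this scan keeps the ASCII markup intact for ASCII-compatible
        #    charsets such as Big5.
        finder = MetaCharsetFinder()
        finder.feed(body.decode("latin-1"))
        if finder.content_type:
            charset = charset_from_content_type(finder.content_type)
    # 3. Assumed fallback: the HTTP/1.1 default for text/* types.
    return body.decode(charset or "iso-8859-1")
```

For example, decode_document(page_bytes, "text/html; charset=Big5") decodes
the same big5 bytes correctly without any change to the text itself.  Only
the label changes.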

If the browser displays Chinese characters when you include something
like &#163;&#xa3;, rather than two UK currency symbols, it's broken: it
probably means you are using a "Chinese environment" rather than the
browser's own international character support.  Such hacks are not
necessary on NN6, IE4+, NN4.7+ (and maybe earlier NN4.x's, but not
necessarily all), or on correctly configured versions of Lynx, where
the terminal supports the characters.
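The contrast can be illustrated in Python.  A numeric character reference
such as &#163; or &#xa3; names the Unicode code point U+00A3 (POUND SIGN),
whatever the document's charset.  The "broken" path below is only an
assumed model of what a Chinese environment hack does.  It treats each code
point value as a raw byte and lets a Big5 renderer pair the bytes up.

```python
import html

refs = "&#163;&#xa3;"

# Correct behaviour: references are Unicode code points, so this is "££".
unicode_view = html.unescape(refs)

# Assumed model of the hack: the value 0xA3 is emitted as a byte, and the
# Big5 layer combines the two 0xA3 bytes into one double-byte character.
raw_bytes = bytes(ord(c) for c in unicode_view)
broken_view = raw_bytes.decode("big5")

print(repr(unicode_view), repr(broken_view))
```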

It seems that you are already providing the correct charset.  This is quite
rare on CJK pages from academic sources and on personal pages, although it
is becoming quite common on Chinese news organisation pages.  (Even worse
than declaring no character set is declaring windows-1252, the default in
western versions of FrontPage: it would appear to work in a Chinese
environment, and in a suitably misconfigured standard browser.)
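This Python sketch shows why a windows-1252 label is worse than useless.  A
standards-following decoder honours the label and reads each Big5 byte as a
separate western character.  The sample text is an assumption chosen for
illustration.

```python
# Big5-encoded text, as an editor would save it (sample text assumed).
big5_bytes = "中文".encode("big5")

# Correct label: the bytes decode back to the original Chinese text.
correct = big5_bytes.decode("big5")

# windows-1252 label: every byte becomes its own Latin character, so a
# correctly behaving browser shows mojibake instead of Chinese.
mislabelled = big5_bytes.decode("cp1252")

print(correct, mislabelled)
```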

To the extent that Chinese environments are still used for display (they
tend to have richer input methods than the OS), and that people don't
specify the true character set, this is a legacy of workarounds born of
early commercial expediency, when vendors went to market with US-only
products.  Those workarounds have outlived their usefulness, but they
still get in the way of proper use of the standards.

Received on Wednesday, 17 July 2002 02:21:42 UTC