- From: David Woolley <david@djwhome.demon.co.uk>
- Date: Tue, 16 Jul 2002 22:44:47 +0100 (BST)
- To: w3c-wai-ig@w3.org
> Any tool that allows us to convert big5 formatted text to UTF-8 text?

Effectively, any modern browser does this (except that it probably converts to UTF-16, rather than UTF-8).

The main thing to remember is to do what the standards have required since HTTP 1.0, but which is very often forgotten: identify the character set with the page.

For all browsers, you can do this using the charset parameter in the real HTTP Content-Type header. The default for text/html is iso-8859-1; however, current best practice, enforced by the W3C's validator, is never to let it default. You can use the charset parameter for any text/ format.

For post-HTML 4.0 browsers, you can also use meta elements to include a copy of that header; the real HTTP header takes precedence if it specifies a character set.

It has become common practice to treat no character set as meaning the character set of the country in which the page was authored, but this is wrong; it results in Japanese displaying as gibberish European accented characters, or in the browser having to use character-frequency-based heuristics to guess what was really meant.

It is possible that some very old browsers react inappropriately to this. These browsers were probably never intended for use outside the US market, but may have been adapted by bolt-on software that re-interprets the characters as CJK ones.
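
For anyone who wants to do the conversion outside a browser, here is a minimal sketch in Python, assuming the input really is Big5 and using only the standard codecs. The function names (big5_to_utf8, misread_as_latin1, content_type_header) are made up for illustration and are not part of any tool mentioned above. It also shows what goes wrong when Big5 bytes are read as iso-8859-1, and what an explicit Content-Type value looks like.

```python
def big5_to_utf8(data: bytes) -> bytes:
    """Decode Big5-encoded bytes and re-encode the text as UTF-8."""
    return data.decode("big5").encode("utf-8")


def misread_as_latin1(data: bytes) -> str:
    """Show what a reader sees when Big5 bytes are wrongly treated as iso-8859-1."""
    return data.decode("iso-8859-1")


def content_type_header(charset: str = "utf-8") -> str:
    """Build a Content-Type value that names the character set explicitly."""
    return f"text/html; charset={charset}"


if __name__ == "__main__":
    # Two Chinese characters (U+4E2D, U+6587), encoded as Big5 for the demo.
    sample = "\u4e2d\u6587".encode("big5")

    print(big5_to_utf8(sample))              # the same text as UTF-8 bytes
    print(ascii(misread_as_latin1(sample)))  # the gibberish a wrong default produces
    print("Content-Type:", content_type_header())
```

Once the page is converted, the charset named in the header (and in any meta copy of it) has to match the encoding actually used, here utf-8.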
Received on Tuesday, 16 July 2002 17:48:49 UTC