- From: David Woolley <david@djwhome.demon.co.uk>
- Date: Fri, 2 Jun 2006 08:19:51 +0100 (BST)
- To: www-html@w3.org
> I suspect that the issue has less to do with publishing multilingual HTML
> documents on the web in UTF-8 than the infrastructure that is being used
> to achieve the task. I am aware of many companies that publish

Yes. My eventual conclusion was that it was the web server configuration. It may well be that this was well intentioned, and done to make it slightly more likely that documents would be served with a defined character set in the primary market area (given that many people hand-coding pages will never include the charset parameter). However, if that was the case, they have failed to consider internationalisation issues, and they have failed to consider the large number of documents authored with Microsoft tools that use windows-1252 characters coded as such, or older documents coded with windows-1252 but lacking a charset specification. (An explicit character set ought to turn off auto-detection, although one could argue that the presence of illegal characters (0x80 to 0x9F) could trigger a heuristic to ignore the declared character set.)

Incidentally, one of the main reasons why servers don't honour meta http-equiv elements is that doing so would be a layering violation: a server is about serving resources of all sorts and shouldn't need internal knowledge of particular document languages. This is even more true of caching proxies, which is why it is pretty pointless to try to control caching behaviour with meta http-equiv.

Unfortunately there are commercial reasons why low-cost web space doesn't provide the ability to use that space properly, by configuring metadata, and psychological reasons why people won't learn HTTP as well as HTML, with the result that, instead of doing things properly, people find workarounds. The commercial reasons are partly to lower the security risk to the server and partly to encourage the purchase of premium services; in practice, rather than encouraging upgrades, this just produces more workarounds.

> multilingual sites in UTF-8 that work fine with IE.

However, it is still a bad idea to use Appendix C XHTML unless you also intend to serve it with a proper XHTML media type when talking to compatible browsers AND the document actually makes use of namespace mixing on those browsers. Also, if one does so, one should always specify the character set at both the XML and XHTML levels (a sketch of what that looks like follows below). There is much more about the Appendix C mode issues in the thread from February entitled "Question about XHTML 2.0 and content type".

> document...just in case the user never set that in the page. The
> autodetection has worked well for a number of years.

It was broken for a number of years (and may still be). If you selected it, the body of printed pages was always blank - you just got the page headers. This was, I seem to remember, acknowledged in the knowledge base.
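As a minimal sketch of declaring the character set at "both levels" (the title, language attributes and Strict DOCTYPE here are just placeholder choices, and ideally the HTTP header should say the same thing: Content-Type: text/html; charset=utf-8 for legacy browsers, or application/xhtml+xml for browsers that accept it):

    <?xml version="1.0" encoding="utf-8"?>
    <!-- XML-level declaration above; the HTML-level declaration is the
         meta element below, for user agents that treat this as text/html -->
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head>
        <meta http-equiv="Content-Type"
              content="text/html; charset=utf-8" />
        <title>UTF-8 declared at both the XML and HTML levels</title>
      </head>
      <body>
        <p>Example content.</p>
      </body>
    </html>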
Received on Friday, 2 June 2006 07:20:07 UTC