- From: David Woolley <david@djwhome.demon.co.uk>
- Date: Thu, 1 Jun 2006 22:37:57 +0100 (BST)
- To: www-html@w3.org
> > http://thread.gmane.org/gmane.user-groups.linux.delhi/12845/focus=12845

The HTML 4.01 specification overrides the HTTP specification and says that the character set is undefined, not ISO 8859/1, when none is specified by other means. It also allows browsers to use heuristics, so one heuristic might be to assume ISO 8859/1!

In practice, servers don't honour meta elements. However, browsers are required to do so, for this special case, if there is no charset in the real HTTP headers, so one gets the same result. Consequently, if this really were HTML, there would be no problem - in fact many high profile sites use UTF-8 with only meta elements to specify it.

However, in this case, you aren't using HTML, but XHTML. In my view, it is almost certain that you are doing so for unsound reasons, but there are rules for the character set in XML and in fact the default is already UTF-8! However, it is likely that you are actually serving to Internet Explorer, which doesn't support XHTML, so you've had to serve it with headers that say that it is HTML. In fact, your meta element also says that it is HTML. You therefore have a confused situation where you are relying on browser error recovery to treat a document written in XHTML as though it were broken HTML. I'd suggest the first thing to do is to convert to HTML 4.01 to eliminate the error recovery aspects.

You would get a problem if the server had the character set explicitly set, but that is extremely rare, even in regions that require a non-default setting and where authors normally fail to use the meta route, relying on users to have their browser set to assume the local character set. However, in this case, that is exactly your problem!

If you want to use this server, and you cannot convince them to remove the charset from the headers, you will need to use entities to encode the non-ISO 8859/1 characters.
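The entity encoding mentioned above can be sketched in a few lines of Python. This is only an illustration, not what any particular authoring tool does: the function name `encode_for_latin1` is hypothetical, and it relies on Python's standard `xmlcharrefreplace` error handler to turn every character that ISO 8859/1 cannot represent into a numeric character reference.

```python
def encode_for_latin1(text: str) -> bytes:
    """Encode text as ISO 8859/1, replacing unrepresentable characters.

    Hypothetical helper for illustration: characters outside ISO 8859/1
    become numeric character references (&#NNNN;), which a browser
    resolves regardless of the charset in the HTTP headers.
    """
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


if __name__ == "__main__":
    # U+00E9 fits in ISO 8859/1; U+20AC (euro sign) and
    # U+0905 (a Devanagari letter) do not.
    print(encode_for_latin1("caf\u00e9 \u20ac \u0905"))
```

The output is safe to serve with `charset=iso-8859-1`, at the cost of a larger and less readable source document.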
Some authoring tools, such as Mozilla, will allow you to save in different character sets and will automatically do the required entity encoding.

> Sorry for any inconvenience, but I think I've found a bug in HTML
> specification (which might be prevalent in XHTML specifications also).
> Not necessarily a bug, but a correction that needs to be done in the
> HTML specification.

You've failed to specify what you think the problem is, so I've had to try and analyze from the thread you referenced.

HTTP/1.0 200 OK
Age: 103
Date: Thu, 01 Jun 2006 21:27:47 GMT
Content-Length: 1490
Content-Type: text/html; charset=iso-8859-1
Server: Apache/1.3.34 (Debian) PHP/4.4.2-1+b1 mod_choke/0.06
X-mod-choke: 0.06
Last-Modified: Wed, 26 Apr 2006 22:56:06 GMT
ETag: "a808c-5d2-444ffa86"
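As a rough sketch of the precedence described in this message for a document served as text/html: a charset in the real HTTP headers wins, a meta element is honoured only when the headers give none, and otherwise the character set is undefined and the browser may fall back on heuristics. The function names below are hypothetical, and the meta charset is assumed to have been extracted from the document already; only the standard library's header parser is used.

```python
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, None if absent


def effective_charset(content_type: str,
                      meta_charset: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: pick the charset for a text/html response.

    None means "undefined": the browser is free to guess.
    """
    from_header = header_charset(content_type)
    if from_header is not None:
        return from_header           # real HTTP header takes precedence
    if meta_charset is not None:
        return meta_charset.lower()  # meta element used only as fallback
    return None


if __name__ == "__main__":
    # The header from the response above overrides a UTF-8 meta element.
    print(effective_charset("text/html; charset=iso-8859-1", "UTF-8"))
    # Without a charset in the header, the meta element is honoured.
    print(effective_charset("text/html", "UTF-8"))
```

With the Content-Type header shown above, the first call yields iso-8859-1 whatever the meta element says, which is the situation the message describes.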
Received on Thursday, 1 June 2006 21:38:09 UTC