- From: Jukka Korpela <jkorpela@cc.hut.fi>
- Date: Fri, 17 Dec 1999 12:30:52 +0200 (EET)
- To: www-validator@w3.org
(For background of this message, you may wish to read a message I just sent to comp.infosystems.www.authoring.html:
  Re: Character encoding
  Message-ID <3859ffbb.1280295207@news.cs.hut.fi>)

It seems that if a server sends Content-Type: text/html without any charset parameter and the document contains a <meta http-equiv="Content-Type" content="..."> element, the validator reports a mismatch. Example:

    URI: http://www.fi/
    Server: Apache/1.3.6 (Unix) PHP/3.0.11
    Character encoding: iso-8859-1

    The character encoding specified in the HTTP header ("") is different from the one specified in the META element ("iso-8859-1"). I will use "iso-8859-1" for this validation.

This seems to cause confusion, and it's a pretty common situation (for strange reasons, but that's a different story).

First, I'd suggest making such messages more informative without making them visibly longer: the string "Character encoding" could be made a link to
  http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
and instead of just "HTTP header" it could say "HTTP header Content-Type", and that could link to
  http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17
(Or is that too technical? It would be better to link to a document for a wide audience explaining things in plain English and then pointing to the technical details.)

Second, in this particular case the validator seems to give somewhat misleading information, probably because the code for checking the presence of a charset parameter does not work. The relevant portion (in v. 1.56) seems to be:

    if ($File->{HTTP_Charset} ne $File->{META_Charset}
        and $File->{META_Charset} ne ''
        and $File->{Charset} ne 'unknown') {

(followed by code for printing the message). This doesn't quite work, since the test does not exclude the case where HTTP_Charset is empty because the header has no charset parameter at all.

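To make the gap concrete, here is a minimal sketch of the quoted condition, transcribed into Python rather than the validator's own Perl. The function names are hypothetical, and the guarded variant is only one possible fix, not the validator's actual code. It assumes a missing charset parameter is stored as an empty string, as the ("") in the reported message suggests.

```python
def reports_mismatch(http_charset: str, meta_charset: str, charset: str) -> bool:
    """Transcription of the quoted v. 1.56 test (hypothetical function name)."""
    return (
        http_charset != meta_charset
        and meta_charset != ""
        and charset != "unknown"
    )


def reports_mismatch_guarded(http_charset: str, meta_charset: str, charset: str) -> bool:
    """Same test, but skipped when the HTTP header carried no charset at all.

    This extra guard is an assumption about what a fix could look like.
    """
    return http_charset != "" and reports_mismatch(http_charset, meta_charset, charset)


# The http://www.fi/ case: no charset parameter in the header, META says iso-8859-1.
print(reports_mismatch("", "iso-8859-1", "iso-8859-1"))          # True: spurious mismatch
print(reports_mismatch_guarded("", "iso-8859-1", "iso-8859-1"))  # False: no mismatch reported

# A genuine conflict between header and META is still reported.
print(reports_mismatch_guarded("utf-8", "iso-8859-1", "iso-8859-1"))  # True
```

The guard only decides whether the mismatch message is printed; what to say instead when the header is silent is a separate question.
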
There are of course two options for handling this case: either suppress the message (I guess that's the current _intent_) or, IMHO better, issue a different message, e.g.

    Character encoding: iso-8859-1. This is based on the information in a META element, since no encoding was specified in the HTTP header Content-Type: text/html.

-- Yucca, http://www.hut.fi/u/jkorpela/ or http://yucca.hut.fi/yucca.html
Received on Friday, 17 December 1999 05:30:56 UTC