- From: Jukka Korpela <jkorpela@cc.hut.fi>
- Date: Fri, 17 Dec 1999 12:30:52 +0200 (EET)
- To: www-validator@w3.org
(For background on this message, you may wish to read a message
I just sent to comp.infosystems.www.authoring.html, "Re: Character encoding",
Message-ID <3859ffbb.1280295207@news.cs.hut.fi>)
It seems that if a server sends
Content-Type: text/html
without any charset parameter and the document contains a
<meta http-equiv="Content-Type" content="...">
element, the validator reports a mismatch. Example:
URI: http://www.fi/
Server: Apache/1.3.6 (Unix) PHP/3.0.11
Character encoding: iso-8859-1 The character encoding specified in the
HTTP header ("") is different from the one specified in the META element
("iso-8859-1"). I will use "iso-8859-1" for this validation.
This seems to cause confusion, and it's a pretty common situation (for
strange reasons, but that's a different story).
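To make the situation concrete, here is a small Python sketch (not the validator's own code, which is Perl) of how the two charset values come to differ. The helper name charset_of is hypothetical, and the META content value is an assumed example consistent with the report above; the parsing uses the standard library's email.message.Message.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or '' if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when there is no charset parameter.
    return msg.get_content_charset() or ""


# Header as sent by the server: no charset parameter at all.
http_charset = charset_of("text/html")
# Assumed content attribute of the META element (illustrative value).
meta_charset = charset_of("text/html; charset=iso-8859-1")
```

With these inputs http_charset is the empty string while meta_charset is "iso-8859-1", which is exactly the pair shown in the validator's message.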
First, I'd suggest making such messages more informative without making
them visibly much longer: the string "Character encoding"
could be made a link to
http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
and instead of just "HTTP header" it could say "HTTP header
Content-Type" and that could link to
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17
(Or is that too technical? It would be better to link to a document
for a wide audience explaining things in plain English and then pointing
to the technical details.)
Second, in this particular case the validator seems to give
somewhat misleading information, probably because the code that checks
for the presence of a charset parameter does not work as intended. The
relevant portion (in v. 1.56) seems to be:
    if ($File->{HTTP_Charset} ne $File->{META_Charset}
        and $File->{META_Charset} ne ''
        and $File->{Charset} ne 'unknown') {
(followed by code for printing the message). This doesn't quite
work, since the test does not exclude the case where HTTP_Charset is
empty because the header has no charset parameter at all.
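The effect can be checked with a short Python transliteration of the quoted condition. This is only a sketch: the function names are hypothetical, and the guarded variant is just one possible way to exclude the empty-header case, not necessarily how the validator should be changed.

```python
def reports_mismatch(http_charset, meta_charset, charset):
    """Python transliteration of the quoted Perl test."""
    return (http_charset != meta_charset
            and meta_charset != ""
            and charset != "unknown")


def reports_mismatch_guarded(http_charset, meta_charset, charset):
    """Same test, plus a guard (an assumption) that skips an empty HTTP charset."""
    return http_charset != "" and reports_mismatch(http_charset, meta_charset, charset)


# No charset parameter in the HTTP header, META says iso-8859-1:
original = reports_mismatch("", "iso-8859-1", "iso-8859-1")        # True
guarded = reports_mismatch_guarded("", "iso-8859-1", "iso-8859-1")  # False
```

The original test evaluates to true for an empty HTTP charset, so the mismatch message is printed; with the extra guard it is suppressed, while a genuine disagreement between two non-empty values is still reported.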
There are of course two options for handling this case: either suppress
the message (I guess that's the current _intent_) or, IMHO better,
issue a different message, e.g.
Character encoding: iso-8859-1. This is based on the information in a
META element, since no encoding was specified in the HTTP header
Content-Type: text/html.
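
As a rough Python sketch of the second option (a separate message when the header simply has no charset), the three cases could be told apart as follows. The function name charset_note and its parameters are hypothetical; the first message text is the one proposed above, and the mismatch text is abbreviated from the validator's current message.

```python
def charset_note(http_charset, meta_charset, media_type="text/html"):
    """Return a note about the encoding source, or None if nothing to say."""
    if meta_charset == "":
        return None  # no META charset, so nothing to compare against
    if http_charset == "":
        # Header carried no charset parameter: explain where the value came from.
        return ("Character encoding: %s. This is based on the information in a "
                "META element, since no encoding was specified in the HTTP "
                "header Content-Type: %s." % (meta_charset, media_type))
    if http_charset != meta_charset:
        # Genuine disagreement between two explicitly stated encodings.
        return ('The character encoding specified in the HTTP header ("%s") is '
                'different from the one specified in the META element ("%s").'
                % (http_charset, meta_charset))
    return None  # both present and equal


note = charset_note("", "iso-8859-1")
```

Only a real disagreement between two stated encodings produces the "different" message; the common case of a bare Content-Type: text/html gets the explanatory note instead.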
--
Yucca, http://www.hut.fi/u/jkorpela/ or http://yucca.hut.fi/yucca.html
Received on Friday, 17 December 1999 05:30:56 UTC