The message about character encoding info mismatch

(For background to this message, you may wish to read a message I just
sent to comp.infosystems.www.authoring.html, "Re: Character encoding",
Message-ID <3859ffbb.1280295207@news.cs.hut.fi>.)

It seems that if a server sends
Content-Type: text/html
without any charset parameter and the document contains a
<meta http-equiv="Content-Type" content="...">
element, the validator reports a mismatch. Example:

  URI: http://www.fi/ 
  Server: Apache/1.3.6 (Unix) PHP/3.0.11 
  Character encoding: iso-8859-1 The character encoding specified in the
  HTTP header ("") is different from the one specified in the META element
  ("iso-8859-1"). I will use "iso-8859-1" for this validation. 
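The empty string ("") in the quoted message is presumably what the validator gets when it looks for a charset parameter in a header that has none. Below is a minimal sketch of that extraction. It is written in Python rather than the validator's Perl, and the name charset_from_content_type is hypothetical, not taken from the validator source. It is also deliberately simplified: for example, it ignores semicolons inside quoted strings.

```python
def charset_from_content_type(value: str) -> str:
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper: the result is lowercased (HTTP charset names are
    case-insensitive) and is "" when no charset parameter is present.
    """
    # Parameters follow the media type, separated by semicolons.
    for param in value.split(";")[1:]:
        name, sep, val = param.partition("=")
        if sep and name.strip().lower() == "charset":
            # Drop surrounding whitespace and optional double quotes.
            return val.strip().strip('"').lower()
    return ""


print(repr(charset_from_content_type("text/html")))
print(repr(charset_from_content_type('text/html; charset="ISO-8859-1"')))
```

For a header like the one in the example, the first call yields '' and the second yields 'iso-8859-1'.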

This seems to cause confusion, and it's a pretty common situation (for
strange reasons, but that's a different story).

First, I'd suggest making such messages more informative without making
them visibly longer: the string "Character encoding" could be made a
link to
http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
and instead of just "HTTP header" the message could say "HTTP header
Content-Type", linked to
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17
(Or are those too technical? It might be better to link to a document
written for a wide audience that explains things in plain English and
then points to the technical details.)

Second, in this particular case the validator seems to give somewhat
misleading information, probably because the code that checks for the
presence of a charset parameter does not work as intended. The relevant
portion (in v. 1.56) seems to be:

if ($File->{HTTP_Charset} ne $File->{META_Charset}
    and $File->{META_Charset} ne ''
    and $File->{Charset} ne 'unknown') {

(followed by the code that prints the message). This doesn't quite
work, since the test does not exclude the case where HTTP_Charset is
empty because the header has no charset parameter at all.
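One way to make the test match its apparent intent is to require both values to be non-empty before comparing them. Below is a sketch in Python, not the validator's actual code. The parameter names are hypothetical and mirror the $File fields in the quoted Perl. Comparing case-insensitively is an added assumption, based on HTTP charset names being case-insensitive.

```python
def charset_mismatch(http_charset: str, meta_charset: str, charset: str) -> bool:
    """Return True only when HTTP and META both name a charset and they differ.

    Hypothetical Python names mirroring $File->{HTTP_Charset},
    $File->{META_Charset} and $File->{Charset} from the quoted Perl.
    """
    if http_charset == "" or meta_charset == "":
        # One side specifies nothing, so there is nothing to contradict.
        return False
    if charset == "unknown":
        return False
    # Charset names are case-insensitive, so compare them lowercased.
    return http_charset.lower() != meta_charset.lower()


print(charset_mismatch("", "iso-8859-1", "iso-8859-1"))       # no charset in header
print(charset_mismatch("utf-8", "iso-8859-1", "iso-8859-1"))  # genuine conflict
```

In the Perl original, the equivalent change would presumably be an extra clause such as $File->{HTTP_Charset} ne '' in the condition.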

There are of course two options for handling this case: either suppress
the message (I guess that's the current _intent_) or, IMHO better,
issue a different message, e.g.

  Character encoding: iso-8859-1. This is based on the information in a
  META element, since no encoding was specified in the HTTP header
  Content-Type: text/html.
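The second option amounts to choosing among three outcomes: a mismatch note, a "based on META" note, or no note at all. The Python sketch below illustrates that choice. The function name charset_note is hypothetical, and the message texts are adapted from the wording in this message, not from the validator's actual templates.

```python
from typing import Optional


def charset_note(http_charset: str, meta_charset: str) -> Optional[str]:
    """Choose the character encoding note to show, or None for no note.

    Hypothetical helper; wording adapted from the proposal in this message.
    """
    if meta_charset == "":
        return None  # Nothing in META, so nothing to report about it.
    if http_charset == "":
        # The header had no charset parameter: say where the value came from.
        return (f"Character encoding: {meta_charset}. This is based on the "
                "information in a META element, since no encoding was "
                "specified in the HTTP header Content-Type.")
    if http_charset.lower() != meta_charset.lower():
        # A real conflict between the header and the META element.
        return ("The character encoding specified in the HTTP header "
                f'("{http_charset}") is different from the one specified '
                f'in the META element ("{meta_charset}").')
    return None  # Both sources agree.


print(charset_note("", "iso-8859-1"))
```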

-- 
Yucca, http://www.hut.fi/u/jkorpela/ or http://yucca.hut.fi/yucca.html

Received on Friday, 17 December 1999 05:30:56 UTC