Re: The message about character encoding info mismatch

[ Apologies for the horrible format of this message. ]
[ I'm stuck using MS Outlook while at work. :-(      ]

On 19991217T1131, Jukka Korpela <jkorpela@cc.hut.fi> wrote:

>It seems that if a server sends "Content-Type: text/html"
>without any charset parameter and the document contains a
><meta http-equiv="Content-Type" content="..."> element,
>the validator reports a mismatch. Example:
>
>  Character encoding: iso-8859-1 The character encoding specified in the
>  HTTP header ("") is different from the one specified in the META element
>  ("iso-8859-1"). I will use "iso-8859-1" for this validation. 
>
>This seems to cause confusion, and it's a pretty common situation (for
>strange reasons, but that's a different story).

Ok, mea culpa, this is my breakage. It got introduced with the recent File
Upload patch (what's the story on that BTW, Gerald?). What it is reporting
isn't really wrong per se, but I agree it's both confusing and sub-optimal.
The warning tells you that there is a difference between the charset as
provided in the HTTP Content-Type header field and that given in an HTML
META element attribute. It is a mismatch; it just isn't meaningful to
report it as such in this case.

For this particular case, it's actually wrong as well since ISO-8859-1 is
the default if no explicit charset is given in the HTTP header, but in
general it is correct; just a bit confusing.

I'll throw together some code to fix this, but unfortunately I'm a bit
pressed for time these days (as we count down to Y2K) so it'll probably not
get done until early January (and then it's the wait until Gerald has time
to review it for inclusion) if Gerald doesn't get around to it before.


>First I'd suggest making such messages more informative, without making
>them essentially longer (visibly) [...] (Or is that too technical? It
>would be better to link to a document for a wide audience explaining
>things in plain English and then pointing to the technical details.)

Yes, that is one goal: make error messages more coherent, informative, and
consistent[0]. They should also all link to a local document explaining it
(like the excellent one at the WDG HTML Validation Service) which in turn
contains links to relevant technical references, including the documentation
for the W3C HTML Validation Service[1].


>Second, in this particular case it seems that the validator gives
>a bit misleading information, probably because the code for checking
>the presence of charset attribute does not work. The relevant portion
>(in v. 1.56) seems to be:
>
>if ($File->{HTTP_Charset} ne $File->{META_Charset}
>    and $File->{META_Charset} ne ''
>    and $File->{Charset} ne 'unknown') {
>
>(followed by code for printing the message). And this doesn't quite
>work, since the test does not exclude the case where HTTP_Charset is
>empty due to total lack of charset attribute in the header.

Yes, that is Yet Another special case that we fail to take into account[2].
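
As a rough illustration only (not the actual patch), the test could be
tightened along these lines. The helper name charset_mismatch is
hypothetical; the hash keys are the ones from the quoted v. 1.56 snippet.
The case-insensitive comparison is an extra assumption on my part, and the
sketch simply skips the comparison when either side is empty rather than
applying the ISO-8859-1 default.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when BOTH the HTTP header and the META
# element name a charset and the two names differ.
sub charset_mismatch {
    my ($File) = @_;

    # Treat missing values as empty strings; lowercase for comparison
    # (assumption: charset names should compare case-insensitively).
    my ($http, $meta, $used) =
        map { defined($_) ? lc($_) : '' }
        @{$File}{qw(HTTP_Charset META_Charset Charset)};

    return 0 if $http eq '' or $meta eq '';   # nothing to compare against
    return 0 if $used eq 'unknown';           # same guard as the original
    return $http ne $meta ? 1 : 0;
}

# The case from the report: no charset parameter in the HTTP header.
my $no_header = charset_mismatch({
    HTTP_Charset => '',
    META_Charset => 'iso-8859-1',
    Charset      => 'iso-8859-1',
});

# A genuine disagreement between header and META.
my $conflict = charset_mismatch({
    HTTP_Charset => 'utf-8',
    META_Charset => 'iso-8859-1',
    Charset      => 'utf-8',
});

print "no header: $no_header, conflict: $conflict\n";
```

With this shape the warning is only printed when there really are two
explicit, differing values to complain about.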



DISCLAIMER: The Validator is Gerald's baby and as such he has final say
            on everything pertaining to it. All the above means is that
            that is what _I_ am planning and will submit patches to him
            for.  None of it places any form of obligation on Gerald to
            actually include it in the Validator if he thinks it's a bad
            idea or he wants to do it a different way.



[0] - That, BTW, includes internationalizing them so interested parties
      can localize them. That effort will likely coincide or depend on
      reworking the code for better customization support of interface
      elements.

[1] - Documentation that I had half-finished before I wiped it out
      during the course of a Red Hat 6.1 installation.
      /me is *not* amused! :-(

[2] - Gerald already fixed one such in my code; seems this bit needs
      rethinking and fixing. :-(

Received on Friday, 17 December 1999 06:13:21 UTC