
Re: Nov 26 2002 update possible error

From: Terje Bless <link@pobox.com>
Date: Wed, 27 Nov 2002 01:33:57 +0100
To: W3C Validator <www-validator@w3.org>
cc: Eric Anderson <anderson@cs.uoregon.edu>
Message-ID: <a01060007-1022-EAA69DA8019F11D7B52A00039300CF5C@[193.157.66.10]>

Eric Anderson <anderson@cs.uoregon.edu> wrote:

>As of this current update, a whole set of pages which had been
>validating correctly stopped.  Specifically, I'm getting the following
>error message for all of them:
>
>" I was not able to extract a character encoding labeling from any of
>the valid sources for such information. Without encoding information it
>is impossible to validate the document. The sources I tried are: [...] "
>
>I don't know if this reveals some formerly unknown error in our web
>server, or whether it's a Validator problem.  But it might be the
>latter.
>
>Here's one of the URLs for which this is happening:
>
>http://www.cs.uoregon.edu/~anderson/gtf/cis111/

There are actually several things going on here. First of all, you are
serving XML as "text/html" (which is unfortunate, but may be necessary).
Secondly, your server is not sending a "charset" parameter in the
Content-Type field of the HTTP response, so we have no clear indication of
which Character Encoding is being used. And finally, your example document
is not well-formed, and hence not Valid, XML: it contains an XML
Declaration, but not as the first thing in the file. cf.:

<!-- $Id: index.html,v 1.11 2002/11/21 02:24:49 anderson Exp anderson $ -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
[...]
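You can see the same problem outside the Validator by feeding this layout to any XML parser. Below is a rough sketch using Python's standard-library ElementTree, which uses expat underneath. Only the first two lines come from the page. The empty `html` root is a hypothetical stand-in for the rest of the document. This is an illustration only, not how the Validator itself checks documents.

```python
import xml.etree.ElementTree as ET

# The first two lines are taken from the reported page; the root element is
# a hypothetical stand-in for the real DOCTYPE and content.
CVS_ID = b"<!-- $Id: index.html,v 1.11 2002/11/21 02:24:49 anderson Exp anderson $ -->\n"
XML_DECL = b'<?xml version="1.0" encoding="UTF-8"?>\n'
ROOT = b'<html xmlns="http://www.w3.org/1999/xhtml"/>\n'


def is_well_formed(data: bytes) -> bool:
    """Return True if the (non-validating) expat parser accepts the data."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        print("rejected:", err)
        return False
    return True


print(is_well_formed(CVS_ID + XML_DECL + ROOT))  # False: declaration not first
print(is_well_formed(XML_DECL + CVS_ID + ROOT))  # True: comment moved below it
```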


Without the comment, we might have been able to use the autodetection
algorithm from Appendix F of the XML Recommendation to find the correct
encoding; and if you set an explicit character encoding in the HTTP
headers, there will be no doubt at all.
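The byte patterns in Appendix F can be sketched roughly as follows. This is a simplified, hypothetical `autodetect()` helper, not the Validator's code. It names only an encoding family for the UCS-4, UTF-16 and EBCDIC cases. Where Appendix F falls back to UTF-8 for unrecognised leading bytes, this sketch returns None instead, which is roughly the situation the Validator's error message describes. A file that begins with "<!--" ends up in that last case, because the encoding declaration is no longer in the first four bytes.

```python
import re
from typing import Optional

# Byte Order Marks from Appendix F. Longer marks come first so that,
# for example, FF FE 00 00 (UCS-4) is not mistaken for FF FE (UTF-16).
_BOMS = [
    (b"\x00\x00\xfe\xff", "UCS-4 (1234 order)"),
    (b"\xff\xfe\x00\x00", "UCS-4 (4321 order)"),
    (b"\x00\x00\xff\xfe", "UCS-4 (2143 order)"),
    (b"\xfe\xff\x00\x00", "UCS-4 (3412 order)"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

# Without a BOM: the first four bytes of "<?xml" as each family encodes it.
_NO_BOM = {
    b"\x00\x00\x00\x3c": "32-bit family (1234 order)",
    b"\x3c\x00\x00\x00": "32-bit family (4321 order)",
    b"\x00\x00\x3c\x00": "32-bit family (2143 order)",
    b"\x00\x3c\x00\x00": "32-bit family (3412 order)",
    b"\x00\x3c\x00\x3f": "UTF-16BE family",
    b"\x3c\x00\x3f\x00": "UTF-16LE family",
    b"\x4c\x6f\xa7\x94": "EBCDIC family",
}

# The encoding="..." pseudo-attribute of an XML Declaration at offset 0.
_ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def autodetect(data: bytes) -> Optional[str]:
    """Guess the encoding (family) of an XML entity from its first bytes.

    Returns None where Appendix F would fall back to plain UTF-8, i.e.
    when there is no BOM and the entity does not start with "<?xm".
    """
    head = data[:4]
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    if head in _NO_BOM:
        return _NO_BOM[head]
    if head == b"<?xm":
        # ASCII-compatible bytes: the declaration itself names the encoding,
        # and a declaration without an encoding means UTF-8.
        match = _ENCODING_DECL.match(data)
        return match.group(1).decode("ascii") if match else "UTF-8"
    # Anything else, e.g. a file that starts with "<!--".
    return None


print(autodetect(b'<?xml version="1.0" encoding="UTF-8"?>'))                # UTF-8
print(autodetect(b'<!-- $Id$ -->\n<?xml version="1.0" encoding="UTF-8"?>'))  # None
```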

The reason this changed is that the new version of the Validator is
stricter about requiring proper labelling of the character encoding, to
avoid giving false results.


I would recommend that you 1) move the CVS comment so that it comes after
the XML Declaration, and 2) configure your web server to send the proper
"charset" parameter. Depending on what kinds of documents you typically
serve, you may be able to simply set a default encoding of "UTF-8" in the
server's main configuration file.
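After reconfiguring the server, you can check the Content-Type header it returns to confirm that a charset is now present. Below is a minimal Python sketch that reuses the standard library's MIME header parsing. The helper name `charset_of` is made up for illustration.

```python
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lower-cased,
    or None if the value carries no charset parameter."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


print(charset_of("text/html"))                 # None
print(charset_of("text/html; charset=UTF-8"))  # utf-8
```

Against a live server, `urllib.request.urlopen(url).headers` is already an `email.message.Message` subclass, so you can call `get_content_charset()` on it directly. If the server is Apache (an assumption; the message does not say which server is in use), the `AddDefaultCharset UTF-8` directive is one way to set such a default.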


-- 
If you believe that will stop spammers, you're sadly misled. Rusty hooks,
rectally administered fuel oil enemas, and the gutting of their machines,
*that* stops spammers!                                         -- Saundo
Received on Tuesday, 26 November 2002 19:34:04 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Wednesday, 25 April 2012 12:14:04 GMT