Re: W3C Markup Validator vs Validome and such

tadeusz szewczyk I rebel:art wrote:
> While cleaning up my XHTML strict on my website http://onreact.com I 
> wanted to be very accurate. So I tested it with several validators. I 
> was delighted to find out the W3C Markup Validator did not return any 
> errors after I was done. Then I wanted to make sure and retested with 
> others like the one at Validome. It says "The Document is not 
> valid XHTML 1.0 Strict" and that I have the following error: 
> "Unexpected char in row 55 and column 111; this char is not allowed 
> within charset (utf-8) that you use."
> 
> Which one is right?

Well, that depends on how you look at the issue.  In one respect the W3C
validator is correct, but in another respect, Validome is.

Your site is being served with Content-Type: text/html.  The meta
element you have used in the file declares the encoding as ISO-8859-1
and, according to HTML 4.01, this is allowed.  The problem is that your
document claims to be XHTML, and the meta element cannot be used in
XHTML for that purpose.

See this article for a full discussion of the issue.
http://lachy.id.au/log/2006/01/content-type

Validome issues an error because it disregards the MIME type
(text/html) sent by your server, uses DOCTYPE sniffing to determine
that the file is XHTML, pretends it was served as XML, applies XML
rules and falls back to the default of UTF-8 because there is no XML
declaration to say otherwise.  A non-ASCII character saved as
ISO-8859-1 is then most likely an invalid byte sequence in UTF-8, which
would explain the "unexpected char" it reports.

The W3C validator doesn't issue an error because it obeys the meta
element in the document, which is technically correct for text/html,
though very much incorrect for XML.

The W3C validator also uses some DOCTYPE sniffing to determine that the
file is XHTML, but it still applies HTML rules to determine the
encoding.  This is actually one of the problems with the W3C's support
for XHTML, but it's also a direct result of using the wrong MIME type.
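
As a rough illustration of the two approaches described above, here is a
minimal Python sketch.  It is not the code of either validator; the
function names and the sample document are made up for the example, and
the XML path ignores BOM and UTF-16 detection.  It only shows why the
same bytes can pass under text/html rules and fail under XML rules.

```python
import re

# Hypothetical helpers: a simplified model, not either validator's real logic.

def charset_from_content_type(header):
    """Return the charset parameter of a Content-Type header, or None."""
    match = re.search(r'charset\s*=\s*"?([A-Za-z0-9._:-]+)', header, re.I)
    return match.group(1) if match else None

def encoding_as_html(content_type, body):
    """text/html rules: HTTP header first, then the meta element."""
    charset = charset_from_content_type(content_type)
    if charset:
        return charset
    meta = re.search(
        rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', body, re.I
    )
    return meta.group(1).decode("ascii") if meta else None

def encoding_as_xml(body):
    """XML rules (simplified): XML declaration, otherwise default to UTF-8."""
    decl = re.match(
        rb'<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._:-]+)["\']', body
    )
    return decl.group(1).decode("ascii") if decl else "utf-8"

def decodes_cleanly(body, encoding):
    """True if every byte sequence in body is valid in the given encoding."""
    try:
        body.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# Made-up sample: XHTML DOCTYPE, meta declares ISO-8859-1, no XML declaration.
# The byte 0xE9 is "e acute" in ISO-8859-1 but is not valid UTF-8 here.
sample = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    b'<meta http-equiv="Content-Type" '
    b'content="text/html; charset=ISO-8859-1" />'
    b'<title>t</title></head><body><p>caf\xe9</p></body></html>'
)

html_encoding = encoding_as_html("text/html", sample)  # taken from meta
xml_encoding = encoding_as_xml(sample)                 # default applies
```

With this sample, the text/html path picks up ISO-8859-1 from the meta
element and the bytes decode without complaint, while the XML path
assumes UTF-8 and decodes_cleanly(sample, xml_encoding) comes back
False.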

> I prefer not to mess with the charset and if the W3C Markup 
> Validator says it's correct I tend to let it be.

You don't need to change the encoding; ISO-8859-1 is perfectly
acceptable (although UTF-8 is highly recommended).  You do, however,
need to declare the encoding correctly in the HTTP headers.

http://www.w3.org/International/O-HTTP-charset
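
How you set the header depends on your server, so take the following
only as a sketch of what the response should look like.  It is a tiny
WSGI application in Python; the name app and the page content are
assumptions for the example and have nothing to do with your actual
setup.  The point is simply that the charset parameter travels in the
Content-Type header and matches the bytes being sent.

```python
def app(environ, start_response):
    """Hypothetical WSGI app that declares its encoding in the HTTP header."""
    # The body is encoded as ISO-8859-1, so the header must say the same.
    body = "<p>caf\u00e9</p>".encode("iso-8859-1")
    headers = [
        ("Content-Type", "text/html; charset=ISO-8859-1"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, something like
wsgiref.simple_server.make_server("", 8000, app).serve_forever() from
the standard library should be enough.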

-- 
Lachlan Hunt
http://lachy.id.au/

Received on Thursday, 23 March 2006 08:44:44 UTC