Re: -//W3C/DTD XHTML gives no error in Markup validator 0.80 ? from Olivier Thereaux on 2007-08-02 (www-validator@w3.org from August 2007)

From: Olivier Thereaux <ot@w3.org>
Date: Fri, 3 Aug 2007 08:02:42 +0900
To: Marc Gueury <mgueury@skynet.be>
Cc: www-validator@w3.org
Message-ID: <20070802230242.GA31671@w3.mag.keio.ac.jp>

Hi Marc,

On Thu, Aug 02, 2007, Marc Gueury wrote:
> Hello all,
> 
> I have a user who has noticed this page is
> http://www.mt-olympus.com/
> is valid html strict as reported here:
> http://validator.w3.org/check?verbose=1 ... pus.com%2F 
> <http://validator.w3.org/check?verbose=1&uri=http%3A%2F%2Fwww.mt-olympus.com%2F>

Yes, it is "valid", though valid _what_ is the key here.

> In the new version of the validator 0.80, there is 0 errors.

Right. There should be at least a warning that the FPI, or formal
public identifier ("-//W3C/DTD XHTML 1.0 Strict//EN"), does not match
the SI, or system identifier
(http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd), but as far as
formal validation is concerned, the document is "valid".

> In the version 0.70, it reported 41 errors, 

41? At this point in time I see only two errors given by the validator
0.7.4 on that page.
http://qa-dev.w3.org/wmvs/0.7.4/check?uri=http%3A%2F%2Fwww.mt-olympus.com%2F

> the basic reason was this
> error in the above file:
> 
> <!DOCTYPE html PUBLIC "-//W3C/DTD XHTML 1.0 Strict//EN" ...
> should be
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" ...
> Notice the '/'

It's a tad more complicated than that.

* The FPI is bogus
* the document is XHTML-ish, but served as text/html (i.e. not served as XML)
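
To illustrate the first point, here is a minimal Python sketch of the
kind of check that would catch a bogus FPI. It is not the validator's
actual code. The names check_doctype, KNOWN_DOCTYPES and DOCTYPE_RE are
made up for this example, the table holds only three entries from the
recommended DTD list, and the regular expression assumes double-quoted
identifiers.

```python
import re

# Illustrative subset only (an assumption of this sketch): maps each
# known FPI to the system identifier it is normally paired with.
KNOWN_DOCTYPES = {
    "-//W3C//DTD XHTML 1.0 Strict//EN":
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    "-//W3C//DTD XHTML 1.0 Transitional//EN":
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
    "-//W3C//DTD HTML 4.01//EN":
        "http://www.w3.org/TR/html4/strict.dtd",
}

# Captures the FPI and, when present, the SI of a PUBLIC DOCTYPE.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?',
    re.IGNORECASE,
)


def check_doctype(markup):
    """Return (fpi, si, problems) for the first DOCTYPE found in markup."""
    match = DOCTYPE_RE.search(markup)
    if match is None:
        return None, None, ["no PUBLIC DOCTYPE declaration found"]
    fpi, si = match.group(1), match.group(2)
    problems = []
    if fpi not in KNOWN_DOCTYPES:
        problems.append("unknown FPI: " + fpi)
    elif si is not None and si != KNOWN_DOCTYPES[fpi]:
        problems.append("SI does not match FPI")
    return fpi, si, problems


# Declaration assembled from the FPI and SI quoted earlier in this message.
page = ('<!DOCTYPE html PUBLIC "-//W3C/DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
fpi, si, problems = check_doctype(page)
print(problems)
```

With the single slash the FPI is simply not in the table, so it is
reported as unknown rather than as "almost XHTML 1.0 Strict".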

The previous version of the validator would see this, conclude "I don't
know this document type", and fall back to the "classic" HTML (SGML)
parsing mode. This is what the warning:
[[
The MIME Media Type (text/html) for this document is used to serve both
SGML and XML based documents, and it is not possible to disambiguate it
based on the DOCTYPE Declaration in your document. Parsing will continue
in SGML mode.
]] is about.

Because the FPI is unknown, the validator will use the system
identifier, download the DTD, and validate. 

The errors came from the fact that the document is XHTML-ish in nature,
so some constructs are not OK when parsed as HTML.

In the new validator, there is a new mechanism that detects XML-based
documents when an XML declaration is present, which is the case for
your document:
<?xml version="1.0" encoding="UTF-8"?>
so the validator 0.8.0 triggers the XML mode and validates (again,
since the FPI is bogus, the validator uses the SI). It works, but the
document is not "valid XHTML 1.0 Strict", because it does not properly
declare itself as XHTML 1.0.
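
The difference between the two versions can be sketched roughly as
follows. This is a simplified Python illustration, not the validator's
actual code: choose_parse_mode, XML_MEDIA_TYPES and the
sniff_xml_declaration flag are names invented here, and the order of the
checks is an assumption. Only two behaviours come from the explanation
above: an unrecognised DOCTYPE served as text/html used to fall back to
SGML mode, and 0.8.0 switches to XML mode when it sees an XML
declaration.

```python
import re

# Media types treated as unambiguously XML (an assumption of this sketch).
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}

# An XML declaration has to sit at the very start of the document,
# optionally preceded by a byte order mark.
XML_DECL_RE = re.compile(r'^\ufeff?<\?xml\s')


def choose_parse_mode(content_type, doctype_family, markup,
                      sniff_xml_declaration=True):
    """Pick "XML" or "SGML" parsing for a document.

    doctype_family is "XML" or "SGML" when the FPI is recognised, and
    None when it is not (as with the bogus FPI discussed here).
    """
    if content_type in XML_MEDIA_TYPES:
        return "XML"
    if doctype_family is not None:
        return doctype_family
    # Behaviour attributed to 0.8.0: an XML declaration triggers XML mode.
    if sniff_xml_declaration and XML_DECL_RE.match(markup):
        return "XML"
    # Behaviour attributed to 0.7.x: ambiguous text/html falls back to SGML.
    return "SGML"


page = '<?xml version="1.0" encoding="UTF-8"?>\n<html></html>'
old_mode = choose_parse_mode("text/html", None, page,
                             sniff_xml_declaration=False)
new_mode = choose_parse_mode("text/html", None, page)
print(old_mode, new_mode)
```

Turning the flag off stands in for the older behaviour, where the same
page is parsed as SGML and the XHTML-style constructs are reported as
errors.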

> Does one of you know the reason of this change and the logic behind it ?

The validator now understands it has to use XML parsing when it sees an
XML declaration.

The moral of the story is: never write the DOCTYPE by hand.
Let a tool insert it, or copy it from
http://www.w3.org/QA/2002/04/valid-dtd-list.html

-- 
olivier

Received on Thursday, 2 August 2007 23:02:47 UTC