Re: -//W3C/DTD XHTML gives no error in Markup validator 0.80 ? from Marc Gueury on 2007-08-12 (www-validator@w3.org from August 2007)

From: Marc Gueury <mgueury@skynet.be>
Date: Sun, 12 Aug 2007 13:43:18 +0200
To: Marc Gueury <mgueury@skynet.be>, www-validator@w3.org
Message-ID: <46BEF256.4000508@skynet.be>
Hi Olivier, Andreas, and all,

First, thanks a lot for answering my previous question about the wrong
doctype that gives no error.

<!DOCTYPE html PUBLIC "-//W3C/DTD XHTML 1.0 Strict//EN"

Your analysis helped me go one step further. Unfortunately, I have no
solution yet.

In fact, I am not the owner of http://www.mt-olympus.com/. This is the
page of one of my users.
I am the author of a Firefox extension called HTML Validator
(http://users.skynet.be/mgueury/mozilla/), which I suppose some of you
know (it has more than 500,000 users).
It is open source and free, and it always will be. (I would even be
happy to give it away, but that is another story.)

In short, the extension works like this:
- it takes the HTML and the MIME type out of Firefox or SeaMonkey memory
- and, based on your preference, runs
   - the SGML parser (OpenSP 1.5.2)
   - or Tidy (CVS from 05-May-2007)
   - or both
  to get the number of errors the page contains. It displays the
  result as an icon in the status bar.
- you can also see the list of errors in the HTML source viewer of Firefox.
- this has a lot of advantages: because it runs in Firefox memory, it
  can validate pages
  - behind a login screen
  - in any type of SSL situation
  - or on an intranet behind a firewall
  - ...
The online validator cannot do this easily. It also makes development a
lot faster.
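
As a rough sketch of the flow above (not the extension's actual code, which is not shown in this message), the dispatch on the user's preference could look like this in Python. The names `run_checks`, `fake_opensp`, and `fake_tidy`, as well as the preference values, are hypothetical stand-ins; a real checker would hand the page to OpenSP or Tidy.

```python
from typing import Callable, Dict

# A checker takes (html, mime_type) and returns the number of errors found.
Checker = Callable[[str, str], int]


def run_checks(html: str, mime_type: str, preference: str,
               checkers: Dict[str, Checker]) -> Dict[str, int]:
    """Run the checker(s) selected by `preference` and return error counts.

    `preference` is "opensp", "tidy", or "both" (hypothetical values).
    """
    if preference == "both":
        selected = ["opensp", "tidy"]
    elif preference in checkers:
        selected = [preference]
    else:
        raise ValueError(f"unknown preference: {preference!r}")
    return {name: checkers[name](html, mime_type) for name in selected}


# Stand-in checkers for illustration only; they do not really parse anything.
def fake_opensp(html: str, mime_type: str) -> int:
    return html.count("<br>")  # pretend each unclosed <br> is an error


def fake_tidy(html: str, mime_type: str) -> int:
    return 0


CHECKERS = {"opensp": fake_opensp, "tidy": fake_tidy}
```

The per-checker counts returned by `run_checks` are what would feed the status bar icon.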

About one year ago, I read and studied the code of the W3C validator to
try to emulate it in Firefox. If you have never seen this extension
working, I recommend you try it:
<http://users.skynet.be/mgueury/mozilla/download.html>. This is the first
page I have seen that gives a different result than the real validator.w3.org.

The reason is OpenSP's lack of proxy support. My extension detects
the following doctype as an XML file and runs OpenSP 1.5.2 with xml.soc on
it (like the real validator):

<!DOCTYPE html PUBLIC "-//W3C/DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

The difference is that my extension may be running behind a proxy. So,

-//W3C/DTD XHTML 1.0 Strict//EN

does not allow me to find the DTD. And I cannot get
http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd from the Internet, since I
cannot tell OpenSP to use the proxy settings of Firefox. The real
validator can make the request because it is running on the Internet. I
cannot do this: I have to force OpenSP not to make any external HTTP
calls, otherwise running the extension behind a firewall would be too
slow.
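
To illustrate the lookup failure described above, here is a minimal Python sketch, assuming a local table keyed by public identifier. The table contents, the `dtd/` paths, and the function name `resolve_offline` are hypothetical and not taken from the extension.

```python
from typing import Optional

# Hypothetical local table: public identifier (FPI) -> DTD file shipped locally.
LOCAL_DTDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "dtd/xhtml1-strict.dtd",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "dtd/xhtml1-transitional.dtd",
    "-//W3C//DTD HTML 4.01//EN": "dtd/strict.dtd",
}


def resolve_offline(fpi: str) -> Optional[str]:
    """Return the local DTD path for a known FPI, or None.

    This never touches the network, so an unknown FPI simply has no DTD.
    """
    return LOCAL_DTDS.get(fpi)


# The correct FPI is found locally.
print(resolve_offline("-//W3C//DTD XHTML 1.0 Strict//EN"))
# The FPI with the missing '/' is not in the table, so the lookup fails.
print(resolve_offline("-//W3C/DTD XHTML 1.0 Strict//EN"))
```

With the single-slash FPI the lookup returns `None`, and without HTTP access the system identifier cannot be fetched either.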

If you want to see my real issue (you do not need to do this)
- Use Firefox 2.x or 1.5.x
- Install the extension from here 
http://users.skynet.be/mgueury/mozilla/download.html
- Look at the page http://www.mt-olympus.com/ (40 errors)
  - Look at the view source to see the list of them
- and compare with the real validator (0 errors)

So here are my questions
--------------------------
1) Is my analysis wrong, or is there really no way to make OpenSP work
behind a proxy server?
2) Does one of you have an idea how to solve my problem? Or is there none?
3) Another, independent question, just out of curiosity: since you
accept DTDs taken from the Internet, do you also accept other DTDs
that are not written by the W3C? For example, somebody doing his own
version of HTML (e.g. HTML + some new tags)?

Thanks in advance,

Marc

PS: Please mail me directly, as I am not on the mailing list.


Olivier Thereaux wrote:
> Hi Marc,
>
> On Thu, Aug 02, 2007, Marc Gueury wrote:
>> Hello all,
>>
>> I have a user who has noticed this page is
>> http://www.mt-olympus.com/
>> is valid html strict as reported here:
>> http://validator.w3.org/check?verbose=1 ... pus.com%2F
>> <http://validator.w3.org/check?verbose=1&uri=http%3A%2F%2Fwww.mt-olympus.com%2F>
>
> Yes, it is "valid", though valid _what_ is the key here.
>
>> In the new version of the validator 0.80, there is 0 errors.
>
> Right. There should be at least a warning that the FPI ("-//W3C/DTD
> XHTML 1.0 Strict//EN") does not match the SI,
> (http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd), but as far as
> formal validation is concerned, the document is "valid".
>
>> In the version 0.70, it reported 41 errors,
>
> 41? At this point in time I see only two errors given by the validator
> 0.7.4 on that page.
> http://qa-dev.w3.org/wmvs/0.7.4/check?uri=http%3A%2F%2Fwww.mt-olympus.com%2F
>
>> the basic reason was this
>> error in the above file:
>>
>> <!DOCTYPE html PUBLIC "-//W3C/DTD XHTML 1.0 Strict//EN" ...
>> should be
>> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" ...
>> Notice the '/'
>
> It's a tad more complicated than that.
>
> * The FPI is bogus
> * the document is XHTML-ish, but served as text/html (i.e. not served 
> as XML)
>
> The previous version of the validator would see this, wonder "I don't
> know this document type", and use the "classic" HTML parsing mode as a
> default. This is what the warning:
> [[
> The MIME Media Type (text/html) for this document is used to serve both
> SGML and XML based documents, and it is not possible to disambiguate it
> based on the DOCTYPE Declaration in your document. Parsing will continue
> in SGML mode.
> ]] is about.
>
> Because the FPI is unknown, the validator will use the system
> identifier, download the DTD, and validate.
>
> The errors came from the fact that the document is XHTML-ish in nature,
> so some constructs are not OK when parsed as HTML.
>
>
> In the new validator, there is a new mechanism to detect XML-based
> documents if an XML declaration is present. Which is the case for your
> document:
> <?xml version="1.0" encoding="UTF-8"?>
> so the validator 0.8.0 triggers the XML mode, and validates (again,
> since the FPI is bogus... the validator uses the SI). It works, but the
> document is not "valid XHTML 1.0 Strict", because it does not properly
> declare itself as XHTML 1.0.
>
>> Does one of you know the reason of this change and the logic behind it ?
>
> The validator now understands it has to use XML parsing when it sees an
> XML declaration.
>
> The moral of the story is: never write the DOCTYPE yourself.
> Tools should do that, or copy it from
> http://www.w3.org/QA/2002/04/valid-dtd-list.html
>
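
The mode selection described in the quoted reply can be sketched in Python as follows. This is a simplified illustration, not the validator's actual code: the function name `parse_mode`, the `detect_xml_declaration` flag (standing in for the 0.7.x versus 0.8.0 behaviour), the sets of known identifiers, and the order of the checks are all assumptions.

```python
import re

# Assumed, incomplete sets of identifiers and MIME types for illustration.
KNOWN_XML_FPIS = {"-//W3C//DTD XHTML 1.0 Strict//EN"}
KNOWN_SGML_FPIS = {"-//W3C//DTD HTML 4.01//EN"}
XML_MIME_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}

# Captures the public identifier of a DOCTYPE declaration.
DOCTYPE_FPI = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def parse_mode(source: str, mime_type: str,
               detect_xml_declaration: bool = True) -> str:
    """Return "xml" or "sgml" for a document, using simplified rules."""
    if mime_type in XML_MIME_TYPES:
        return "xml"
    match = DOCTYPE_FPI.search(source)
    fpi = match.group(1) if match else None
    if fpi in KNOWN_XML_FPIS:
        return "xml"
    if fpi in KNOWN_SGML_FPIS:
        return "sgml"
    # Unknown FPI served as text/html: ambiguous between SGML and XML.
    if detect_xml_declaration and re.match(r"<\?xml\s", source):
        return "xml"   # behaviour described for 0.8.0
    return "sgml"      # fallback described for the previous version


doc = ('<?xml version="1.0" encoding="UTF-8"?>\n'
       '<!DOCTYPE html PUBLIC "-//W3C/DTD XHTML 1.0 Strict//EN" '
       '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
print(parse_mode(doc, "text/html", detect_xml_declaration=False))
print(parse_mode(doc, "text/html", detect_xml_declaration=True))
```

For the page discussed in this thread, the sketch picks SGML mode when the XML declaration is ignored and XML mode when it is honoured, which is consistent with the different error counts reported by the two validator versions.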
Received on Sunday, 12 August 2007 11:43:33 UTC