Notes on validome test suite / validators comparison

Hello,

While the W3C validator has its own (limited, admittedly) test suite,  
it is often useful to test it against some other collections of test  
documents. One such collection is the "test suite" for validome
(another excellent (X)HTML validator), which I use now and then.

http://www.validome.org/lang/en/errors/ALL

Unfortunately, this "test suite" has a number of serious shortcomings:
- some tests are actually wrong
- most of the tests are missing references to the spec, making some  
arguable tests hard to justify
- the test results / validators comparison page (linked above) has  
validome "pass" all the tests, compared to other validators  
(including W3C's) failing a large number of tests. This tickles my  
sense of humor to no end...

While using the results of a test suite for marketing purposes may be
fair game, it is not acceptable when 1) the test suite is tweaked to
show only the tests one's product passes, 2) expected test results
are not justified, 3) some of the "pass"es are very dubious, and
4) the test results are outdated and/or show other tools failing
when their current version passes, sometimes better than the
purportedly perfect one.

That is not to say that I consider the test suite useless or  
dishonest. But it gets close to it unless the tests get justified  
with authoritative references, test results properly dated and tested  
versions mentioned, and obvious bias avoided. In this perspective,  
here are the notes I took while checking the current and dev versions  
of the markup validator. I strongly suggest that the validators  
comparison page and associated tests be updated, fixed, and clarified  
according to these notes. If the validome team cannot keep an
up-to-date table of the test results, then this page should be
replaced with a plain list of the test cases, one that makes no false
claims about other tools.



* http://www.validome.org/out/ena1004
Validome yields a fatal error here, claiming that the document is not  
valid.
This is actually incorrect: the document is valid XML as far as I can
tell.
W3C Markup validator passes it as valid XML.

* http://www.validome.org/out/ena1005
The comparison page is incorrect. The W3C Markup validator reports  
the error.

* http://www.validome.org/out/ena1007
The comparison page is incorrect. The W3C Markup validator reports  
the error.
Validome also considers the typo in the XML decl as a fatal error,  
while the W3C Markup Validator shows the offending markup and  
proceeds to check the document.

* http://www.validome.org/out/ena1011
<?xml version="1.0" encoding="#"?>
Syntax of encoding in XML decl is bogus.
The comparison page is incorrect. The W3C Markup validator reports  
the error.
Validome also considers the typo in the XML decl as a fatal error,  
while the W3C Markup Validator shows the offending markup and  
proceeds to check the document.

* http://www.validome.org/out/ena1012
<?xml version="1.0" encoding="9ISO-8859-1"?>
Syntax of encoding in XML decl is bogus.
The comparison page is incorrect. The W3C Markup validator reports  
the error.
Validome also considers the typo in the XML prolog as a fatal error,  
while the W3C Markup Validator shows the offending markup and  
proceeds to check the document.
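
For reference, the XML 1.0 spec defines the syntax of an encoding name
as EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*, which is why both "#"
(ena1011) and "9ISO-8859-1" (ena1012) are syntax errors. A minimal
Python sketch of that check follows; the function name is made up for
illustration, and this is not code from either validator.

```python
import re

# EncName production from the XML 1.0 spec:
#   EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_enc_name(name: str) -> bool:
    """Return True if `name` is syntactically a legal XML encoding name.

    This only checks the syntax; it does not check that the name refers
    to an encoding any parser actually supports.
    """
    return ENC_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("ISO-8859-1", "9ISO-8859-1", "#"):
        print(candidate, is_valid_enc_name(candidate))
```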

* http://www.validome.org/out/ena1014
(extraneous lang attribute in xml decl)
The comparison page is incorrect. The W3C Markup validator reports  
the error.
Validome does however provide a better error explanation.

* http://www.validome.org/out/ena1015
The comparison page is incorrect. The W3C Markup validator reports  
the error.

* http://www.validome.org/out/ena1017
The comparison page is incorrect. The W3C Markup validator reports  
the error.

* http://www.validome.org/out/ena4010
Now, that one is funny because validome's error message is the one  
that was deemed "Inscrutable" when the W3C's markup validator reports  
it for http://www.validome.org/out/ena4002
Pointing out hard-to-comprehend SGML validation messages is not a bad  
thing, but doing it honestly and consistently, even when validome is  
at fault, would be better...

* http://www.validome.org/out/ena4011
HTML 4.01 document with no system Id.
Validome sends a warning... Not necessary per the spec.
W3C Markup validator passes validation.
Why is W3C validator marked as faulty here? References please?

* http://www.validome.org/out/ena4012
XHTML doctype without system Id, but valid public id.
Validation should report an error (both validators do), but why does  
validome count this as a fatal error?

* http://www.validome.org/out/ena4019
The comparison page is incorrect. The W3C Markup validator has the  
proper behavior here, as do others.

* http://www.validome.org/out/ena4020
The comparison page is incorrect. The W3C Markup validator has the  
proper behavior here, as do others.

* http://www.validome.org/out/ena4021
Validome is faulty here (why a fatal error?), and the comparison page  
doesn't mention it.

* http://www.validome.org/out/ena4023
Validome says valid. OpenSP and the W3C Markup validator say not valid.
I'd tend to trust OpenSP here. The comparison page's claim that
validome is the only validator doing the right thing is very dubious.

* http://www.validome.org/out/ena4024
Ditto above. The comparison page's claim that validome is the only  
validator doing the right thing is very dubious.

* http://www.validome.org/out/ena2
document served with no http charset, has a BOM and a meta charset  
claiming to be iso-8859-1
Validome detects charset to be utf-8, sends warning about BOM.
W3C validator detects charset to be utf-8, sends warning about BOM.
The comparison page claims that validome passes, w3c validator fails.  
On which grounds, please?
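
To illustrate why both tools end up with utf-8 here, this is a rough
Python sketch of BOM sniffing. It is an illustration under my own
assumptions (function name, list of BOMs checked), not the actual code
of either validator.

```python
import codecs


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading byte order mark, or None."""
    # UTF-32 is checked before UTF-16 because the UTF-32 LE BOM starts
    # with the same two bytes as the UTF-16 LE BOM.
    boms = (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    )
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical document: UTF-8 BOM, but a meta element claiming iso-8859-1.
    doc = codecs.BOM_UTF8 + (
        b'<meta http-equiv="Content-Type" '
        b'content="text/html; charset=iso-8859-1">'
    )
    print(sniff_bom(doc))
```

The BOM is evidence from the bytes themselves, so a tool that finds one
can reasonably prefer it over the meta claim and warn about the mismatch.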

* http://www.validome.org/out/ena8
W3C markup validator uses an algorithm for charset detection, finds
none, uses a fallback.
Validome uses... exactly the same algorithm (to the point of having
almost the same error message...), finds no charset, yields a fatal
error.
I'm very curious to know why validome passes and w3c markup validator  
fails here. I think the opposite: validome's taste for fatal error is  
a grave failure in usability.

* http://www.validome.org/out/ena13
The comparison page is incorrect. The W3C Markup validator has the  
proper behavior, and reports the mismatch, as far as I can tell.

* http://www.validome.org/out/ena14
The comparison page is incorrect. The W3C Markup validator has the  
proper behavior, and reports the mismatch, as far as I can tell.

* http://www.validome.org/out/ena2002
text/xml document with no charset at http level. W3C Markup validator
properly follows the RFC and validates as us-ascii. Validome
incorrectly sends a fatal error. Note to validome developers: "This
Document is not valid." and "fatal error" are plain wrong here. If
you have a separate validator that does XML, don't mislead people
into thinking that their document is invalid. Why not redirect them
directly to that specific validator instead?
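
For the record, the rule I have in mind is the one in RFC 3023: a
text/xml entity with no charset parameter is treated as us-ascii. A
small Python sketch of that default, with a helper name of my own
invention (this is not the validator's code):

```python
from email.message import Message


def text_xml_charset(content_type: str) -> str:
    """Charset to assume for a text/xml response.

    Uses the charset parameter of the Content-Type header when present,
    and falls back to us-ascii otherwise (the RFC 3023 default for
    text/xml).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter,
    # or None when the header carries no charset.
    return msg.get_content_charset() or "us-ascii"


if __name__ == "__main__":
    print(text_xml_charset("text/xml"))
    print(text_xml_charset("text/xml; charset=UTF-8"))
```

A missing charset is therefore not an error in itself: there is a
defined default, and the document can be checked against it.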

* http://www.validome.org/out/ena2008
  - this test is bogus, or the claimed rule "If HTTP-Header charset
encoding is missing, but there is one in XML-Declaration, a Meta
charset encoding statement must exist." needs a serious reference.
  - the comparison page claims that validome alone behaves properly,  
when it actually behaves just like the others, that is, not  
respecting the bogus rule claimed by the test.

* http://www.validome.org/out/ena2009
  - this test is bogus, or the claimed rule "If HTTP-Header charset  
encoding is missing, but  Meta-Tag charset encoding statement exists,  
then there must be also a XML-Declaration charset encoding statement"  
needs a serious reference.
  - validome reports that no encoding was found, and used a fallback.  
This is not correct - there is a meta charset info.
  - The comparison page is incorrect: the W3C markup validator has
perfectly proper behavior here.

* http://www.validome.org/out/ena2010
  - this test is bogus. "If there is a charset encoding statement in  
XML-Declaration as well as in a Meta-Tag,  the XML-Declaration  
charset encoding will be used. HTTP-Header charset encoding is  
irrelevant in this case." is just untrue. HTTP charset info always  
gets precedence, and is never "irrelevant".
  - validome's behavior is incorrect, yet reported as correct
  - The comparison page is incorrect. The W3C Markup validator has  
the proper behavior here.
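
To make the precedence point concrete, here is a simplified Python
sketch. Only "HTTP first" comes from the argument above; the relative
order of the XML declaration and the meta element, the fallback value,
and all names are assumptions for illustration, and BOM handling is
left out entirely.

```python
def resolve_charset(http=None, xml_decl=None, meta=None, fallback="utf-8"):
    """Pick the charset to use and report which source supplied it.

    Sources are consulted in order; the first one that is set wins.
    HTTP comes first: it is never "irrelevant" when present.
    """
    sources = (
        ("http", http),
        ("xml declaration", xml_decl),  # assumed to rank above meta
        ("meta", meta),
    )
    for source, value in sources:
        if value:
            return value.lower(), source
    # Nothing declared anywhere: use a fallback (value is an assumption).
    return fallback, "fallback"


if __name__ == "__main__":
    # ena2010-like situation: all three sources present, HTTP disagrees.
    print(resolve_charset(http="ISO-8859-1", xml_decl="UTF-8", meta="UTF-8"))
    # No charset information at all: fall back instead of failing.
    print(resolve_charset())
```

A tool built this way can still warn when the lower-priority sources
disagree with HTTP, without ever letting them override it.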

* http://www.validome.org/out/ena2041
The comparison page is incorrect. The W3C Markup validator has the  
proper behavior here.

* http://www.validome.org/out/ena5006 (ditto 5007 5008 5009 5010 5011  
2025 5026 5027 5028)
I strongly disagree that the W3C Markup Validator's behavior is
incorrect here. It follows the rules of HTTP precedence closely, and
reports the discrepancy between doctype and media type.

* http://www.validome.org/out/ena5020
I strongly disagree that the W3C Markup Validator's behavior is
incorrect here.
text/html is allowed for XHTML 1.0.

* http://www.validome.org/out/ena5021
The comparison page is incorrect.
Validome and W3C Markup Validator both mention that XHTML1.1 should  
not be served as text/html.

* http://www.validome.org/out/ena5030
The comparison page is incorrect: it claims that the W3C validator  
does not explain why it parses as SGML (it does). The claim that  
validome is doing the right thing is also dubious, as validome is  
actually not mentioning any problem in parsing mode.

* http://www.validome.org/out/ena6030
The comparison page is incorrect. The W3C markup validator not only
checks for the presence of xmlns in XHTML, it also gives an example of
what it should look like, and a reference to the spec. Validome doesn't.
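
For readers unfamiliar with the requirement: the root element of an
XHTML document is expected to read
<html xmlns="http://www.w3.org/1999/xhtml">. A minimal Python sketch of
such a check follows. It uses a generic XML parser on well-formed input
and a function name of my own; it is not how either validator
implements the test.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def has_xhtml_namespace(markup: str) -> bool:
    """Return True if the root element is <html> in the XHTML namespace.

    ElementTree reports namespaced tags as "{namespace}localname", so a
    root <html> without xmlns shows up as plain "html".
    """
    root = ET.fromstring(markup)
    return root.tag == "{%s}html" % XHTML_NS


if __name__ == "__main__":
    with_ns = (
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        "<head><title>t</title></head><body/></html>"
    )
    without_ns = "<html><head><title>t</title></head><body/></html>"
    print(has_xhtml_namespace(with_ns))
    print(has_xhtml_namespace(without_ns))
```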


* http://www.validome.org/out/ena7003
I'd like to see a reference for this.

* http://www.validome.org/out/ena7005 (and 7006)
This has nothing to do with validation. If validome emulates some of
the features of a link checker, compare it to link checkers, not
validators. This test is moot.

* http://www.validome.org/out/ena3002
This test is bogus. Sorry. An XML declaration also happens to be a  
proper SGML PI. Giving a warning asking the HTML4 author "are you  
sure you want this here" may be a good idea. Making this a fatal  
error is wrong, wrong, wrong.

* http://www.validome.org/out/ena3006
The comparison page is incorrect. Output of a warning for a shorttag  
construct is a good thing (dev version of w3c validator actually does  
it) but not required. The current W3C Validator's behavior is not wrong.

* http://www.validome.org/out/ena3007
ditto. Learn about shorttags. Validome is actually wrong here, this  
should not be reported as an error, at most a warning.


HTH,
olivier
-- 
olivier Thereaux - W3C - http://www.w3.org/People/olivier/
W3C Open Source Software: http://www.w3.org/Status

Received on Wednesday, 26 September 2007 06:09:23 UTC