automated markup validator test suite

Hello,

I have been hacking on automating the test suite for the markup
validator. There have been tests for the validator for many years
(http://validator.w3.org/dev/tests/, although we most often use the
copy on the dev instance, which is updated in real time:
http://qa-dev.w3.org/wmvs/HEAD/dev/tests/ ), but they had to be
checked by hand.

The code for the automated suite is at:
http://dev.w3.org/cvsweb/validator/misc/testsuite/

The test harness is inspired by the existing work, in Java, by
Jean-Guilhem Rouel for the CSS validator:
http://dev.w3.org/cvsweb/2002/css-validator/autotest/
It uses the Python unit testing framework, and requires Python 2.5
(with cElementTree) and the jinja2 template engine to run. It is also
easy to extend to run against other types of validators: if anybody is
interested in adding wrappers for validator.nu, validome or other tools,
I'd be happy to help.

The harness has been fed with all the machine-verifiable tests from  
the "manual" test suite. This should include tests for all supported  
document types and regression tests for most bugs fixed and/or  
reported in recent times. It's easy to extend, so if anyone is  
interested in adding a battery of tests, ping me.

At the moment the main "issue" with the test suite is that it has no  
clear way to document "yes, the validator has a bug but it's OK".
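
One possible way to handle this, sketched below, would be to keep a list of
accepted failures next to the tests and split the failures into "documented"
and "new" after a run. Nothing like this exists in the harness today;
KNOWN_FAILURES and split_failures are hypothetical names, and the two entries
are only examples taken from the results further down.

```python
# Hypothetical mapping: test description -> where the accepted failure is documented.
KNOWN_FAILURES = {
    "Test for warning about ampersand as data (in SGML)":
        "http://www.w3.org/Bugs/Public/show_bug.cgi?id=798",
    "DOCTYPE with a relative URI for the system identifier":
        "http://www.w3.org/Bugs/Public/show_bug.cgi?id=1521",
}


def split_failures(failed, known=KNOWN_FAILURES):
    """Separate failures that are already documented from unexpected ones."""
    accepted = [(name, known[name]) for name in failed if name in known]
    unexpected = [name for name in failed if name not in known]
    return accepted, unexpected
```

A release check could then look only at the second list, while the first
one is still reported along with its bug link.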

Below are the (annotated) results for the tests that the current
development version of the validator (overdue for a release…) does
not pass:

* FAIL:  (control) test for validation of SVG 1.1 Basic with Doctype
* FAIL:  Test for validation of Doctype-less SVG 1.1 Basic

These two are due to errors in the SVG 1.1 Basic DTDs: not a bug in
the validator, but annoying.
  http://lists.w3.org/Archives/Public/www-svg/2007May/thread.html#msg10

* FAIL:  Test for warning about ampersand as data (in SGML)

This looks like a regression of bug
http://www.w3.org/Bugs/Public/show_bug.cgi?id=798
It should be looked at, but is not a showstopper.


* FAIL:  Test of warning for non-HTML compatible XHTML document (C1)
* FAIL:  Test of warning for non-HTML compatible XHTML document (C2)
* FAIL:  Test of warning for non-HTML compatible XHTML document (C3)

Addition of the "HTML compatibility" checks is postponed for now,
awaiting the completion of the updated guidelines.

* FAIL:  bogus FPI #2: HTML 4.01 "Strict"


* FAIL:  DOCTYPE with a relative URI for the system identifier. Should  
probably pass if the sgml parser was given the base URI(?)

This is http://www.w3.org/Bugs/Public/show_bug.cgi?id=1521 - the bug  
is waiting for someone to adopt it.

* FAIL:  text/html, no charset, fbc set (W02)
* FAIL:  text/html, no charset, override set (W04)
* FAIL:  text/html, no charset (W04)

Bizarre. It looks like the recently upgraded Apache 2.2.9 doesn't like
"AddDefaultCharset Off" and "RemoveCharset .html" any more, so the
tests themselves are broken. I'm not aware of any recent change in
charset handling in the validator, so this is not a showstopper for a
release.
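
These three tests only mean something if the test server really sends
text/html with no charset parameter. A quick precondition check on a
Content-Type header value could look like the sketch below. charset_of is a
made-up helper name, fetching the header from the server is left out, and the
parsing relies on the standard email.message module.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")
```

If this returns anything other than None for the test documents, the server
configuration is at fault rather than the validator.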

* FAIL:  Test for xmlns in HTML content

The preparse warning is not shown, but since the actual error is shown,
that's not a showstopper.

-- 
olivier

Received on Thursday, 31 July 2008 20:41:02 UTC