- From: Robin Berjon <robin@w3.org>
- Date: Mon, 31 Mar 2014 14:21:38 +0200
- To: "HTML WG (public-html@w3.org)" <public-html@w3.org>
Hi all,

After some hacking, I've finally managed to produce a test suite report for DOM 4.

With the strong increase in testing at W3C, we ran into a novel set of problems (which I'm happy to have). The first is that running the test suite takes a while, and more annoyingly that its output produces more data than browsers are used to handling, which made it tricky to actually get the data out of the browsers. The other is that the results table is too long to be processable by normal human beings, even seasoned standardistas.

As a result, I produced a table of the tests that don't make the cut of passing in at least two implementations (a rough sketch of that filtering step follows the list below). It's likely that we'll want other types of information, but this is probably the most important bit for exit considerations. You can see the results here:

http://w3c.github.io/dom/test-results/less-than-2.html

At first brush, they don't look very good: a full 16% of our (unit) tests don't pass in two implementations. Having said that, there are several mitigating factors:

• Even a cursory pass through the table will show that the cases in which only one browser succeeds are disproportionately cases in which Firefox is the passing browser. I'm more than willing to consider that Gecko has the best DOM implementation of them all, but what bothers me is that the vast majority of those tests were written by people working for or otherwise linked to Mozilla. Note that I do *NOT* in the least suspect foul play. But you're less likely to find issues with your tests if they pass in the browser you're using. I therefore invite other implementers (or anyone) to check those tests; you can run them here: http://w3c-test.org/tools/runner/index.html (enter "/dom/" for the path; if you're running IE you probably won't succeed in getting the JSON unless you check the "Dump JSON" option, in which case you'll get a textarea with the output instead of a download).

• We need to go through the list with a fine-toothed comb, but it's clear that some tests, while valid, shouldn't be there. For instance, ProgressEvent isn't defined in the DOM specification. The "historical" tests check for the removal of some parts of the DOM that are being investigated for removal, but even the spec says it is not yet clear whether they should be removed; I think it's asking too much to require that they already be gone. The "interface" and "exceptions" failures are in fact WebIDL failures. The test suite needs more work to make it correct in these respects.

• In some cases, test results are "undefined". For this test suite, these are generally suspicious. A small number of them can be ascribed to the JSON coming out of IE being somewhat corrupted in its encoding (which generates test names that don't exist in the results of other browsers). These need investigation.

• Of the 16% of subtests that don't have two passes, a whopping 15 percentage points (that is, roughly 93% of all failures) come from Range tests. While it's no secret that Range isn't the most interoperable part of the Web platform, that seems high.

• What's more, of those 15 percentage points, about 13 are tests returning "undefined" for IE due to test timeouts. I find this unlikely; it could be an artefact of my IE tests being run on an old, relatively underpowered Surface. I don't know how many passes IE would add here, but I'm trying to get a new batch of results without the timeouts at least.
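For those who want to reproduce the filtering, here's a minimal sketch of how it can be done. This is not the script I used: it assumes one JSON dump per browser from the runner, and it assumes a report shape along the lines of {"results": [...]} with per-subtest "name" and "status" fields; the file naming and field names are illustrative, so adjust them to whatever your runner actually emits.

    #!/usr/bin/env python
    # Sketch: list subtests passing in fewer than two browsers.
    # Assumed report shape (adjust to your runner's actual output):
    #   {"results": [{"test": ..., "subtests": [{"name": ..., "status": ...}]}]}
    import json
    import sys
    from collections import defaultdict

    passes = defaultdict(set)  # (test, subtest) -> browsers that pass it
    seen = set()               # every (test, subtest) seen in any browser

    for path in sys.argv[1:]:  # e.g. firefox.json chrome.json ie.json
        browser = path.rsplit(".", 1)[0]
        with open(path) as f:
            report = json.load(f)
        for result in report["results"]:
            for sub in result.get("subtests", []):
                key = (result["test"], sub["name"])
                seen.add(key)
                if sub["status"] == "PASS":
                    passes[key].add(browser)

    # Subtests that don't make the two-implementation cut. A subtest
    # missing from a browser's dump (an "undefined" result) simply
    # never registers a pass, so it shows up here too.
    for test, name in sorted(seen):
        who = sorted(passes[(test, name)])
        if len(who) < 2:
            print("%s | %s | passes in: %s" % (test, name, ", ".join(who) or "none"))

Run it as "python less-than-2.py firefox.json chrome.json ie.json"; keying on the (test, subtest) pair is what makes the encoding-corrupted IE test names stand out, since they never match names from the other browsers.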
We plan on running the same exercise with the HTML test suite, but it's sloooooow. Also, it has a bunch of manual tests, which make running them more painful.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon
Received on Monday, 31 March 2014 12:21:48 UTC