
Re: Automated Test Runner

From: Aryeh Gregor <Simetrical+w3c@gmail.com>
Date: Fri, 18 Feb 2011 15:05:12 -0500
Message-ID: <AANLkTin0DCY-zXUy826755U8Mrm+bcMn+pA_4jCFCOGm@mail.gmail.com>
To: "L. David Baron" <dbaron@dbaron.org>
Cc: James Graham <jgraham@opera.com>, Kris Krueger <krisk@microsoft.com>, Anne van Kesteren <annevk@opera.com>, "public-html-testsuite@w3.org" <public-html-testsuite@w3.org>, "Jonas Sicking (jonas@sicking.cc)" <jonas@sicking.cc>
On Fri, Feb 18, 2011 at 12:11 PM, L. David Baron <dbaron@dbaron.org> wrote:
> Two things solve the problem of a test unexpectedly terminating
> without actually finishing:
>  (1) the harness goes on to the next test when the current test
>  tells the harness that it is finished, so if the test never says
>  it's finished, the run stops.  (And this is needed anyway to run
>  anywhere close to efficiently; allotting tests a fixed amount of
>  time is a huge waste of time.)
>  (2) an onerror handler catches uncaught exceptions or script parse
>  errors, counts them as a failure, and goes on.

A bug in the test could cause it to run a different number of tests on
different runs.  I know for a fact this happens with my reflection
tests, depending on which test harness you use -- a thrown exception
can cause some tests to never be run at all, for instance, when they
should actually be reported as failed.  I'm planning to fix this by
adjusting my try/catch blocks, but there could still be bugs.
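The try/catch granularity problem can be shown with a contrived example (the `checks` list and both runner functions are hypothetical, just to illustrate the two behaviors):

```javascript
// One coarse try/catch around the whole loop: a single throw aborts
// the loop, so later checks are never run and the total number of
// tests varies depending on where the first exception happens.
function runCoarse(checks) {
  const results = [];
  try {
    for (const c of checks) {
      c.body();
      results.push(c.name + ": PASS");
    }
  } catch (e) {
    results.push("aborted: " + e.message);
  }
  return results;
}

// Per-check try/catch: a throw marks that one check as FAIL and the
// remaining checks still run, so the test count is stable.
function runFine(checks) {
  const results = [];
  for (const c of checks) {
    try {
      c.body();
      results.push(c.name + ": PASS");
    } catch (e) {
      results.push(c.name + ": FAIL");
    }
  }
  return results;
}

const checks = [
  { name: "a", body: () => {} },
  { name: "b", body: () => { throw new Error("bug"); } },
  { name: "c", body: () => {} },
];
// runCoarse(checks) records 2 results ("c" is silently skipped);
// runFine(checks) records all 3, with "b" failed rather than skipped.
```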

This isn't an issue for Mozilla's regression tests, because you expect
zero failures per file, so you don't care what happens after a failure
as long as the file is reported as failed.  But in cross-browser
conformance tests, we can't mark known failures as todo(), so we need
all tests in each file to run even if some of them fail.  Otherwise,
one test failure will mask all further test failures in the file.

Of course, maybe it's okay for different test runs to run different
numbers of tests.  But it's not okay if we're going to publish pass
percentages for different browsers, because then fixing a failure
might decrease the pass percentage if it opens up new failures, and
conversely, introducing a new failure might increase it.
IMO, we should publish pass percentages for different browsers for any
sufficiently complete part of the test suite, to encourage them to
compete on getting to 100% conformance.  But for that to work, fixing
failures needs to consistently increase your pass percentage, and that
might not happen if it can change the number of tests that run.
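To make the perverse incentive concrete, here is a worked example with made-up numbers (all the counts below are hypothetical):

```javascript
// Hypothetical numbers showing why a varying test count breaks
// published pass percentages.
//
// Before the fix: an exception skips 10 of 100 subtests in a file;
// of the 90 that actually run, 81 pass.
const before = 81 / 90; // 0.90 reported

// After fixing that one failure, all 100 subtests run, exposing 12
// previously masked failures: 88 of 100 now pass.
const after = 88 / 100; // 0.88 reported

// Fixing a real bug lowered the browser's published score.
console.log(before > after); // true
```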
Received on Friday, 18 February 2011 20:06:09 UTC
