Re: Request for Feedback On Test Harness from Aryeh Gregor on 2010-12-01 (public-html-testsuite@w3.org from December 2010)

From: Aryeh Gregor <Simetrical+w3c@gmail.com>
Date: Wed, 1 Dec 2010 15:11:31 -0500
To: Henri Sivonen <hsivonen@iki.fi>
Cc: public-html-testsuite@w3.org, James Graham <jgraham@opera.com>
Message-ID: <AANLkTinOeFxb2+O7gWHmw6dzriubis9EvjN8gt3MUMq6@mail.gmail.com>

On Wed, Dec 1, 2010 at 7:24 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
> I think we should optimize for the ease of writing tests and for finding regressions in a given implementation after an implementation has once fully passed a given test file and the test file has been added to the automated test suite of that implementation. If Firefox passes test_foo.js today and test_foo.js is added to mochitest, it's interesting to keep the file at zero failures. If someone checks in something that makes the test fail, it's not really that important to make a limited number of the assertions in the file fail. It's enough that some assertions fail in order to detect that the checkin or the test was bad.

However, sometimes it's convenient to have a lot of related tests in
one file, and you want to know exactly which tests fail even if it
will be a long time before anyone fully passes.  E.g., my reflection
test is one big file, but no one is going to pass the whole thing
anytime soon.  Breaking it up into one file per attribute would be an
awful lot of small files, and many of them would still have at least
one failure even if they mostly pass.

AFAICT, Mozilla gets around this in practice by marking some tests
inline as expected fails.  That way if you fail some things in the
file, you only have to mark those as expected, and then it works fine
for regression testing.  Without something like this, I don't think
the official tests will work well for regression testing unless
they're cut up unreasonably finely.  But I certainly don't think the
official test suite should carry inline annotations for expected
fails in particular UAs.

I guess each implementer could work around this by keeping a database
of all the failure strings -- plus file and line number so they're
distinct -- and the results from the last run.  That way if a test
fails that once passed, it can be flagged as a possible regression,
but if a test has never passed or is manually marked in the database
as an expected fail, it can be treated as an expected fail.  But for
this to work, the test harness itself would need to output a full
list of failures, rather than giving up on a file when it hits an
unexpected exception or similar.
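
To make the database idea above a bit more concrete, here is a rough
sketch of how an implementer's tooling might classify one run against
stored results.  Everything in it is an assumption for illustration:
the names failureKey and classifyRun, the shape of the result records,
and the use of an in-memory Map in place of a real database.  It is
not how any existing harness works.

```javascript
// Hypothetical sketch: none of these names come from a real harness.

// Key each assertion by file, line and message so that identical
// messages at different locations stay distinct.
function failureKey(result) {
  return result.file + ":" + result.line + ":" + result.message;
}

// baseline: Map from key to { everPassed, expectedFail }
// results:  array of { file, line, message, passed } for the current run
function classifyRun(baseline, results) {
  const report = { regressions: [], expectedFails: [], passes: [] };
  for (const result of results) {
    const key = failureKey(result);
    const known = baseline.get(key) ||
      { everPassed: false, expectedFail: false };
    if (result.passed) {
      report.passes.push(key);
      known.everPassed = true;
    } else if (known.everPassed && !known.expectedFail) {
      // Passed at some point and is not manually marked: possible regression.
      report.regressions.push(key);
    } else {
      // Never passed, or manually marked as an expected fail.
      report.expectedFails.push(key);
    }
    baseline.set(key, known);
  }
  return report;
}

// Usage: two runs of an imaginary reflection test file.
const baseline = new Map();
const firstRun = classifyRun(baseline, [
  { file: "reflection.js", line: 10, message: "a.href reflects", passed: true },
  { file: "reflection.js", line: 20, message: "img.alt reflects", passed: false },
]);
const secondRun = classifyRun(baseline, [
  { file: "reflection.js", line: 10, message: "a.href reflects", passed: false },
  { file: "reflection.js", line: 20, message: "img.alt reflects", passed: false },
]);
console.log(secondRun.regressions, secondRun.expectedFails);
```

In the second run only the line-10 assertion lands in regressions,
since it passed before; the line-20 assertion has never passed, so it
stays an expected fail.  This only works if the harness reports every
assertion in the file, which is the point made above.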

I do agree that a primary goal should be helping regression testing,
and that reporting percentages passed is not a good idea.  If we have
official figures on pass rates for various browsers, they should be
reported at a very coarse granularity, on the level of whole features.
"We
pass all tests for data-*" is a useful statement, but "We pass 20% of
tests for data-*" vs. "We pass 80% of tests for data-*" is not -- let
alone overall pass percentages.

> I think the test harness should have at least the level of ease that Mochitest has. I think it follows that there shouldn't be a need to wrap the test assertions into any kind of boilerplate.
>
> That is,
> test(function() { assert_true(true, "Message") }, "Dunno what to write here.");
> is a severe ease of test writing failure compared to
> ok(true, "Message");

If that's actually how the current test harness works, I very much
agree.  (I haven't used it yet, but intend to within the next couple
of months.)

Received on Wednesday, 1 December 2010 20:12:25 UTC