Number of tests in a test suite (was: Re: Framework next steps) from Francois Daoust on 2011-08-01 (public-test-infra@w3.org from July to September 2011)

From: Francois Daoust <fd@w3.org>
Date: Mon, 01 Aug 2011 16:19:58 +0200
To: Philippe Le Hegaret <plh@w3.org>
CC: "Michael(tm) Smith" <mike@w3.org>, public-test-infra <public-test-infra@w3.org>
Message-ID: <4E36B60E.3020100@w3.org>

On 07/22/2011 03:50 PM, Philippe Le Hegaret wrote:
[...]
> Looking at
> http://w3c-test.org/framework/test/nav-timing-default/test_document_open/
>
> I see a number of issues:
>
> - number of tests reported: the test itself is problematic since it
> doesn't report how many tests it contains and how many are failing. It contains
> 21 tests in fact. At the minimum, we need to tweak the test.

I'd like to clarify this point. Beyond the framework itself, it affects the way we'll want people to write tests.

A test file that uses "testharness.js" may contain one or more actual sub-tests. There are three ways to count the number of tests in a test suite:

1/ one test per test file, as done right now
  - benefit: it's easy to count tests that pass and/or fail as there's a 1-to-1 correspondence between test files and tests. That's what is currently done with other types of tests, e.g. in the CSS test suite.
  - drawback: it hides potentially useful details in test reports and forces authors to write separate test files for different functionalities, even when these functionalities are closely related and could in theory be tested together.

2/ counting all sub-tests as tests.
  - benefit: it gives a more detailed view of what passes and what fails.
  - drawback: for consistency's sake, it seems highly preferable to ensure that the harness reports the same number of tests no matter what. This forces authors to write tests accordingly. In the example you provided (leaving aside the "for" loop for the time being), the test starts with a generic test that checks that "window.performance" is defined. If it isn't, the test fails and none of the sub-tests is checked, leading to a "1 test failed" report. It would be more valuable to get "21 tests failed" (or at least to be able to tell that 20 tests could not be run); otherwise we would not be able to compare test reports in any meaningful way.

3/ a combination of both 1/ and 2/, e.g. counting one test per test file by default but making it possible (through some metadata flag) to count all sub-tests.
  - benefit: authors (or working groups) get the choice to write tests the way they prefer.
  - drawback: slightly more complicated to implement from a test runner perspective.
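To make the three options concrete, here is a rough Python sketch of how a runner might tally results under each of them. Everything in it is hypothetical: `TestFileResult`, `tally`, the `count_subtests` flag (standing in for the metadata flag of 3/) and the `expected` field (a declared number of sub-tests, so that sub-tests that never ran can be reported) are illustrations only, not part of testharness.js or of the current framework.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TestFileResult:
    """Results reported by one test file (hypothetical structure, not a real runner API)."""
    path: str
    subtests: List[bool]            # one entry per sub-test that actually ran; True = pass
    count_subtests: bool = False    # hypothetical per-file metadata flag (option 3/)
    expected: Optional[int] = None  # hypothetical declared number of sub-tests


def tally(results: List[TestFileResult], mode: str) -> Dict[str, int]:
    """Count tests under mode "file" (option 1/), "subtest" (2/) or "mixed" (3/)."""
    if mode not in ("file", "subtest", "mixed"):
        raise ValueError("unknown mode: " + mode)
    totals = {"pass": 0, "fail": 0, "notrun": 0}
    for r in results:
        per_subtest = mode == "subtest" or (mode == "mixed" and r.count_subtests)
        if per_subtest:
            passed = sum(r.subtests)
            totals["pass"] += passed
            totals["fail"] += len(r.subtests) - passed
            if r.expected is not None:
                # Sub-tests that never ran, e.g. because an early precondition failed.
                totals["notrun"] += max(r.expected - len(r.subtests), 0)
        else:
            # One test per file: the file passes only if it ran every sub-test and all passed.
            ran_all = r.expected is None or len(r.subtests) >= r.expected
            ok = bool(r.subtests) and all(r.subtests) and ran_all
            totals["pass" if ok else "fail"] += 1
    return totals


results = [
    # window.performance is missing: only the generic check ran, and it failed.
    TestFileResult("test_document_open", [False], count_subtests=True, expected=21),
    TestFileResult("other_test", [True, True, True]),
]
print(tally(results, "file"))     # {'pass': 1, 'fail': 1, 'notrun': 0}
print(tally(results, "subtest"))  # {'pass': 3, 'fail': 1, 'notrun': 20}
print(tally(results, "mixed"))    # {'pass': 1, 'fail': 1, 'notrun': 20}
```

With this data, 1/ reduces test_document_open to a single failure, while 2/ and 3/ can report that 20 sub-tests never ran, but only because the file declares how many sub-tests it expects.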

From what you've already said, there are test files with thousands of sub-tests that the group wants to see in the results, so 1/ is off the table. Should we aim for 3/ or stick with 2/?

Francois.

Received on Monday, 1 August 2011 14:20:30 UTC