Re: Number of tests in a test suite (was: Re: Framework next steps)

On Aug 1, 2011, at 7:19 AM, Francois Daoust wrote:

> On 07/22/2011 03:50 PM, Philippe Le Hegaret wrote:
> [...]
>> Looking at
>> http://w3c-test.org/framework/test/nav-timing-default/test_document_open/
>> 
>> I see a number of issues:
>> 
>> - number of tests reported: the test itself is problematic since it
>> doesn't report how many tests it contains or how many are failing. It
>> contains 21 tests in fact. At a minimum, we need to tweak the test.
> 
> I'd like to clarify this point. Beyond the framework itself, it affects the way we'll want people to write tests.
> 
> A test file that uses "testharness.js" may contain one or more actual sub-tests. There are three ways to count the number of tests in a test suite:
> 
> 1/ one test per test file, as done right now
>  - benefit: it's easy to count tests that pass and/or fail as there's a 1-to-1 correspondence between test files and tests. That's what is currently done with other types of tests, e.g. in the CSS test suite.
>  - drawback: it hides potentially useful details in test reports and forces authors to write separate test files for different functionalities, even when those functionalities are closely related and could, in theory, be tested together.
> 
> 2/ counting all sub-tests as tests.
>  - benefit: it gives a more detailed view of what passes and what fails.
>  - drawback: for consistency purposes, it seems highly preferable to ensure that the harness reports the same number of tests no matter what; this constrains how authors write their tests. In the example you provided (leaving aside the "for" loop for the time being), the test starts with a generic check that "window.performance" is defined. If it's not, that single test fails, none of the sub-tests gets run, and the report reads "1 test failed". It would be more valuable to get "21 tests failed" (or at least to be able to tell that 20 tests could not be run); otherwise we would not be able to compare test reports in any meaningful way.
> 
> 3/ a combination of both 1/ and 2/, e.g. counting one test per test file by default but making it possible (through some metadata flag) to count all sub-tests.
>  - benefit: authors (or working groups) get the choice to write tests the way they prefer.
>  - drawback: slightly more complicated to implement from a test runner perspective.
> 
> From what you already said, there are test files with thousands of sub-tests that the group wants to see in the results, so 1/ is off the table. Should we aim for 3/ or stick to 2/?
> 
> Francois.
> 
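
To make the counting problem in option 2/ concrete, here is a rough sketch of how such a file could be structured so that every check is its own testharness.js test(). This is not the actual nav-timing test: the attribute list is abbreviated and the structure is only illustrative.

// Abbreviated list; the real file contains 21 checks.
var attributes = ["navigationStart", "fetchStart", "domainLookupStart",
                  "domainLookupEnd", "connectStart", "connectEnd"];

test(function() {
  assert_true(window.performance !== undefined,
              "window.performance is defined");
}, "window.performance exists");

attributes.forEach(function(attr) {
  test(function() {
    // Each attribute check is its own sub-test: if window.performance is
    // missing, the expression below throws, the harness catches it, and
    // the sub-test fails on its own, giving "21 tests failed" rather
    // than "1 test failed".
    assert_equals(typeof window.performance.timing[attr], "number",
                  attr + " should be a number");
  }, "window.performance.timing." + attr);
});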

Another twist here: the CSS test suite and the harness code currently have the concept of combination tests. The premise is that you have a number of sub-tests, each of which tests one testable assertion, and a single combination test, which, by definition, tests all the assertions of the sub-tests in one go. If a UA passes all the sub-tests, it can be presumed to pass the combination test, and vice versa.

One way to deal with having multiple tests per file is to treat the file as a combination test and all the tests within it as sub-tests. This really only makes sense if the file only contains tests that are closely related (i.e. they all test the same section(s) of a spec). I'm not sure whether that's the current practice in other suites (if it isn't, I highly recommend it).
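
As a hypothetical sketch (the sub-tests below are invented for illustration, not taken from an existing suite), such a file keeps all of its test() calls on the same part of the spec, here the window.performance interface, so the file as a whole can stand in for the combination test:

// Both sub-tests target the same spec section, so the file can be treated
// as one combination test whose sub-tests are the individual test() calls.
test(function() {
  assert_true("timing" in window.performance,
              "timing attribute is present");
}, "window.performance.timing exists");

test(function() {
  assert_true("navigation" in window.performance,
              "navigation attribute is present");
}, "window.performance.navigation exists");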

Treating files that way was my plan for adapting the current harness code to the concept of multiple tests per file. For reporting purposes, each sub-test would be reported individually; if all sub-tests have the same result for a given UA, the harness can collapse the results.
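
A minimal sketch of that collapsing rule, assuming per-sub-test results arrive as simple pass/fail strings (hypothetical code, not the current harness implementation):

// results: one outcome per sub-test for a given UA, e.g. ["pass", "fail"].
// If every sub-test agrees, fold the file into a single combination-test
// entry; otherwise keep the per-sub-test detail.
function collapseResults(fileName, results) {
  var first = results[0];
  var uniform = results.every(function(r) { return r === first; });
  if (uniform) {
    return [{ test: fileName, result: first }];
  }
  return results.map(function(r, i) {
    return { test: fileName + " [sub-test " + (i + 1) + "]", result: r };
  });
}

For example, collapseResults("test_document_open", ["fail", "fail", "fail"]) would yield a single failing entry for the whole file, while a mixed result set would stay expanded.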

Peter

Received on Monday, 1 August 2011 19:36:56 UTC