Re: Number of tests in a test suite from Francois Daoust on 2011-08-16 (public-test-infra@w3.org from July to September 2011)

From: Francois Daoust <fd@w3.org>
Date: Tue, 16 Aug 2011 12:22:00 +0200
To: "Linss, Peter" <peter.linss@hp.com>
CC: Philippe Le Hegaret <plh@w3.org>, "Michael(tm) Smith" <mike@w3.org>, public-test-infra <public-test-infra@w3.org>
Message-ID: <4E4A44C8.6080309@w3.org>

On 08/10/2011 10:03 PM, Linss, Peter wrote:
> On Aug 10, 2011, at 5:16 AM, Francois Daoust wrote:
>
>> On 08/01/2011 09:34 PM, Linss, Peter wrote:
>>> On Aug 1, 2011, at 7:19 AM, Francois Daoust wrote:
>> [...]
>>>> 3/ a combination of both 1/ and 2/, e.g. counting one test per test file by default but making it possible (through some metadata flag) to count all sub-tests.
>>>>   - benefit: authors (or working groups) get the choice to write tests the way they prefer.
>>>>   - drawback: slightly more complicated to implement from a test runner perspective.
>>>>
>>>>  From what you already said, there are test files with thousands of sub-tests that the group wants to see in the results, so 1/ is off the table. Should we aim for 3 or stick to 2?
>>
>> Aiming for 3 after discussion on Monday.
>>
>>
>>> Another twist here, the CSS test suite and the harness code currently have the concept of combination tests. The premise being that you have a number of sub tests, which each test one testable assertion, and a single combination test, which is a single test that, by definition, tests all the assertions of the sub tests in one go. If a UA passes all the sub tests, it can be presumed to pass the combination test, and vice versa.
>>>
>>> One way to deal with having multiple tests per file is to treat the file as a combination test and all the tests within it as sub-tests. This really only makes sense if the file only contains tests that are closely related (i.e. all test the same section(s) of a spec). I'm not sure if that's the current practice in other suites (if it isn't, I highly recommend it).
>>
>> Yes, I would expect tests to be closely related as well.
>>
>>
>>> This was my plan to adapt the current harness code to the concept of multiple tests per file. For reporting purposes each sub test would be reported individually, if all sub tests have the same result for a UA, the harness can collapse the results.
>>
>> Sounds like a good approach.
>>
>> The "combo" feature currently takes for granted that the subtests are known a priori, i.e. imported in the "testcases" table. For test cases that use testharness.js, it's going to be hard to extract that information automatically (embedded in JavaScript, possibly with asynchronous subtests defined in nested contexts), and tedious to require a manual extraction.
>>
>> For typical combo script tests:
>> - we will know that the test file is a combo test, through the "combo" metadata flag
>> - we will know the number of subtests it contains, either a priori through a metadata flag (more a key/value pair than a flag, actually), or a posteriori when the test is run, provided the script reports it.
>
> If this data is available in the test metadata then there's no reason it can't be known to the harness a priori. We (the CSSWG) have test suite build code that generates the testcases.data file automatically. I've been working on that code to refactor it and make it more generic. It also builds nice index pages of the test suites and does some degree of validation of the tests so there's general utility in using it for all suites.

I don't see how one can extract subtests from a script test automatically unless we put constraints on the way these tests are written, but I'd be more than happy to be wrong. Many tests have conditional subtests that only get added and run when e.g. a first subtest passes.

A constraint that would work: provided no error occurs, all subtests must be run when the test runs, whether they pass or fail. If that constraint is respected, running the test once would make it possible to extract information about subtests through a simple script, and to prepare the input file to be imported into the harness.
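
As a rough illustration of the "simple script" idea, here is a minimal Python sketch. It assumes the results of a single run are available as a list of name/status records; that record shape, the tab-separated output, the column names and the "example/combo-test.html" path are all hypothetical, and none of this is the actual harness import format.

```python
import csv
import io


def build_import_rows(test_path, run_results):
    """Return one (test_path, subtest_name) row per subtest seen in one run.

    run_results is a hypothetical list of {"name": ..., "status": ...}
    records captured from a single run of the test file.
    """
    seen = set()
    rows = []
    for result in run_results:
        name = result["name"]
        # Subtests are identified by name, so names must be unique per file.
        if name in seen:
            raise ValueError(f"duplicate subtest name: {name!r}")
        seen.add(name)
        # The pass/fail status is ignored on purpose: only the list of
        # subtests matters for the import step.
        rows.append((test_path, name))
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated text (hypothetical import format)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["test", "subtest"])
    writer.writerows(rows)
    return buffer.getvalue()


sample_run = [
    {"name": "first subtest", "status": "PASS"},
    {"name": "second subtest", "status": "FAIL"},
]
rows = build_import_rows("example/combo-test.html", sample_run)
print(to_tsv(rows), end="")
```

The sketch only lists subtests that actually ran, so it is only as complete as the constraint above is respected.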

I'm not sure that is reasonable though. Some subtests could be triggered by an event, and that event might never fire in some implementations.


>> In both cases, the test author needs to be explicit about the number of subtests.
>> - we won't have the list of subtests during the importation step.
>> - when the test is run, we will know the number of subtests that passed/failed
>> - when the test is run, we may know the individual pass/fail status of subtests, each subtest being identified by a name set by the script. I say "may" because we cannot expect the user to enter that information manually (e.g. when reporting for a different user-agent).
>>
>> Perhaps the easiest way to proceed is not to worry too much about the information we put in the "testcases" table and focus on what we store in the "results" table.
>>
>> Possible changes on the front end:
>> 1/ For combo script tests, add text input fields to result submission form that ask the user to report on the number of subtests that passed/failed and the total number of subtests (when unknown, i.e. unless the test case already exposes that information somehow).
>> 2/ Fill out these input fields automatically with a little bit of JavaScript. If the total number of subtests is not declared in the markup, testharness.js will need to be updated to expose that information.
>> 3/ Automatically append information about individual subtests as form submission data
>> 4/ Adjust the results page to provide useful information about combo tests.
>>
>> On the back end, there are multiple ways to adjust what gets stored in the "results" table. Without having thought too much about it, I would add "nb_passed", "nb_failed", "nb_uncertain", "nb_skipped" columns to ease stats computing, a "parent_id" column set when the row is that of a subtest to link back to the test row, and a "subtest_name" column to store the name of the subtest.
>>
>> Does that match the approach you had in mind, Peter?
>
> I haven't given it enough thought to give a definitive answer here, but I think we're asking for trouble if we don't let the harness know about the subtests as individual tests. As I said above, that information can be extracted from the tests in an automatic process with mostly existing code (or soon to be existing code), so I'm not sure it's worth adding new concepts to the harness code to deal with unknown subtests...

An example that got raised is a test containing automatically generated subtests such as:
  http://w3c-test.org/html/tests/submission/AryehGregor/reflection/reflection-metadata.html
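
For reference, the extra "results" columns proposed in the quoted text above could look roughly like the following Python/SQLite sketch. Only "nb_passed", "nb_failed", "nb_uncertain", "nb_skipped", "parent_id" and "subtest_name" come from that proposal; the "id", "testcase" and "result" columns, the sample rows and the use of SQLite are assumptions made for illustration, not the harness's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE results (
        id INTEGER PRIMARY KEY,          -- hypothetical
        testcase TEXT NOT NULL,          -- hypothetical
        result TEXT,                     -- hypothetical
        nb_passed INTEGER DEFAULT 0,
        nb_failed INTEGER DEFAULT 0,
        nb_uncertain INTEGER DEFAULT 0,
        nb_skipped INTEGER DEFAULT 0,
        parent_id INTEGER REFERENCES results(id),  -- set only on subtest rows
        subtest_name TEXT                          -- set only on subtest rows
    )
""")

# One row for the combo test itself, carrying the aggregate counts.
cur = conn.execute(
    "INSERT INTO results (testcase, result, nb_passed, nb_failed) "
    "VALUES (?, ?, ?, ?)",
    ("example/combo-test.html", "fail", 1, 1),
)
parent_id = cur.lastrowid

# One row per subtest, linked back to the combo test row.
conn.executemany(
    "INSERT INTO results (testcase, result, parent_id, subtest_name) "
    "VALUES (?, ?, ?, ?)",
    [
        ("example/combo-test.html", "pass", parent_id, "first subtest"),
        ("example/combo-test.html", "fail", parent_id, "second subtest"),
    ],
)

subtests = conn.execute(
    "SELECT subtest_name, result FROM results WHERE parent_id = ? ORDER BY id",
    (parent_id,),
).fetchall()
print(subtests)
```

With this layout the subtests do not need to exist in the "testcases" table beforehand: they only appear as rows once a run reports them.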

Francois.
Received on Tuesday, 16 August 2011 10:22:33 UTC