Re: WebTV Help for Getting Engaged in W3C Test Effort from James Graham on 2014-04-30 (public-web-and-tv@w3.org from April 2014)

From: James Graham <james@hoppipolla.co.uk>
Date: Wed, 30 Apr 2014 15:07:24 +0100
To: public-test-infra@w3.org, "public-web-and-tv@w3.org" <public-web-and-tv@w3.org>, "public-test-infra@w3.org" <public-test-infra@w3.org>
Message-ID: <5361039C.5020409@hoppipolla.co.uk>

On 30/04/14 14:24, Robin Berjon wrote:
> I *can* however think of ways in which the IDs could be maintained
> automatically in a third-party system. IIRC testharness expressed
> unhappiness when two test cases inside a given file have the same test
> name. This means that at a given commit, the {file name, test name}
> tuple is unique: an ID can be assigned to it. A database tracking the
> repository can then:
>
>    1) Track git moves (as well as removals and additions) in order to
> maintain the identifier when the file part changes.
>    2) Track addition and removal of test names per file with on-commit
> runs. (There is infrastructure that should make it possible to extract
> all the test names easily, including generated ones — we can look at the
> details if you decide to go down that path.)

So, FWIW we have not dissimilar requirements; we want to track which 
tests we are expected to pass, which we are expected to fail, and which 
have some other behaviour. At the moment the way we do that is to 
identify each test with a (test_url, test_name) tuple, much like Robin 
suggested. Then we generate a set of files with the expected results 
corresponding to each test. These are checked in to the source tree so 
they can be versioned with the code, and when people fix bugs they are 
expected to update the expected results (hence the use of a plain text 
format rather than a database or something).
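
As a rough illustration of the bookkeeping described above (not the actual file format; the function names and the bracketed text layout here are made up for the example), expected results keyed by (test_url, test_name) can be grouped per test file and written out as plain text so they diff nicely in version control:

```python
from collections import defaultdict


def group_by_url(results):
    """Group {(test_url, test_name): status} into {test_url: {test_name: status}}."""
    by_url = defaultdict(dict)
    for (test_url, test_name), status in results.items():
        by_url[test_url][test_name] = status
    return dict(by_url)


def to_text(by_url):
    """Serialise grouped expectations as sorted plain text (hypothetical layout)."""
    lines = []
    for test_url in sorted(by_url):
        lines.append(f"[{test_url}]")
        for test_name in sorted(by_url[test_url]):
            lines.append(f"  {test_name}: {by_url[test_url][test_name]}")
    return "\n".join(lines) + "\n"
```

Sorting the keys keeps the output stable between runs, so a change in the checked-in file corresponds to a change in an expected result.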

When we import a new snapshot of the test database (which is expected to 
be as often as possible), we regenerate the metadata using a build of 
the browser that got the "expected" results on the old snapshot. In 
principle it warns when a result changed without the file having 
changed between snapshots. Obviously there are ways that this system 
could fail and there would be ways to track more metadata that could 
make it more robust; for example we could deal with renames rather than 
mapping renames to new tests. However in the spirit of YAGNI those 
things will be fixed if they become pain points.

(apologies for the slightly mixed tense; this system is in the process 
of being finished and deployed).
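
The snapshot check described above could be sketched roughly as follows. This is only an assumption about one way to do it: the function name is invented, and the per-file content hashes are a stand-in for however "the file changed" is actually determined.

```python
def suspicious_changes(old_results, new_results, old_hashes, new_hashes):
    """Return (test_url, test_name) keys whose result changed although the
    test file itself did not change between the two snapshots.

    old_results / new_results: {(test_url, test_name): status}
    old_hashes / new_hashes:   {test_url: content hash} (hypothetical input)
    """
    flagged = []
    for key in sorted(new_results):
        if key not in old_results:
            # New test, or a renamed one: renames are treated as new tests.
            continue
        test_url, _test_name = key
        file_unchanged = (
            test_url in old_hashes
            and old_hashes[test_url] == new_hashes.get(test_url)
        )
        if file_unchanged and old_results[key] != new_results[key]:
            flagged.append(key)
    return flagged
```

A test whose result changed along with its file is not flagged, since the file change is a plausible cause.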

>> 2) Ability to define a precise subset of W3C tests, covering areas of
>> particular interest to that organisation and that can be reasonably
>> expected to be passed 100% on all compliant devices. In practice this
>> probably involves selecting only tests that pass on a majority of
>> desktop browsers. See [1] and [2] for more background on why this is
>> needed. One obvious way to define a subset is for the organisation to
>> maintain their own list/manifest of test IDs; another is to allow the
>> organisation to redistribute a subset of W3C tests (I'm not sufficiently
>> familiar with the W3C test license terms to know whether this is
>> possible).
>
> We generate a manifest of all test files; it should not be hard to
> subset it. In fact our test runner uses it to support crude (but useful)
> subsetting of the test suite already so that we can run just some parts.

FWIW the wptrunner code that we are using supports subsetting in a few ways:

1) Specific test paths may be selected on the command line using 
something like --include=dom/ to only run tests under /dom/.

2) An "include manifest" file may be specified on the command line to 
run only certain test urls. For example a file with the text:

"""
skip: True

[dom]
   skip: False
   [ranges]
     skip: True
"""

would run just the tests under /dom/ but nothing under /dom/ranges/.
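
To make the inheritance in that example concrete, here is a small sketch of how the nested skip values could be resolved. It is not the wptrunner implementation; the dict representation of the manifest and the should_run name are assumptions for illustration. The idea is that the most specific directory with a skip setting wins.

```python
# Hypothetical in-memory form of the include manifest shown above.
MANIFEST = {
    "skip": True,
    "children": {
        "dom": {
            "skip": False,
            "children": {"ranges": {"skip": True}},
        },
    },
}


def should_run(manifest, test_path):
    """Walk the path components; the deepest matching 'skip' value applies."""
    node = manifest
    skip = node.get("skip", False)
    for part in test_path.strip("/").split("/"):
        node = node.get("children", {}).get(part)
        if node is None:
            break  # no more specific section; keep the inherited value
        if "skip" in node:
            skip = node["skip"]
    return not skip
```

With this manifest, should_run(MANIFEST, "/dom/nodes/a.html") is true, while paths under /dom/ranges/ or outside /dom/ are skipped.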

3) Individual test urls or subtests may be disabled in the expectation 
manifest files described above. In the case of urls this prevents the 
url being loaded at all. In the case of specific tests it merely causes 
the result to be ignored.
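
The difference between the two kinds of disabling in (3) can be sketched like this. Again this is an illustration under assumptions, not wptrunner code: collect_results, the two "disabled" sets and the load callback are all hypothetical names.

```python
def collect_results(test_urls, disabled_urls, disabled_subtests, load):
    """Run tests and collect results, honouring both kinds of disabling.

    disabled_urls:     set of test URLs that must not be loaded at all
    disabled_subtests: set of (test_url, test_name) whose results are ignored
    load:              callable taking a URL, returning {test_name: status}
    """
    results = {}
    for url in test_urls:
        if url in disabled_urls:
            continue  # the URL is never loaded
        for name, status in load(url).items():
            if (url, name) in disabled_subtests:
                continue  # the subtest still ran; its result is dropped
            results[(url, name)] = status
    return results
```

A disabled URL costs nothing at run time, whereas a disabled subtest still executes as part of its file.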

Received on Wednesday, 30 April 2014 14:07:49 UTC