Re: WebTV Help for Getting Engaged in W3C Test Effort

Robin, James,

Thanks for your insights and suggestions.

Robin - thanks for clarifying the distinction between test files and 
test cases. I think we all agree that it is the test that is the 
interesting entity. HbbTV's approach has one test per test suite 
directory, with each directory containing the HTML app (i.e. the test 
case) and its various supporting files (streams, images, JS, XML 
metadata, etc.), so there is a 1:1 mapping between the test case HTML 
file and the test case. There are pros and cons to this approach - I'm 
not particularly advocating it. As you correctly surmise, my interest 
is in robustly identifying precise test cases.

There are situations in the TV world where an individual test gets 
challenged and a lot is at stake (e.g. a manufacturing line is waiting 
while a trademark authority determines whether a manufacturer should be 
granted a test report pass allowing use of a logo, based upon the 
manufacturer's claim that a particular test is not correct and should 
be waived). Identifying tests extremely robustly over multiple versions 
of a test suite is an absolute imperative.

When tests get waived, the certification authority usually wishes to 
remove only the incorrect test, not the whole file containing 
(potentially many) other perfectly valid tests, as that would 
unnecessarily weaken the test regime.

I hope this makes the use case a bit clearer - I'm still hoping this 
could be achieved within the framework you've got.

Please also see my inline comments below.

Thanks,
Andy

On 30/04/2014 15:07, James Graham wrote:
> On 30/04/14 14:24, Robin Berjon wrote:
>> I *can* however think of ways in which the IDs could be maintained
>> automatically in a third-party system. IIRC testharness expressed
>> unhappiness when two test cases inside a given file have the same test
>> name. This means that at a given commit, the {file name, test name}
>> tuple is unique: an ID can be assigned to it. A database tracking the
>> repository can then:
>>
>>    1) Track git moves (as well as removals and additions) in order to
>> maintain the identifier when the file part changes.
>>    2) Track addition and removal of test names per file with on-commit
>> runs. (There is infrastructure that should make it possible to extract
>> all the test names easily, including generated ones — we can look at the
>> details if you decide to go down that path.)
>
> So, FWIW we have not dissimilar requirements; we want to track which 
> tests we are expected to pass, which we are expected to fail, and 
> which have some other behaviour. At the moment the way we do that is 
> to identify each test with a (test_url, test_name) tuple, much like 
> Robin suggested. Then we generate a set of files with the expected 
> results corresponding to each test. These are checked in to the source 
> tree so they can be versioned with the code, and when people fix bugs 
> they are expected to update the expected results (hence the use of a 
> plain text format rather than a database or something).
>
> When we import a new snapshot of the test database (which is expected 
> to be as often as possible), we regenerate the metadata using a build 
> of the browser that got the "expected" results on the old snapshot. In 
> principle it warns when the a result changed without the file having 
> changed between snapshots. Obviously there are ways that this system 
> could fail and there would be ways to track more metadata that could 
> make it more robust; for example we could deal with renames rather 
> than mapping renames to new tests. However in the spirit of YAGNI 
> those things will be fixed if they become pain points.
>
> (apologies for the slightly mixed tense; this system is in the process 
> of being finished and deployed).
>
Apologies if I'm missing something, but the tuple-tracking suggestion 
seems a pretty complex and potentially brittle solution to something 
that could be solved fairly trivially (if there weren't a huge legacy 
of test cases...).

In RDBMS terms, take the example of trying to reliably identify a 
record in a table over time. Sure, you could use two columns whose 
values can change (e.g. to correct typos), form an ID out of the tuple 
of the two column values, track changes to those tuple values over 
time, and then separately hold a map of generated ID to current tuple 
elsewhere... Or you could just have a column which contains a unique, 
unchanging ID for that record.

My mental analogy is that we're designing a database table to store 
people's details, and you guys are suggesting using a "forename", 
"surname", "date of birth" tuple plus some clever mechanisms to ensure 
that this information remains unique and that changes are tracked, 
whereas the usual RDBMS design pattern would be to have a unique ID 
index column on the original table. My analogy is probably wrong, but 
I'd be grateful if you could explain why!
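
To make the analogy concrete, here's a minimal sketch in Python/SQLite 
of the two designs (the IDs, file names and test names below are made 
up purely for illustration - this isn't any existing tooling):

import sqlite3

con = sqlite3.connect(":memory:")

# Proposed approach (as I understand it): identity *is* the
# (file, name) tuple, so renames have to be tracked separately.
con.execute("""CREATE TABLE tests_by_tuple (
                   file TEXT NOT NULL,
                   name TEXT NOT NULL,
                   PRIMARY KEY (file, name))""")

# Usual RDBMS pattern: a unique, unchanging ID; file and name are
# just attributes that can change without the record losing its
# identity.
con.execute("""CREATE TABLE tests_by_id (
                   test_id TEXT PRIMARY KEY,
                   file TEXT NOT NULL,
                   name TEXT NOT NULL)""")

con.execute("INSERT INTO tests_by_id "
            "VALUES ('TC-0001', 'dom/example.html', 'original name')")
# A typo fix or a file move doesn't disturb the identity:
con.execute("UPDATE tests_by_id SET name = 'corrected name' "
            "WHERE test_id = 'TC-0001'")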

Would it be fair to say that supporting unique test IDs wasn't a design 
requirement when the harness/runner framework was put together, and that 
now we are where we are it's easier to use the suggested approach than 
to assign unique test IDs and retrofit them to thousands of test cases?

BTW, I do have manual allocation of test IDs in mind, which I know 
will be unpopular. In the overall scheme of designing and authoring 
valid test code this is a tiny overhead (albeit a big one-off task when 
multiplied a few thousand times...). The point raised about your 
auto-generated tests may well be a more substantive issue.
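
For what it's worth, the sort of thing I have in mind is nothing more 
sophisticated than a registry maintained alongside the suite. A purely 
illustrative sketch (none of these IDs, file names or test names 
exist):

import csv
import io

# Hypothetical registry: one manually allocated, never-reused ID per
# test, mapped to wherever that test currently lives in the suite.
REGISTRY = """\
test_id,file,test_name
ORG-0001,dom/example-a.html,first example subtest
ORG-0002,dom/example-b.html,second example subtest
"""

rows = list(csv.DictReader(io.StringIO(REGISTRY)))
by_id = {row["test_id"]: (row["file"], row["test_name"]) for row in rows}

# A waiver or a challenge then only needs to cite the ID; the
# (file, name) pair can change later without invalidating old reports.
print(by_id["ORG-0002"])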

One other thing: it wasn't clear to me how your proposal would work if 
a test name is changed?

>>> 2) Ability to define a precise subset of W3C tests, covering areas of
>>> particular interest to that organisation and that can be reasonably
>>> expected to be passed 100% on all compliant devices. In practice this
>>> probably involves selecting only tests that pass on a majority of
>>> desktop browsers. See [1] and [2] for more background on why this is
>>> needed. One obvious way to define a subset is for the organisation to
>>> maintain their own list/manifest of test IDs; another is to allow the
>>> organisation to redistribute a subset of W3C tests (I'm not 
>>> sufficiently
>>> familiar with the W3C test license terms to know whether this is
>>> possible).
>>
>> We generate a manifest of all test files; it should not be hard to
>> subset it. In fact our test runner uses it to support crude (but useful)
>> subsetting of the test suite already so that we can run just some parts.
>
> FWIW the wptrunner code that we are using supports subsetting in a few 
> ways:
>
> 1) Specific test paths may be selected on the command line using 
> something like --include=dom/ to only run tests under /dom/.
>
> 2) An "include manifest" file may be specified on the command line to 
> run only certain test urls. For example a file with the text:
>
> """
> skip: True
>
> [dom]
>   skip: False
>   [ranges]
>     skip: True
> """
>
> Would run just the tests under /dom/ but nothing under /dom/ranges/
>
> 3) Individual test urls or subtests may be disabled in the expectation 
> manifest files described above. In the case of urls this prevents the 
> url being loaded at all. In the case of specific tests it merely 
> causes the result to be ignored.
>
>
The subsetting approaches sound OK. I'm sure something workable could 
be arrived at for a third-party organisation to define the tests that 
are relevant to them.
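
For example, an organisation might publish an include manifest along 
the lines of the format James describes above - the directory names 
here are just placeholders for whatever areas that organisation cares 
about:

"""
skip: True

[dom]
  skip: False

[html]
  skip: False
  [browsers]
    skip: True
"""

That, plus a published list of test identifiers (in whatever form those 
end up taking), would probably be enough for the certification use case 
described above.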
