- From: Andy Hickman <andy.hickman@digitaltv-labs.com>
- Date: Thu, 01 May 2014 21:57:26 +0100
- To: James Graham <james@hoppipolla.co.uk>, public-test-infra@w3.org, "public-web-and-tv@w3.org" <public-web-and-tv@w3.org>
Robin, James,

Thanks for your insights and suggestions. Robin - thanks for clarifying the
distinction between test files and test cases. I think we all agree that it is
the test that is the interesting entity. HbbTV's approach has one test per test
suite directory, with each directory containing the HTML app (i.e. the test
case) with its various supporting files (streams, images, JS, XML metadata,
etc.), so there is a 1-to-1 mapping between the test case HTML file and the
test case. There are pros and cons to this approach - I'm not particularly
advocating it.

As you correctly surmise, my interest is in robustly identifying precise test
cases. There are situations in the TV world where an individual test gets
challenged and a lot is at stake (e.g. a manufacturing line is waiting while a
trademark authority determines whether a manufacturer should be granted a test
report pass to use a logo, based upon the manufacturer's claim that a
particular test is not correct and should be waived). Identifying tests
extremely robustly over multiple versions of a test suite is an absolute
imperative. When a test gets waived, the certification authority usually wishes
to remove only the incorrect test, not the whole file containing (potentially
many) other perfectly valid tests, as that would unnecessarily weaken the test
regime.

I hope this makes the use case a bit clearer - I'm still hoping this could be
achieved within the framework you've got. Please see also inlines.

Thanks,
Andy

On 30/04/2014 15:07, James Graham wrote:
> On 30/04/14 14:24, Robin Berjon wrote:
>> I *can* however think of ways in which the IDs could be maintained
>> automatically in a third-party system. IIRC testharness expressed
>> unhappiness when two test cases inside a given file have the same test
>> name. This means that at a given commit, the {file name, test name}
>> tuple is unique: an ID can be assigned to it. A database tracking the
>> repository can then:
>>
>> 1) Track git moves (as well as removals and additions) in order to
>> maintain the identifier when the file part changes.
>> 2) Track addition and removal of test names per file with on-commit
>> runs. (There is infrastructure that should make it possible to extract
>> all the test names easily, including generated ones — we can look at
>> the details if you decide to go down that path.)
>
> So, FWIW we have not dissimilar requirements; we want to track which
> tests we are expected to pass, which we are expected to fail, and
> which have some other behaviour. At the moment the way we do that is
> to identify each test with a (test_url, test_name) tuple, much like
> Robin suggested. Then we generate a set of files with the expected
> results corresponding to each test. These are checked in to the source
> tree so they can be versioned with the code, and when people fix bugs
> they are expected to update the expected results (hence the use of a
> plain text format rather than a database or something).
>
> When we import a new snapshot of the test database (which is expected
> to be as often as possible), we regenerate the metadata using a build
> of the browser that got the "expected" results on the old snapshot. In
> principle it warns when a result changed without the file having
> changed between snapshots. Obviously there are ways that this system
> could fail and there would be ways to track more metadata that could
> make it more robust; for example we could deal with renames rather
> than mapping renames to new tests. However in the spirit of YAGNI
> those things will be fixed if they become pain points.
>
> (apologies for the slightly mixed tense; this system is in the process
> of being finished and deployed).
>
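If I've understood the tuple suggestion correctly, the bookkeeping it implies
is roughly the following sketch (Python, purely illustrative - none of these
names or the example URL refer to real W3C or wptrunner code):

"""
# Purely illustrative: none of this is real W3C or wptrunner code.
import itertools

class TestIdTracker:
    def __init__(self):
        self.ids = {}                      # (test_url, test_name) -> stable id
        self._counter = itertools.count(1)

    def assign(self, test_url, test_name):
        # Return the stable id for a tuple, allocating a new one if needed.
        key = (test_url, test_name)
        if key not in self.ids:
            self.ids[key] = next(self._counter)
        return self.ids[key]

    def record_move(self, old_url, new_url):
        # Follow a git move: re-key every tuple under old_url so the stable
        # id survives the rename.  Miss a move and the id silently changes.
        for (url, name), stable_id in list(self.ids.items()):
            if url == old_url:
                del self.ids[(url, name)]
                self.ids[(new_url, name)] = stable_id

tracker = TestIdTracker()
original = tracker.assign("/dom/example-test.html", "example subtest")
tracker.record_move("/dom/example-test.html", "/dom/example-test-renamed.html")
assert original == tracker.assign("/dom/example-test-renamed.html",
                                  "example subtest")
"""

That is, the stable identifier only stays stable as long as every move and
rename is observed and replayed correctly.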
Apologies if I'm missing something, but the tuple tracking suggestion seems a
pretty complex and potentially brittle solution to something that could be
fairly trivially solved (if there wasn't a huge legacy of test cases...). In
RDBMS terms, take the example of trying to reliably identify a record in a
table over time. Sure, you could use two columns whose values can change (e.g.
to correct typos), form an ID out of the tuple of the two column values, track
changes to those tuple values over time, and then separately hold a map of
generated ID to current tuple elsewhere... Or you could just have a column
which contains a unique, unchanging ID for that record.

My mental analogy is that we're designing a database table to store people's
details and you guys are suggesting a "forename", "surname", "date of birth"
tuple plus some clever mechanisms to ensure that this information remains
unique and that changes are tracked, whereas the usual RDBMS design pattern
would be a unique ID index column on the original table. My analogy is probably
wrong, but I'd be grateful if you could explain why! Would it be fair to say
that supporting unique test IDs wasn't a design requirement when the
harness/runner framework was put together, and now we are where we are it's
easier to use the suggested approach than to assign unique test IDs and
retrofit them to thousands of test cases?

BTW, I do have manual allocation of test IDs in mind, which I know will be
unpopular. In the overall scheme of designing and authoring valid test code
this is a tiny overhead (albeit a big one-off task when multiplied a few
thousand times...).
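The sort of thing I have in mind is nothing more sophisticated than a registry
keyed by an unchanging ID, with the file and test name as mutable values -
roughly as below (an illustrative sketch only; the ID scheme, field names and
helper are invented for this example and don't exist in the suite today):

"""
# Purely illustrative: the ID scheme, field names and helper below are
# invented for this example and are not part of the W3C suite or wptrunner.
import json

# Registry keyed by an unchanging, manually allocated test ID.  Renaming a
# file or a test only changes the value; the key never moves.
REGISTRY = {
    "org.example.dom.0001": {
        "file": "/dom/example-test.html",
        "name": "example subtest",
    },
}

def waive(test_id, registry=REGISTRY):
    # A certification authority can waive exactly one test by ID without
    # touching the other, perfectly valid tests in the same file.
    entry = dict(registry[test_id])
    entry["waived"] = True
    return entry

print(json.dumps(waive("org.example.dom.0001"), indent=2))
"""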
The point raised about your auto-generated tests may well be a more substantive
issue. One other thing: it wasn't clear to me how your proposal would work if a
test name is changed?

>>> 2) Ability to define a precise subset of W3C tests, covering areas of
>>> particular interest to that organisation and that can be reasonably
>>> expected to be passed 100% on all compliant devices. In practice this
>>> probably involves selecting only tests that pass on a majority of
>>> desktop browsers. See [1] and [2] for more background on why this is
>>> needed. One obvious way to define a subset is for the organisation to
>>> maintain their own list/manifest of test IDs; another is to allow the
>>> organisation to redistribute a subset of W3C tests (I'm not
>>> sufficiently familiar with the W3C test license terms to know whether
>>> this is possible).
>>
>> We generate a manifest of all test files; it should not be hard to
>> subset it. In fact our test runner uses it to support crude (but
>> useful) subsetting of the test suite already so that we can run just
>> some parts.
>
> FWIW the wptrunner code that we are using supports subsetting in a few
> ways:
>
> 1) Specific test paths may be selected on the command line using
> something like --include=dom/ to only run tests under /dom/.
>
> 2) An "include manifest" file may be specified on the command line to
> run only certain test urls. For example a file with the text:
>
> """
> skip: True
>
> [dom]
>   skip: False
>   [ranges]
>     skip: True
> """
>
> Would run just the tests under /dom/ but nothing under /dom/ranges/.
>
> 3) Individual test urls or subtests may be disabled in the expectation
> manifest files described above. In the case of urls this prevents the
> url being loaded at all. In the case of specific tests it merely
> causes the result to be ignored.

The subsetting approaches sound OK. I'm sure something workable for a third
party organisation to define the tests that are relevant to them could be
arrived at.
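For instance, I could imagine an organisation keeping its own list of
directories and generating the kind of include manifest described above from
it, along these lines (a sketch only, extrapolated from the example format
quoted above; the directory list and output file name are illustrative, and
the exact syntax would need checking against the wptrunner documentation):

"""
# Sketch only: the directory list and output file name are invented, and the
# manifest syntax is extrapolated from the example quoted above - check the
# real wptrunner documentation before relying on it.
INCLUDED_DIRS = ["dom", "html", "webstorage"]   # an organisation's own subset

def build_include_manifest(included_dirs):
    # Skip everything by default, then switch the chosen directories back on.
    lines = ["skip: True", ""]
    for directory in included_dirs:
        lines.append("[%s]" % directory)
        lines.append("  skip: False")
    return "\n".join(lines) + "\n"

with open("include.ini", "w") as manifest:
    manifest.write(build_include_manifest(INCLUDED_DIRS))
print(build_include_manifest(INCLUDED_DIRS))
"""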
Received on Thursday, 1 May 2014 20:57:53 UTC