Re: WebTV Help for Getting Engaged in W3C Test Effort

On 01/05/14 21:57, Andy Hickman wrote:
> On 30/04/2014 15:07, James Graham wrote:
>> On 30/04/14 14:24, Robin Berjon wrote:
>>> I *can* however think of ways in which the IDs could be maintained
>>> automatically in a third-party system. IIRC testharness expressed
>>> unhappiness when two test cases inside a given file have the same test
>>> name. This means that at a given commit, the {file name, test name}
>>> tuple is unique: an ID can be assigned to it. A database tracking the
>>> repository can then:
>>>    1) Track git moves (as well as removals and additions) in order to
>>> maintain the identifier when the file part changes.
>>>    2) Track addition and removal of test names per file with on-commit
>>> runs. (There is infrastructure that should make it possible to extract
>>> all the test names easily, including generated ones — we can look at the
>>> details if you decide to go down that path.)
>> So, FWIW we have not dissimilar requirements; we want to track which
>> tests we are expected to pass, which we are expected to fail, and
>> which have some other behaviour. At the moment the way we do that is
>> to identify each test with a (test_url, test_name) tuple, much like
>> Robin suggested. Then we generate a set of files with the expected
>> results corresponding to each test. These are checked in to the source
>> tree so they can be versioned with the code, and when people fix bugs
>> they are expected to update the expected results (hence the use of a
>> plain text format rather than a database or something).
>> When we import a new snapshot of the test database (which is expected
>> to be as often as possible), we regenerate the metadata using a build
>> of the browser that got the "expected" results on the old snapshot. In
>> principle it warns when a result changed without the file having
>> changed between snapshots. Obviously there are ways that this system
>> could fail and there would be ways to track more metadata that could
>> make it more robust; for example we could deal with renames rather
>> than mapping renames to new tests. However in the spirit of YAGNI
>> those things will be fixed if they become pain points.
>> (apologies for the slightly mixed tense; this system is in the process
>> of being finished and deployed).
> Apologies if I'm missing something but the tuple tracking suggestion
> seems a pretty complex and potentially brittle solution to something
> that could be fairly trivially solved (if there wasn't a huge legacy of
> test cases...).

At least for the use cases I am interested in, I don't think we can do 
better than the tuple suggestion. I will try to explain why below.

> In RDBMS terms, let's take the example of trying to be able to reliably
> identify a record in a table over time. Sure you could use two columns
> whose values can change (e.g. to correct typos) and form an ID out of
> the tuple of the two column values, track changes to those tuple values
> over time, and then separately hold a map of generated ID to current
> tuple elsewhere.... Or you could just have a column which contains a
> unique, unchanging ID for that record.
> My mental analogy is that we're designing a database table to store
> people details and you guys are suggesting using a "forename",
> "surname", "date of birth" tuple plus some clever mechanisms to ensure
> that this info remains unique and that changes are tracked, whereas the
> usual RDBMS design pattern would be to have a unique ID index column on
> the original table. My analogy is probably wrong, but I'd be grateful if
> you could explain why!

In a database, you typically have very different constraints. Until you 
are working at huge scale and need to care about sharding, the canonical 
solution to unique ids is "large integer field set to autoincrement". 
The success of that design relies on the fact that each insert operation 
is isolated, so it's always clear what a valid unused id is.

In the case of the test system, it's extremely unclear what a valid 
unused id is for each new test; we have multiple simultaneous 
contributions from a large number of parties and no good way of 
coordinating them. It would be totally impractical to have to go through 
each pull request and add a unique number to every test, for example. 
Clearly an autoincrementing integer isn't going to cut it. So there are 
two ways of dealing with this: we either try for globally unique ids, or 
we allow for the possibility of collisions but make them easy to deal with.

If we wanted globally unique ids, it would probably mean using something 
like a random uuid. For example we could give each test a name like 
aec2cf60-d17a-11e3-80c1-cbadd29e6cd4. If we did that for both filenames 
and the names of tests within files we would have a way of identifying 
every test that wasn't prone to collisions. This is quite similar to the 
way that git and other version control systems work under the hood. I 
hope it's obvious that this setup would be awful in practice: people 
would resent the overhead of generating these names and refuse to submit 
tests to the testsuite; any attempt to communicate using the ids would 
be a nightmare; and people who did bother to submit tests would likely 
copy ids between files rather than generating new ones each time. For 
files that generate multiple tests from data it would be even worse; 
it's not at all clear how to generate a unique, but stable, id for each 
test in that case.
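For illustration, a sketch of the uuid option using Python's standard 
uuid module. The ids never collide precisely because no human could ever 
remember, type, or compare them:

```python
import uuid

# Two authors generate ids independently, with no coordination at
# all; collisions are (for practical purposes) impossible, but the
# resulting names carry zero information about what is being tested.
a = uuid.uuid4()
b = uuid.uuid4()
assert a != b
print(a)  # something like aec2cf60-d17a-11e3-80c1-cbadd29e6cd4
```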

The other option is to allow the possibility of name clashes, but make 
them easy to resolve. One way to do this would be to require one test 
per file and use the path to the file as the unique test identifier. It 
is possible that two people could simultaneously submit different tests 
with the same name, but if it happened the conflict would be rather easy 
to resolve. However this system has one showstopper-level disadvantage: 
the lack of multiple tests per file makes test authors dramatically less 
productive, so we lose significant coverage. That's not an acceptable 
tradeoff.

So finally we come to the solution where we allow multiple tests per 
file, and give each test a human-readable name. This only requires local 
coordination (we need to ensure that each file name is unique and each 
test name is unique, but don't need to compare to any global state), 
doesn't require using human-unfriendly identifiers like uuids, and 
allows test authors to be productive by supporting many tests in a file. 
It also has further side benefits: by requiring that each test has a 
unique title, we get some metadata about what is being tested for "free". 
This dramatically reduces the need to have a separate description of the 
test intent. Clearly this solution is a tradeoff, but it's one that 
works well.
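As a rough sketch (with invented function and variable names, not the 
actual wptrunner code): the only check needed is local, per-file 
uniqueness of test names, which is what testharness already complains 
about, and the (path, name) tuple then serves as the global id.

```python
# Hypothetical illustration of the tuple-id scheme. Uniqueness only
# has to be enforced within a single file; no global state is consulted.
def check_local_uniqueness(tests_by_file):
    """tests_by_file maps a file path to the list of test names in it."""
    for path, names in tests_by_file.items():
        seen = set()
        for name in names:
            if name in seen:
                raise ValueError(f"duplicate test name {name!r} in {path}")
            seen.add(name)
    # The global id of each test is simply the (path, name) tuple.
    return {(path, name)
            for path, names in tests_by_file.items()
            for name in names}

ids = check_local_uniqueness({
    "/dom/historical.html": ["historical-1", "historical-2"],
    "/html/historical.html": ["historical-1"],  # same name, different file: fine
})
print(len(ids))  # -> 3 globally unique (path, name) tuples
```

Note that the same test name in two different files is not a conflict, 
because the file path is part of the tuple.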

> Would it be fair to say that supporting unique test IDs wasn't a design
> requirement when the harness/runner framework was put together and now
> we are where we are it's easier to use the suggested approach than to
> assign unique test IDs and have to retrofit them to thousands of test
> cases?

No, having unique test ids is absolutely a requirement. As I said 
before, we run all the tests and need to keep track of what results we 
got for each test on previous runs, such that we know if any changed 
unexpectedly or not. This depends on knowing which result corresponds to 
each test in a way that is stable across runs. It's just that there are 
other requirements that shape the form those ids can take.
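A hedged sketch of what that tracking looks like, with invented data and 
function names (the real expectation metadata is stored as plain text 
files in the source tree, as described above, not as Python dicts):

```python
# Illustrative only: compare two runs keyed on the (test_url, test_name)
# tuple, reporting any test whose status changed between runs.
def unexpected_changes(old_results, new_results):
    """Both arguments map (test_url, test_name) -> status string."""
    changes = {}
    for test_id, old_status in old_results.items():
        new_status = new_results.get(test_id)
        if new_status is not None and new_status != old_status:
            changes[test_id] = (old_status, new_status)
    return changes

old = {("/dom/a.html", "t1"): "PASS", ("/dom/a.html", "t2"): "FAIL"}
new = {("/dom/a.html", "t1"): "PASS", ("/dom/a.html", "t2"): "PASS"}
print(unexpected_changes(old, new))
# -> {('/dom/a.html', 't2'): ('FAIL', 'PASS')}
```

The stability of the tuple across runs is what makes this comparison 
possible at all.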

I have also previously worked on a system that could store these test 
results in a database, so I know that's possible too.

> One other thing: it wasn't clear to me how your proposal would work if
> a test name is changed?

A test name being changed is treated like deleting the old test and 
adding a new one. But in practice there just aren't many cases where 
people change a whole load of test names without also making substantive 
changes to the tests, so I don't think this is a big problem.
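In set terms (an illustrative sketch, not the real implementation), a 
rename simply shows up as one removed id and one added id when the two 
snapshots are diffed:

```python
# Hypothetical ids before and after a rename within the same file.
old_ids = {("/dom/a.html", "old name")}
new_ids = {("/dom/a.html", "new name")}

# Diffing the snapshots: the rename appears as a removal plus an addition.
removed = old_ids - new_ids
added = new_ids - old_ids
print(sorted(removed), sorted(added))
```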

Received on Thursday, 1 May 2014 22:21:02 UTC