Re: WebTV Help for Getting Engaged in W3C Test Effort

Hi Andy,

On 30/04/2014 01:08, Andy Hickman wrote:
> Such organisations are likely to have at least some of the following
> test metadata requirements. I've tried to order these in (my view of)
> decreasing priority. I expect that in many cases they can be achieved
> today using what has already been done by the W3C test community. I'm a
> newbie here so, (a) apologies in advance for any misinterpretations on
> my part, (b) any suggestions for how to achieve these goals would be
> most welcome.

No need to apologise. The Web platform is powerful but unwieldy and this 
is getting to be a big project, so we can't expect people to quickly 
know everything. Also, as the Web grows it is only to be expected that 
it will change and adapt to new places; so thank you very much for 
taking the time to document where you are coming from. Communication is 
particularly essential if the TV community is to succeed in producing 
anything other than WAP 3.0 — there are 25 years of mistakes to learn 
from :)

> 1) Ability to uniquely and persistently reference every test case,
> regardless of files being renamed and moved within the Git repository.

Here there is an important distinction that needs to be made and that 
will likely condition all of the rest: what do you mean by "test case"?

I ask because the setup we have has both test files and test cases, 
where one of the former may contain anywhere from one to several 
thousand of the latter. We tend to use test files as more or less 
"human-oriented" containers that roughly map to one logical feature, 
but there is no clear delineation because there is no clear delineation 
of what a feature is (e.g. is cloning a node one feature, or is cloning 
each of an element node, a text node, a comment node, etc. a separate 
feature?).

When we look at results, the unit of interest is typically the 
individual test case much more than the test file.

The distinction matters because it would be relatively easy (or, at 
least, feasible within possibly acceptable heuristics) to implement a 
unique ID system on top of files that reside in git (since git tracks 
renames and such). It would be more difficult to obtain unique IDs for 
test cases.

Assigning them by hand (and then maintaining them) would be a large 
undertaking that I doubt would be very popular. It would also be 
potentially problematic for all the automatically generated tests (be 
they things like the generated IDL tests, which are used across the 
board, or much more specific things like the tests generated for Range, 
which are ad hoc to that interface but still count in the thousands). 
I'm not saying that automatically generating IDs is in itself hard; 
ensuring that you get the same ones for the same test cases over time, 
however, is.

I *can* however think of ways in which the IDs could be maintained 
automatically in a third-party system. IIRC testharness expresses 
unhappiness when two test cases inside a given file share the same test 
name. This means that at a given commit, the {file name, test name} 
tuple is unique: an ID can be assigned to it. A database tracking the 
repository can then:

   1) Track git moves (as well as removals and additions) in order to 
maintain the identifier when the file part changes.
   2) Track addition and removal of test names per file with on-commit 
runs. (There is infrastructure that should make it possible to extract 
all the test names easily, including generated ones — we can look at the 
details if you decide to go down that path.)

This could be the foundation for the external metadata system which I've 
described.

When additions and removals are detected, ideally you'd have a human 
check them over to ensure that nothing is amiss.
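
To make that concrete, here is a rough Python sketch of how such a 
third-party database might carry IDs across renames (step 1 above). 
All the names are hypothetical; a real system would persist the mapping 
and feed in the per-file test names extracted by the on-commit runs of 
step 2:

    # Hypothetical sketch: stable IDs for {file name, test name}
    # tuples, kept outside the repository. Not a real tool, just the
    # shape of one.
    import subprocess
    import uuid

    class TestIndex:
        def __init__(self):
            self.ids = {}  # (file path, test name) -> opaque stable ID

        def assign(self, path, test_name):
            # New tuples get a fresh ID; known ones keep theirs.
            key = (path, test_name)
            if key not in self.ids:
                self.ids[key] = uuid.uuid4().hex
            return self.ids[key]

        def apply_renames(self, old_commit, new_commit, repo="."):
            # Step 1: follow git renames so the file part of the tuple
            # can change without the ID changing.
            out = subprocess.check_output(
                ["git", "-C", repo, "diff", "--name-status", "-M",
                 old_commit, new_commit], text=True)
            for line in out.splitlines():
                fields = line.split("\t")
                if fields[0].startswith("R"):  # e.g. "R100\told\tnew"
                    _, old_path, new_path = fields
                    for (path, name), tid in list(self.ids.items()):
                        if path == old_path:
                            del self.ids[(path, name)]
                            self.ids[(new_path, name)] = tid

Additions and removals found in step 2 would then be queued up for the 
human check mentioned above.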

> 2) Ability to define a precise subset of W3C tests, covering areas of
> particular interest to that organisation and that can be reasonably
> expected to be passed 100% on all compliant devices. In practice this
> probably involves selecting only tests that pass on a majority of
> desktop browsers. See [1] and [2] for more background on why this is
> needed. One obvious way to define a subset is for the organisation to
> maintain their own list/manifest of test IDs; another is to allow the
> organisation to redistribute a subset of W3C tests (I'm not sufficiently
> familiar with the W3C test license terms to know whether this is possible).

We generate a manifest of all test files; it should not be hard to 
subset it. In fact, our test runner already uses it to support crude 
(but useful) subsetting of the test suite, so that we can run just 
parts of it.

Note that, again, this is based on test files and not test cases. It 
is, however, entirely possible to filter results based on test names. (I 
have been considering supporting that in my report-generation tool so 
that we can have test reports for subset snapshot specifications without 
having to produce a subset branch of the test suite.)
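
Just to illustrate the idea (and not the actual manifest format, which 
we can get into separately), both subsetting and result filtering 
reduce to something like the following, assuming the manifest can be 
boiled down to a list of file paths and results are keyed by {file, 
test name}:

    # Assumed shapes, for illustration only: a manifest reduced to a
    # list of test file paths, and results keyed by
    # (file path, test name).
    def subset_manifest(paths, prefixes):
        # Keep only test files under the directories of interest.
        return [p for p in paths
                if any(p.startswith(pre) for pre in prefixes)]

    def filter_results(results, wanted):
        # Keep only the test cases an organisation cares about.
        return {key: status for key, status in results.items()
                if key in wanted}

    tv_files = subset_manifest(
        ["dom/nodes/Node-cloneNode.html", "webaudio/basic.html"],
        prefixes=["dom/"])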

Concerning licensing of W3C tests, my understanding is this: if you want 
to claim conformance to a *W3C* specification, then you have to use the 
unadulterated test suite. If it's for anything else, then you're under 
the regular 3-clause BSD licence.

(I think that's still too restrictive, but that's a problem for another 
discussion.)

> 3) Reference a particular version of each test so that all device
> manufacturers are running exactly the same tests. Again, you could
> imagine this being achieved in a number of different ways: the
> organisation re-distributing W3C's tests; the organisation including
> test ID + version information in a test suite manifest and relying on
> manufacturers to source the correct test material from Git; the
> organisation including test IDs + a Git tag to ensure the manufacturer
> obtains the correct test material; etc.

That's easy enough to implement; as you say, there are many options. The 
simplest is a git tag (we're happy to tag the repo; you can even issue a 
pull request for it).
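
For instance, if the organisation publishes a tag name alongside its 
test list, a manufacturer-side sanity check is a couple of lines of 
Python (the tag name below is made up):

    import subprocess

    def checkout_matches_tag(tag, repo="."):
        # True if the current checkout is exactly the tagged snapshot.
        rev = lambda ref: subprocess.check_output(
            ["git", "-C", repo, "rev-parse", ref], text=True).strip()
        return rev("HEAD") == rev(tag + "^{commit}")

    assert checkout_matches_tag("tv-profile-2014-04")  # hypothetical tag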

> 4) A description of the basic test purpose; i.e. an assertion on what
> correct user agent behaviour shall be under a set of conditions. E.g.
> "when A, B, C .... happen the device shall do X". Often this test
> assertion text reads somewhat similarly to the specification itself, but
> as any test analyst will tell you there are usually subtle and important
> differences. My understanding is that today the test assertion is
> effectively encapsulated in the source code of the test - presumably in
> the test name string parameter that is passed to the test() function. Is
> that a reasonable assumption or does it really depend upon the
> individual style of the test author?

You can expect a description of what a test case does in its test 
string; whether that is sufficient for your purposes or not, I do not know.
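
If you wanted to harvest those strings without running the tests, a 
crude scrape of the static calls is possible, though it will miss 
generated tests (whose names are built at runtime), so an actual 
testharness run remains the reliable source. A rough Python heuristic:

    import re

    # Grabs the quoted name that ends a test(...) or async_test(...)
    # call. Deliberately crude: it can be fooled by string literals
    # inside the test body and sees nothing of runtime-generated names.
    TEST_NAME = re.compile(
        r"""(?:async_)?test\s*\(.*?["']([^"'\\]+)["']\s*\)""",
        re.DOTALL)

    def test_names(source):
        return TEST_NAME.findall(source)

    test_names('test(function() {}, "Node.cloneNode copies attributes");')
    # -> ['Node.cloneNode copies attributes']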

> 5) Metadata about which specification clause is being tested without
> having to inspect source code. James and Robin have commented
> extensively on the practical and operational overhead of doing this and
> the fact that it brings marginal benefit to many in the community. I
> accept nearly all those arguments but still feel there is a legitimate
> and important use case here. The situation today is that in the TV
> embedded domain it's almost inevitable that a subset of tests will need
> to be defined by 3rd party organisations. It simply won't be viable or
> useful to require all the W3C tests to be run so it's an interesting
> mental exercise to consider how a third party organisation could go
> about defining such a subset of test cases. The cost of manually
> reviewing the source code of every test (and distinguishing between
> the test fixture HTML/JavaScript and the main test purpose) in order
> to identify which parts of the spec are being tested, for tens of
> thousands of tests, is absolutely huge. If this is only a problem for
> these third party organisations then I guess they must bear the cost
> and, as others have suggested, maybe the solution is to provide a
> common method to capture this metadata so that at least organisations
> can benefit from each other's work rather than independently repeating
> the same exercise.

As I indicated previously, this could indeed be managed as an 
independent database. That said, a lot of the question boils down to how 
precise you need the reference to be.

If it's meant to point to an individual testable assertion (or a subset 
thereof), then you're out of luck. However, if it's to a subsection in a 
specification, you're in better shape. The hierarchy of directories 
normally reflects that of the specification, with directory names 
reusing section IDs. That provides a pointer back to the source.

(In truth that usage is not consistent across the board at this point, 
but with some help it could be.)
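
Given that convention, a file path can be turned into a best-guess spec 
pointer mechanically; a toy Python version (where the spec-root mapping 
is illustrative, not a registry we maintain):

    # Assumes the directory convention above holds for the path in
    # question; the URL mapping is made up for illustration.
    SPEC_ROOTS = {"dom": "http://www.w3.org/TR/dom/"}

    def spec_pointer(test_path):
        parts = test_path.split("/")
        root, section = parts[0], parts[-2]
        return SPEC_ROOTS[root] + "#" + section

    spec_pointer("dom/ranges/Range-cloneContents.html")
    # -> "http://www.w3.org/TR/dom/#ranges"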

-- 
Robin Berjon - http://berjon.com/ - @robinberjon
