RE: WebTV Help for Getting Engaged in W3C Test Effort

Hi Robin, Andy, Giuseppe,

Andy - that is a helpful overview of the issues; thanks for taking the time to document it. It would be great if we could find some way to discuss this together in a single event, either a conference call or face to face: there are a number of interested parties here, and we need to try to come up with a single, consistent approach for all TV-like platforms, if at all possible, as quickly as we can.

Best regards,
Ian Medland

Chair - HbbTV Testing Group

-----Original Message-----
From: Robin Berjon [mailto:robin@w3.org] 
Sent: 30 April 2014 14:24
To: Andy Hickman; Giuseppe Pascale
Cc: public-test-infra@w3.org; public-web-and-tv@w3.org
Subject: Re: WebTV Help for Getting Engaged in W3C Test Effort

Hi Andy,

On 30/04/2014 01:08 , Andy Hickman wrote:
> Such organisations are likely to have at least some of the following 
> test metadata requirements. I've tried to order these in (my view of) 
> decreasing priority. I expect that in many cases they can be achieved 
> today using what has already been done by the W3C test community. I'm 
> a newbie here so, (a) apologies in advance for any misinterpretations 
> on my part, (b) any suggestions for how to achieve these goals would 
> be most welcome.

No need to apologise. The Web platform is powerful but unwieldy and this is getting to be a big project, so we can't expect people to quickly know everything. Also, as the Web grows it is only to be expected that it will change and adapt to new places; so thank you very much for taking the time to document where you are coming from. Communication is particularly essential if the TV community is to succeed in producing anything other than WAP 3.0 — there are 25 years of mistakes to learn from :)

> 1) Ability to uniquely and persistently reference every test case, 
> regardless of files being renamed and moved within the Git repository.

Here there is an important distinction that needs to be made and that will likely condition all of the rest: what do you mean by "test case"?

I ask because the setup we have has both test files and test cases, where one of the former may contain anywhere between one and several thousand of the latter. We tend to use test files as more or less "human-oriented" containers that roughly map to one logical feature, but there is no clear delineation because there is no clear definition of what a feature is (e.g. is cloning a node one feature, or are cloning an element node, a text node, a comment node, etc. each a feature of their own?).

When we look at results, the unit of interest is typically the individual test case much more than the test file.

The distinction matters because it would be relatively easy (or, at least, feasible within possibly acceptable heuristics) to implement a unique ID system on top of files that reside in git (since git tracks renames and such). It would be more difficult to obtain unique IDs for test cases.

Assigning them by hand (and then maintaining them) would be a large undertaking that I doubt would be very popular. It would also be potentially problematic for all the automatically generated tests (be they things like generated tests for IDL which are used across the board, or much more specific things like the tests generated for Range which are ad hoc to that interface but still count in the thousands). 
I'm not saying that automatically generating IDs is in itself hard; ensuring that you get the same ones for the same test cases over time, however, is.

I *can*, however, think of ways in which the IDs could be maintained automatically in a third-party system. IIRC testharness expresses unhappiness when two test cases inside a given file have the same test name. This means that at a given commit, the {file name, test name} tuple is unique: an ID can be assigned to it. A database tracking the repository can then:

   1) Track git moves (as well as removals and additions) in order to maintain the identifier when the file part changes.
   2) Track addition and removal of test names per file with on-commit runs. (There is infrastructure that should make it possible to extract all the test names easily, including generated ones — we can look at the details if you decide to go down that path.)

This could be the foundation for the external metadata system which I've described.

When additions and removals are detected, ideally you'd have a human check them over to ensure that nothing is amiss.
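To make that concrete, here is a rough sketch of what the per-commit update step could look like. Nothing in it exists today: "db" stands for whatever external store ends up holding the mapping from {file name, test name} tuples to IDs, and the rename information would be parsed from something like git diff --find-renames between the two commits.

    // Sketch only: "db" is a hypothetical external store for the
    // {file name, test name} -> ID mapping; none of these APIs exist today.
    var crypto = require("crypto");

    function newId() {
        return crypto.randomBytes(8).toString("hex");
    }

    // renames: { oldPath: newPath }, e.g. parsed from `git diff --find-renames`
    // current: [{ file: "...", name: "..." }, ...] extracted for the new commit
    function updateIds(db, renames, current) {
        // 1) Follow git renames so IDs stick to moved files.
        Object.keys(renames).forEach(function (oldPath) {
            db.renameFile(oldPath, renames[oldPath]);
        });

        // 2) Assign fresh IDs to tuples we have never seen; note what vanished.
        var report = { added: [], removed: [] };
        var seen = {};
        current.forEach(function (t) {
            seen[t.file + "\u0000" + t.name] = true;
            if (!db.lookup(t.file, t.name)) {
                db.assign(t.file, t.name, newId());
                report.added.push(t);
            }
        });
        db.allEntries().forEach(function (e) {
            if (!seen[e.file + "\u0000" + e.name]) report.removed.push(e);
        });

        // 3) Hand additions and removals to a human instead of deleting anything.
        return report;
    }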

> 2) Ability to define a precise subset of W3C tests, covering areas of 
> particular interest to that organisation and that can be reasonably 
> expected to be passed 100% on all compliant devices. In practice this 
> probably involves selecting only tests that pass on a majority of 
> desktop browsers. See [1] and [2] for more background on why this is 
> needed. One obvious way to define a subset is for the organisation to 
> maintain their own list/manifest of test IDs; another is to allow the 
> organisation to redistribute a subset of W3C tests (I'm not 
> sufficiently familiar with the W3C test license terms to know whether this is possible).

We generate a manifest of all test files; it should not be hard to subset it. In fact, our test runner already uses it to support crude (but useful) subsetting of the test suite so that we can run just some parts.

Note that again this is based on test files and not test cases. It is however entirely possible to filter out results based on test names. (I have been considering supporting that in my report-generation tool so that we can have test reports for subset snapshot specifications without having to produce a subset branch of the test suite.)
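For instance (a sketch only; it guesses at the manifest's file name and structure, so treat both as assumptions), subsetting by directory could be as little as:

    // Sketch: pretends the manifest is a JSON array of entries with a "path"
    // field; the real file name and structure may well differ.
    var fs = require("fs");

    var manifest = JSON.parse(fs.readFileSync("MANIFEST.json", "utf8"));
    var wanted = ["dom/nodes/", "html/semantics/"]; // areas an organisation cares about

    var subset = manifest.filter(function (entry) {
        return wanted.some(function (prefix) {
            return entry.path.indexOf(prefix) === 0;
        });
    });

    fs.writeFileSync("subset-manifest.json", JSON.stringify(subset, null, 2));

The same kind of filter could equally be applied to result data keyed on test names rather than paths, which is what the report-generation idea above would amount to.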

Concerning licensing of W3C tests, my understanding is this: if you want to claim conformance to a *W3C* specification, then you have to use the unadulterated test suite. If it's for whatever else, then you're under regular 3-clause BSD.

(I think that's still too restrictive, but that's a problem for another
discussion.)

> 3) Reference a particular version of each test so that all device 
> manufacturers are running exactly the same tests. Again, you could 
> imagine this being achieved in a number of different ways: the 
> organisation re-distributing W3C's tests; the organisation includes 
> test ID + version information in a test suite manifest and relies on 
> manufacturers to source the correct test material from Git; the 
> organisation includes test IDs + a Git tag to ensure the manufacturer 
> obtains the correct test material; etc.

That's easy enough to implement; as you say, there are many options. The simplest is a git tag (we're happy to tag the repo; you can even issue a pull request for it).

> 4) A description of the basic test purpose; i.e. an assertion on what 
> correct user agent behaviour shall be under a set of conditions. E.g.
> "when A, B, C .... happen the device shall do X". Often this test 
> assertion text reads somewhat similarly to the specification itself, 
> but as any test analyst will tell you there are usually subtle and 
> important differences. My understanding is that today the test 
> assertion is effectively encapsulated in the source code of the test - 
> presumably in the test name string parameter that is passed to the 
> test() function. Is that a reasonable assumption or does it really 
> depend upon the individual style of the test author?

You can expect a description of what a test case does in its test string; whether that is sufficient for your purposes or not I do not know.
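For concreteness, a test file tends to look roughly like the following (an invented example, not a file from the repository). The string passed as the second argument to test() is the test name that ends up in the results, and it is usually the closest thing we have to an assertion description:

    <!-- hypothetical example, not an actual test in the suite -->
    <!DOCTYPE html>
    <title>Node.cloneNode(false) basics</title>
    <script src="/resources/testharness.js"></script>
    <script src="/resources/testharnessreport.js"></script>
    <script>
    test(function () {
        var div = document.createElement("div");
        var clone = div.cloneNode(false);
        assert_equals(clone.localName, "div", "shallow clone keeps the local name");
        assert_equals(clone.childNodes.length, 0, "shallow clone has no children");
    }, "Node.cloneNode(false) on an element copies the local name but not children");
    </script>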

> 5) Metadata about which specification clause is being tested without 
> having to inspect source code. James and Robin have commented 
> extensively on the practical and operational overhead of doing this 
> and the fact that it brings marginal benefit to many in the community. 
> I accept nearly all those arguments but still feel there is a 
> legitimate and important use case here. The situation today is that in 
> the TV embedded domain it's almost inevitable that a subset of tests 
> will need to be defined by 3rd party organisations. It simply won't be 
> viable or useful to require all the W3C tests to be run so it's an 
> interesting mental exercise to consider how a third party organisation 
> could go about defining such a subset of test cases. The cost of 
> manually reviewing source code of every test (and distinguishing 
> between the test fixture HTML/JavaScript and the main test purpose) in 
> order to identify which parts of the spec are being tested for tens of 
> thousands of tests is absolutely huge. If this is only a problem to 
> these third party organisations then I guess they must bear the cost 
> and, as others have suggested, maybe the solution is to provide a 
> common method to capture this metadata so that at least organisations 
> can benefit from each other's work rather than independently repeating the same exercise.

As I indicated previously, this could indeed be managed as an independent database. That said, a lot of the question boils down to how precise you need the reference to be.

If the reference is meant to be to an individual testable assertion (or a subset thereof), then you're out of luck. However, if it's to a subsection in a specification, you're in better shape. The hierarchy of directories normally reflects that of the specification, with directory names reusing section IDs. That provides a pointer back to the source.

(In truth that usage is not consistent across the board at this point, but with some help it could be.)
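As an illustration of what a record in such an external database might hold, tying together the ID idea from earlier in this mail (every field name and the URL below are invented for the sake of the example):

    // Purely illustrative; none of these fields, values or URLs are real.
    var entry = {
        id: "4f2a9c1be7d03a58",                // stable ID from the tracking sketch above
        file: "dom/nodes/Node-cloneNode.html", // current location, updated when git reports a move
        name: "Node.cloneNode(false) on an element copies the local name but not children",
        spec: "https://dom.spec.whatwg.org/#dom-node-clonenode", // subsection-level pointer
        profiles: ["example-tv-profile-2014"]  // which third-party subsets include this test
    };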

--
Robin Berjon - http://berjon.com/ - @robinberjon



Ian Medland | Head of Technical Development, DTG Testing Ltd | DTG | www.dtg.org.uk 
5th Floor, 89 Albert Embankment | Vauxhall | London | SE1 7TP
Tel: +44 (0)20 7840 6580 | Fax: +44 (0)20 7840 6599


Please consider the environment before printing this email

Received on Thursday, 1 May 2014 14:37:53 UTC