Re: WebTV Help for Getting Engaged in W3C Test Effort from Andy Hickman on 2014-04-29 (public-test-infra@w3.org from April to June 2014)

From: Andy Hickman <andy.hickman@digitaltv-labs.com>
Date: Wed, 30 Apr 2014 00:08:02 +0100
To: Robin Berjon <robin@w3.org>, Giuseppe Pascale <giuseppep@opera.com>
CC: "public-test-infra@w3.org" <public-test-infra@w3.org>, "public-web-and-tv@w3.org" <public-web-and-tv@w3.org>
Message-ID: <536030D2.5070107@digitaltv-labs.com>
Hi Robin, Giuseppe, all,

I'd like to add some thoughts on the use and role of metadata in the 
test cases. I work for a testing company focussing on the TV domain and 
I spoke recently at the Web & TV Convergence workshop [1].

Other standards organisations, pay-TV platform operators or regulators - 
referred to as "organisations" below - wish to build 
test/certification/logo regimes that reference the W3C tests, 
particularly for HTML5. My company is heavily involved in HbbTV, and we 
saw at the Web & TV Convergence event that there are several other 
bodies with a similar interest (DLNA, Smart TV Alliance, IPTV Forum 
Japan, ...).

Such organisations are likely to have at least some of the following test 
metadata requirements. I've tried to order these in (my view of) 
decreasing priority. I expect that in many cases they can be achieved 
today using what has already been done by the W3C test community. I'm a 
newbie here so (a) apologies in advance for any misinterpretations on 
my part, (b) any suggestions for how to achieve these goals would be 
most welcome.

1) Ability to uniquely and persistently reference every test case, 
regardless of files being renamed and moved within the Git repository. 
Typically this is done in test suites by defining unique test IDs. This 
is critical to enabling practical use of the test cases in an 
operational test regime; e.g. allowing test cases to be robustly 
challenged and waived as needed. The xUnit type approach of identifying 
tests via meaningful filenames / class names / method names - perfectly 
sensible in many contexts - simply doesn't cut it for formal test 
regimes where several versions of the test suite are shared between many 
different companies. We don't expect that W3C will define these 
operational practices but they would ideally put in place the bare 
minimum (e.g. robust test IDs) that allow other organisations to define 
processes that don't fall down over confusion about precisely which 
tests are being referred to as the test suite changes over time.

2) Ability to define a precise subset of W3C tests, covering areas of 
particular interest to that organisation and that can be reasonably 
expected to be passed 100% on all compliant devices. In practice this 
probably involves selecting only tests that pass on a majority of 
desktop browsers. See [1] and [2] for more background on why this is 
needed. One obvious way to define a subset is for the organisation to 
maintain their own list/manifest of test IDs; another is to allow the 
organisation to redistribute a subset of W3C tests (I'm not sufficiently 
familiar with the W3C test license terms to know whether this is possible).

3) Reference a particular version of each test so that all device 
manufacturers are running exactly the same tests. Again, you could 
imagine this being achieved in a number of different ways: the 
organisation redistributes W3C's tests; the organisation includes test 
ID + version information in a test suite manifest and relies on 
manufacturers to source the correct test material from Git; the 
organisation includes test IDs + a Git tag to ensure the manufacturer 
obtains the correct test material; etc.

4) A description of the basic test purpose; i.e. an assertion on what 
correct user agent behaviour shall be under a set of conditions. E.g. 
"when A, B, C .... happen the device shall do X". Often this test 
assertion text reads somewhat similarly to the specification itself, but 
as any test analyst will tell you there are usually subtle and important 
differences. My understanding is that today the test assertion is 
effectively encapsulated in the source code of the test - presumably in 
the test name string parameter that is passed to the test() function. Is 
that a reasonable assumption or does it really depend upon the 
individual style of the test author?

5) Metadata about which specification clause is being tested without 
having to inspect source code. James and Robin have commented 
extensively on the practical and operational overhead of doing this and 
the fact that it brings marginal benefit to many in the community. I 
accept nearly all those arguments but still feel there is a legitimate 
and important use case here. The situation today is that in the TV 
embedded domain it's almost inevitable that a subset of tests will need 
to be defined by 3rd party organisations. It simply won't be viable or 
useful to require all the W3C tests to be run so it's an interesting 
mental exercise to consider how a third party organisation could go 
about defining such a subset of test cases. The cost of manually 
reviewing source code of every test (and distinguishing between the test 
fixture HTML/JavaScript and the main test purpose) in order to identify 
which parts of the spec are being tested for tens of thousands of tests 
is absolutely huge. If this is only a problem to these third party 
organisations then I guess they must bear the cost and, as others have 
suggested, maybe the solution is to provide a common method to capture 
this metadata so that at least organisations can benefit from each 
other's work rather than independently repeating the same exercise.
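
To make points 1, 2, 3 and 5 a little more concrete, here is a rough 
Python sketch of the kind of manifest an organisation might maintain 
outside the test sources. Everything in it is an assumption for 
illustration only: the ID format, the Git tag name, the file paths, the 
spec anchors and the browser names are invented, and none of it is taken 
from the W3C repository or its tooling.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Entry:
    test_id: str           # hypothetical stable ID; never changes once issued
    path: str              # current file location; allowed to change
    spec_sections: tuple   # hypothetical spec anchors the test exercises


@dataclass
class Manifest:
    git_tag: str                                 # hypothetical tag pinning the test material
    entries: dict = field(default_factory=dict)  # test_id -> Entry

    def add(self, entry):
        # Refuse to reuse an ID so that references stay unambiguous.
        if entry.test_id in self.entries:
            raise ValueError(f"duplicate test ID: {entry.test_id}")
        self.entries[entry.test_id] = entry

    def rename(self, test_id, new_path):
        # A file move updates the path only; the ID is untouched.
        self.entries[test_id] = replace(self.entries[test_id], path=new_path)

    def covering(self, section):
        # IDs of all tests recorded as exercising the given spec section.
        return sorted(tid for tid, e in self.entries.items()
                      if section in e.spec_sections)


def majority_pass(results):
    """results maps test_id -> {browser: passed}; keep IDs that pass in
    strictly more than half of the browsers reported for that test."""
    return sorted(tid for tid, by_browser in results.items()
                  if sum(by_browser.values()) * 2 > len(by_browser))


# Usage with invented data.
manifest = Manifest(git_tag="example-suite-2014-04")
manifest.add(Entry("EX-0001", "html/a.html", ("#sec-a",)))
manifest.add(Entry("EX-0002", "html/b.html", ("#sec-a", "#sec-b")))
manifest.rename("EX-0001", "html/moved/a.html")

subset = majority_pass({
    "EX-0001": {"browser-x": True, "browser-y": True, "browser-z": False},
    "EX-0002": {"browser-x": True, "browser-y": False},
})
```

With this data, "EX-0001" still resolves after the move, and it is the 
only ID in the subset because "EX-0002" passes in exactly half of the 
browsers rather than a majority. Keeping the mapping in a separate file 
means the metadata can be shared between organisations without adding 
work for test authors or reviewers.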

[1] 
http://www.w3.org/2013/10/tv-workshop/papers/webtv4_submission_10.pdf, 
"HbbTV Testing – an approach to testing TV receiver middleware
based on web standards"
[2] 
http://www.w3.org/2013/10/tv-workshop/papers/webtv4_submission_1.pdf, 
"The HbbTV Certification Process"


Thanks,
Andy


CTO, Digital TV Labs.

On 28/04/2014 16:42, Robin Berjon wrote:
> On 28/04/2014 15:49 , Giuseppe Pascale wrote:
>> Note that these questions are not intended to request to add work or
>> process to your group (which I doubt I could do anyhow) but to clarify
>> some of the questions which have been asked in the last workshop and to
>> set the right expectations of what people may and may not find in a W3C
>> test suite.
>
> Sure thing.
>
>>     It's a very simple process. When you first create a test, you
>>     *might* get the metadata right. (Even then it's a big, big "might"
>>     because most people will copy from an existing file, and through
>>     that get wrong metadata.)
>>
>> I agree that an author may get things wrong, but the reviewer should be
>> responsible for checking the spec reference.
>
> The problem there is that this adds more work for the reviewer, and we 
> already have a problem with insufficient reviewer bandwidth. Given our 
> constraints it is unlikely that any change involving more work for the 
> reviewers would be popular.
>
>> Otherwise I'm not clear
>> what a reviewed test actually means. Isn't the reviewer supposed to
>> check if the test matches some spec text? If so, and if the author
>> doesn't write which spec version he is testing, can the reviewer really
>> know what he is supposed to check?
>
> I am not aware of a review in which the specification version has been 
> taken into account. To date I only know of reviews done against the 
> latest and greatest specification. The idea is that master should 
> always be testing the latest, that's where the value is. If someone 
> needs the test suite to match a specific version we have branches for 
> that purpose (but it is up to whoever has that need to produce the 
> specific subset).
>
> Also, in reality only a relatively limited subset of tests cleanly map 
> to a single section. It is usually the case that tests need to 
> exercise more than one part of a specification at once. That would 
> mean more links, etc.
>
> In general if the reviewer can't figure out which part(s) of the spec 
> you are testing simply by reading the code and having the spec open, 
> there's a problem either with the test or with the spec. (Or it's a 
> really obscure feature, but we shouldn't optimise for those.)
>
>> Maybe the response is: always check the latest editor draft at hand. If
>> so, as discussed before maybe the spec version can be auto-inferred by
>> the commit date.
>
> But for many specs at best that will give you a commit for the 
> editor's draft, not an addressable snapshot. You can't use such 
> heuristics to address specific snapshot specifications.
>
> Where specific snapshots need to be tested, presumably whoever is in 
> charge of the snapshot knows what they are subsetting from the ED and 
> can produce a subset snapshot test suite with a relatively quick 
> application of grep and git rm in the appropriate branch. Or by 
> listing the results to ignore in the implementation report.
>
> (FWIW we do have that use case for HTML and to a lesser degree DOM, 
> and so far it is working fine for us.)
>
>>     But when it's updated what's your incentive to update the metadata?
>>     What points you to remember to update it? Pretty much nothing. If
>>     it's wrong, what will cause you to notice? Absolutely nothing since
>>     it has no effect on the test.
>>
>> Once again I would expect a "reviewer" to be in charge of it in a
>> structured review process (and I would expect an updated test to be
>> subject to review). And I assume the reviewer to actually check a spec
>> to see if a test is valid (or otherwise how does it check its validity)?
>
> Again, that seems to place a lot of extra work on the shoulders of 
> reviewers when we already don't have enough of those. And it doesn't 
> tell us what incentive the reviewers have to check the metadata when 
> those we currently have participating aren't interested in having any.
>
> We could have metadata-specific reviewers, but that is more overhead 
> that would hurt everyone. That's why I suggest doing it orthogonally. 
> It would be simpler for everyone.
>
>> Maybe, also here, the answer is implicit (check the latest ED) and can
>> then be autogenerated knowing the commit date
>
> But I don't see the use case. I know that there is a genuine demand in 
> some communities for having proper snapshot specifications, and it 
> logically comes out that they might need snapshot test suites as well. 
> But I've never heard any of those ask for pointers to specific ED 
> commits, which is all you'd get with the above. I don't think it helps 
> that use case.
>
>>     So far, in the pool of existing contributors and reviewers, we have
>>     people who benefit greatly from a working test suite, but to my
>>     knowledge no one who would benefit from up to date metadata. Without
>>     that, I see no reason that it would happen.
>>
>> The reason for raising this issue is because during the workshop we had
>> some people ask about this, i.e. how can they know which tests to use
>> given a set of specs they reference. E.g. how can I know which tests are
>> up to date and which ones have been written against an old spec (and
>> maybe not valid anymore?)
>
> The platform has a pretty strong commitment to backwards 
> compatibility. The issue of tests only applying to an old 
> specification is therefore less likely to start with.
>
> That said, the process is simple: continuous maintenance. Tests are 
> run continuously and are investigated when they fail. In fact, that's 
> the only way I can think of to make this work. The fact that a test 
> was written against an older specification tells you absolutely 
> nothing about its validity — the odds are it is still valid (in fact, 
> the older the spec, the more likely).
>
>>     I believe everything is in place for the system described above to
>>     be implemented relatively easily. I am fully confident that if there
>>     is a community that genuinely requires testing metadata they could
>>     bash together such a tool in under a month. And we're happy to help
>>     answer questions and provide hooks (e.g. GitHub update hooks) where
>>     needed.
>>
>> sounds like a sensible approach. Maybe that will also help inform this
>> discussion, i.e. to identify if there are some basic metadata which are
>> needed, missing and that an external group cannot generate.
>
> Sure, we're happy to work with anyone to enable third-parties to reuse 
> the WPT as much as possible.
>
>>     This is a volunteer and so far largely unfunded project. It is also
>>     by a wide margin the best thing available for Web testing today. Its
>>     shape and functionality matches what current contributors are
>>     interested in; if there are new interests not so far catered to, the
>>     solution is simple: just bring in new contributors interested in
>>     this!
>>
>> The goal of this conversation is to bring new contributors, as
>> (understandably) some people didn't want to commit to something which
>> looked like a black box (to them).
>
> It's not at all a black box, everything is done in the open!
>
Received on Wednesday, 30 April 2014 09:38:22 UTC