Re: Review of tests upstreamed by implementors from James Graham on 2013-03-20 (public-test-infra@w3.org from January to March 2013)

From: James Graham <jgraham@opera.com>
Date: Wed, 20 Mar 2013 21:46:05 +0100 (CET)
To: Robin Berjon <robin@w3.org>
cc: Tobie Langel <tobie@w3.org>, public-test-infra <public-test-infra@w3.org>
Message-ID: <alpine.DEB.2.02.1303202116490.7756@sirius>

On Wed, 20 Mar 2013, Robin Berjon wrote:

>> I proposed that we drop this requirement and explicitly state so in
>> the new review process, i.e. something along the lines of:
>> 
>> "Contributions must be reviewed by a peer. The reviewer can be a
>> colleague of the contributor as long as the proceedings are public."
>
> I'm getting a sense of violent agreement on this.

Then I should probably challenge it :p

I think that if we believe organisation-internal review to be sufficient, 
we should be able to back this up, e.g. with evidence that existing 
submissions claimed to have undergone some internal review suffered 
defects at a lower rate than submissions where no such claim is made. My 
feeling (not data) is that this isn't always the case; at least, I 
recall significant submissions, which I presume underwent internal review, 
that nevertheless contained some major defects, often arising from 
a misunderstanding of the specification that went unchecked in a closed 
environment. I would particularly worry about cases where the tests were 
reviewed as part of a patch to implement the feature; in such a case it 
would be extremely easy for the reviewer to concentrate on the 
implementation and blindly approve the tests on the basis that they pass 
when run with the accompanying code.

Of course this isn't to say that releasing records of internal review 
along with the tests shouldn't be encouraged. However, such records should 
be an aid to other users of the testsuite looking to do review, not a 
replacement for it. Which brings me to my main point: the tests going 
unreviewed is a symptom of a much more serious underlying problem, namely 
that implementors are not actually running the tests from the W3C 
repository. If people were actually running the tests, they would be 
inclined to provide 
review when implementing a feature with submitted tests, both to get the 
submission accepted and to discover the limitations of the existing 
testsuite. They would also be highly likely to pick up errors in the 
testsuite, which often manifest themselves as differences between 
implementations (that is, a wrong test will correctly fail in a second 
implementation). This would solve all the problems with review slowness 
without damaging review quality, without eroding all the benefits of 
having multiple, diverse contributors understand the structure and 
coverage of a particular testsuite, and without leading to a scenario 
where a single organisation can unilaterally declare the testsuite for a 
feature to be "complete" (at least for Process purposes). I should note 
that, in webapps at least, this latter situation is more or less arising 
today, so it isn't theoretical.

It is my opinion that we should be putting considerably more effort into 
working with all implementors to ensure that they are actually running the 
tests that we are collecting. Once we do that, we won't have to put so 
much effort into fiddling with the details of procedures in order to get 
testing momentum going, because providing that momentum will naturally be 
in the best interests of all the parties involved.

Received on Wednesday, 20 March 2013 20:46:35 UTC