Re: Review of tests upstreamed by implementors from Tobie Langel on 2013-03-20 (public-test-infra@w3.org from January to March 2013)

From: Tobie Langel <tobie@w3.org>
Date: Wed, 20 Mar 2013 22:24:10 +0100
To: James Graham <jgraham@opera.com>
Cc: Robin Berjon <robin@w3.org>, public-test-infra <public-test-infra@w3.org>
Message-ID: <78385FCEAC2D448B99BC28CAE519B583@w3.org>
On Wednesday, March 20, 2013 at 9:46 PM, James Graham wrote:
> I think that if we believe organisation-internal review to be sufficient
> we should be able to back this up e.g. with evidence that existing 
> submissions where it is claimed that some internal review happened did not 
> suffer defects at the same rate as submissions where no such claim is 
> made. My feeling (not data) is that this isn't always the case; at least I 
> recall significant submissions, which I presume underwent internal review, 
> that nevertheless contained some major defects, often arising from 
> a misunderstanding of the specification that went unchecked in a closed 
> environment. I would particularly worry about cases where the tests were 
> reviewed as part of a patch to implement the feature; in such a case it 
> would be extremely easy for the reviewer to concentrate on the 
> implementation and blindly approve the tests on the basis that they pass 
> when run with the accompanying code.

These are valid concerns. But we have to balance them with the cost of having a massive backlog of unreviewed tests. Generally I feel we need strategies to mitigate these concerns, not heavy process. One way I'm thinking of addressing this would be to automatically run test submissions and report the results to the reviewer.
> Of course this isn't to say that releasing records of internal 
> review along with the test shouldn't be encouraged. However they should be 
> an aid to other users of the testsuite looking to do review, not a 
> replacement. Which brings me to my main point; the tests going unreviewed 
> is a symptom of a much more serious underlying problem, namely that 
> implementors are not actually running the tests from the W3C repository.

Agreed. I've started working on this, planning to spend more time on this effort once budgeting infrastructure is completed. Your help to make this happen in WebKit/Chromium would be much appreciated.
> If people were actually running the tests they would be inclined to provide
> review when implementing a feature with submitted tests, both to get the 
> submission accepted and to discover the limitations of the existing 
> testsuite.

Absolutely. 
> They would also be highly likely to pick up errors in the 
> testsuite, which often manifest themselves as differences between 
> implementations (that is, a wrong test will correctly fail in a second 
> implementation).

Yes, though running these tests as part of the submission should help catch those (if there is more than one implementation).
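
A minimal sketch of that kind of check, assuming results are available as a mapping from implementation name to per-test PASS/FAIL status. The data layout, the function name `find_disagreements` and the implementation and test names are hypothetical, not an existing tool:

```python
from collections import defaultdict


def find_disagreements(results):
    """Return the tests whose outcome differs between implementations.

    `results` maps implementation name -> {test name -> "PASS" or "FAIL"}.
    This layout is an assumption made for illustration only.
    """
    # Regroup the results by test so outcomes can be compared side by side.
    outcomes = defaultdict(dict)
    for impl, tests in results.items():
        for test, status in tests.items():
            outcomes[test][impl] = status

    # A test is only comparable if at least two implementations ran it.
    return {
        test: by_impl
        for test, by_impl in outcomes.items()
        if len(by_impl) > 1 and len(set(by_impl.values())) > 1
    }


# Hypothetical results from two implementations.
results = {
    "impl_a": {"test-001.html": "PASS", "test-002.html": "PASS"},
    "impl_b": {"test-001.html": "PASS", "test-002.html": "FAIL"},
}
flagged = find_disagreements(results)
for test, by_impl in flagged.items():
    print(test, by_impl)
```

A disagreement does not say which side is wrong (the test or one of the implementations); it only points the reviewer at the tests worth reading first.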
> This would solve all the problems with review slowness 
> without damaging review quality, without eroding all the benefits of 
> having multiple, diverse, contributors understand the structure and 
> coverage of a particular testsuite,

Agreed. 
> and without leading to a scenario 
> where a single organisation can unilaterally declare the testsuite for a 
> feature to be "complete" (at least for Process purposes).

I know there's a past history of sketchy PR games around testing. Nevertheless, I would like to start afresh and trust everyone to behave properly. If people don't behave, calling them out on it seems like a good first step, and if this isn't sufficient there are other places in which to introduce process which won't slow down test reviews.
> I should note 
> that in webapps at least this latter situation is more or less arising 
> today, so it isn't theoretical.

That sucks. We should try to solve this upfront.
> It is my opinion that we should be putting considerably more effort into 
> working with all implementors to ensure that they are actually running the 
> tests that we are collecting.

Amen. 
> Once we do that we won't have to put so much 
> effort into fiddling with the details of procedures in order to get 
> testing momentum going, because it will be naturally beneficial for all 
> the parties involved to provide the momentum out of their own best 
> interests.

Can't agree more. What were you challenging already? :P

--tobie
Received on Wednesday, 20 March 2013 21:24:19 UTC