Re: Code review improvements

On 11/13/13 11:17 PM, James Graham wrote:
> One of the biggest problems that we currently face with the test 
> repository is the large number of tests that are stuck in code 
> review. This causes a number of significant problems; most 
> obviously, such tests aren't being run and so aren't helping to 
> improve the web. A long review queue is also a significant turn-off 
> to potential or existing contributors, who may feel there is no 
> point in contributing if their tests are going to sit in review for 
> months at a time. By the time tests are reviewed, such contributors 
> have often moved on to other things, so they have limited or no 
> time to deal with follow-up issues.
>
> The fundamental reasons that code review seems to be slow are:
>
> * It is hard work
>
> * It is not super-rewarding
>
> * No one is being paid to do it
>
> It's not clear that any of these are solvable problems, so to fix 
> the situation we are going to have to change something about the 
> way we do reviews rather than merely hoping that they will happen 
> faster in the future. In looking for a solution, we should keep in 
> mind several goals of review:
>
> * Check that tests are correct per spec
>
> * Check that tests are written in a robust way
>
> * Check that tests are using best practices (e.g. using testharness.js 
> in the right way, conforming to style conventions, etc.)
>
> * Ensure that tests are not using server-side script in a malicious way
>
> The problem, then, is how to avoid front-loading the review burden 
> whilst sacrificing as few of the above goals as possible. One 
> important observation is that once people are actually running 
> tests, they have motivation to investigate tests that fail in their 
> implementation and fix them if they are incorrect. So over long 
> enough time periods, incorrect failures should sort themselves out 
> (although incorrect passes might still be missed, and so on).
>
> One approach that was identified at TPAC was to allow test review to 
> be forwarded from other public review systems. So, for example, if 
> someone at Mozilla got r+ on a patch containing web-platform-tests 
> changes, this would be allowed to automatically carry over to the 
> web-platform-tests review by merging the PR and adding a comment to 
> the original review, so such patches would land immediately. This, 
> then, would effectively split the review burden into two parts; 
> patches coming from implementers which would typically need no further 
> review, and patches coming from individual contributors which would 
> typically still require upfront review. Compared to the above list 
> of goals for review, allowing implementer review to carry forward 
> directly would slightly reduce our effectiveness at ensuring that 
> tests are correct per spec; such tests would presumably be more 
> likely to match the specific behaviour of one browser even if that 
> behaviour is based on a misunderstanding of the spec requirements. 
> Assuming the implementer does a competent review, it should not 
> greatly affect robustness, since implementers running the tests 
> have a strong motivation to make them robust. It would, however, 
> significantly reduce the chance that the tests had been reviewed 
> correctly for best practices, since the original reviewer might 
> have expertise in the browser code rather than in the 
> web-platform-tests conventions. Again, assuming that the 
> implementer is running the tests, it should be possible to avoid 
> malicious code entering the system this way, since the vendor would 
> be exposing themselves to any security problems.
>
> Given that there is a history of vendors submitting tests that 
> match their implementations but not the spec, and tests that have 
> significant style issues, it does seem that some mechanism is 
> needed to allow external review of these tests. Two possibilities 
> have been suggested so far: that tests only get approved after some 
> timeout (i.e. review can be carried forward, but we wait N days 
> before actually merging, to give people a chance to review), or 
> that we keep an out-of-band record of which PRs were only reviewed 
> within an organization and allow people to do explicit post-hoc 
> review. I don't know which of these options is better, or if there 
> are further options. I think the second has some advantages for 
> integrating with UA import systems, since it keeps the code more in 
> sync between the UA repo and the external repo. Both also require 
> some additional tooling.
>
> Comments?

Thanks very much for writing this, James!

Do you see these two options as mutually exclusive? I ask because it 
seems like both could be "acceptable", although neither is really ideal.

I agree with you and Dom that the second option would be preferable, 
but in the event we can't get that level of review, perhaps the 
timeout option could be used as a last resort, giving a review 
preference hierarchy, from highest to lowest, of:

* External Review - review is done by people external to the 
submitting organization

* Internal Review - review is done within the submitting organization 
but not by the test creator

* Review by Timeout

If Review by Timeout is considered acceptable, I suppose there could 
be an inclination for groups to do no review at all but OTOH, 
hopefully folks consider test review sufficiently useful that the 
Timeout option would rarely be used.
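
For concreteness, the tooling for the Timeout option could be quite 
small. Here's a rough sketch (in Python) of a job that lists PRs 
whose review window has elapsed; the repository name, the 
"vendor-reviewed" label, and the 7-day window are placeholders I've 
made up for illustration, not proposals:

    # Sketch: flag PRs labelled as internally reviewed whose review
    # window has elapsed. Label name, repo, and window are assumptions.
    from datetime import datetime, timedelta, timezone

    import requests

    REPO = "w3c/web-platform-tests"  # assumed repository
    LABEL = "vendor-reviewed"        # hypothetical label for carried-forward review
    WINDOW = timedelta(days=7)       # hypothetical review window

    def prs_past_timeout():
        """Yield (number, title) for labelled PRs past the review window."""
        # PRs show up in GitHub's issues endpoint, which supports
        # filtering by label and state.
        url = "https://api.github.com/repos/%s/issues" % REPO
        resp = requests.get(url, params={"labels": LABEL, "state": "open"})
        resp.raise_for_status()
        now = datetime.now(timezone.utc)
        for issue in resp.json():
            opened = datetime.strptime(
                issue["created_at"], "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
            if now - opened >= WINDOW:
                yield issue["number"], issue["title"]

    if __name__ == "__main__":
        for number, title in prs_past_timeout():
            print("PR #%d past review window: %s" % (number, title))

The actual merge could be left to a human, and the same label would 
double as the out-of-band record that the second option needs for 
post-hoc review.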

-AB
