- From: James Graham <james@hoppipolla.co.uk>
- Date: Thu, 14 Nov 2013 11:17:25 +0700
- To: <public-test-infra@w3.org>
One of the biggest problems that we currently face with the test repository is the large number of tests that are currently stuck in code review. This causes a number of significant problems; most obviously, such tests aren't being run and so aren't helping to improve the web. Having a long review queue is also a significant turn-off to potential or existing contributors, who may feel that there is no point in contributing if their tests are going to sit in review for months at a time. By the time that tests are reviewed, such contributors have often moved on to work on other things, so they have limited or no time to deal with followup issues.

The fundamental reasons that code review seems to be slow are:

* It is hard work
* It is not super-rewarding
* No one is being paid to do it

It's not clear that any of these are solvable problems, so to fix the situation we are going to have to change something about the way that we do reviews, rather than merely hoping that they will happen sooner in the future.

In looking for a solution, we should keep in mind several goals of reviews:

* Check that tests are correct per spec
* Check that tests are written in a robust way
* Check that tests are using best practices (e.g. using testharness.js in the right way, conforming to style conventions, etc.)
* Ensure that tests are not using server-side script in a malicious way

The problem, then, is how to avoid front-loading the review burden whilst sacrificing as few of the above goals as possible.

One important observation is that once people are actually running tests, they have motivation to investigate tests that fail in their implementation and fix them if they are incorrect. So over long enough time periods incorrect fails should sort themselves out (although there might be incorrect passes that get missed, and so on).

One approach that was identified at TPAC was to allow test review to be forwarded from other public review systems. So, for example, if someone at Mozilla got r+ on a patch containing web-platform-tests changes, this would be allowed to automatically carry over to the web-platform-tests review by merging the PR and adding a comment to the original review, so such patches would land immediately. This, then, would effectively split the review burden into two parts: patches coming from implementers, which would typically need no further review, and patches coming from individual contributors, which would typically still require upfront review.

Compared to the above list of goals for review, allowing implementer review to carry forward directly would slightly reduce the effectiveness at ensuring tests are correct per spec; such tests would presumably be more likely to match the specific behaviour of one browser, even if it is based on a misunderstanding of the spec requirements. Assuming the implementer does competent review, it should not significantly affect robustness, since implementers running the tests have a significant motivation to make them robust. It would significantly reduce the chance that the tests had been reviewed correctly for best practices, since the original reviewer might have expertise in the browser code rather than in the web-platform-tests conventions. Again, assuming that the implementer is running the tests, it should be possible to avoid malicious code entering the system this way, since the vendor would be exposing themselves to any security problems.
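As a rough illustration of what the carry-forward step might involve (the repo name, token handling, and comment wording below are all invented, and a GitHub API client via requests is assumed), tooling along these lines could merge the PR and record the link back to the original review. Here the note is left on the PR side; posting back into the vendor's own review tool would depend on that tool's API:

    import requests

    API = "https://api.github.com"
    REPO = "w3c/web-platform-tests"   # hypothetical; whichever repo holds the tests
    TOKEN = "..."                     # credentials for a bot account
    HEADERS = {"Authorization": "token " + TOKEN}

    def land_vendor_reviewed(pr_number, vendor_review_url):
        # Record on the PR where the review actually happened, so the
        # provenance is visible to anyone looking at the merged change.
        requests.post("%s/repos/%s/issues/%d/comments" % (API, REPO, pr_number),
                      headers=HEADERS,
                      json={"body": "Review carried forward from " + vendor_review_url})
        # Land the change immediately instead of queueing it for re-review.
        r = requests.put("%s/repos/%s/pulls/%d/merge" % (API, REPO, pr_number),
                         headers=HEADERS,
                         json={"commit_message": "Vendor review carried forward"})
        r.raise_for_status()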
Given that there is a history of vendors submitting tests that match their implementations but do not match the spec, and tests that have significant style issues, it does seem that some mechanism is needed to allow external review of these tests. Two possibilities have been suggested so far: that the tests only get approved after some timeout (i.e. review can be carried forward, but we wait for N days before actually merging, to give people a chance to review), or that we keep an out-of-band record of which PRs were only reviewed within an organization and allow people to do explicit post-hoc review.

I don't know which of these options is better, or whether there are further options. I think the second has some advantages for integrating with UA import systems, since it keeps the code more in-sync between the UA repo and the external repo. Both also require some additional tooling (a very rough sketch of what the timeout variant might look like is below).

Comments?
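Sketch only: the label name, the waiting period, and the repo here are all invented, and a real version would want to track when the carried-forward review was recorded rather than when the PR was opened. But the timeout variant might amount to a small script run periodically against the GitHub API:

    import datetime
    import requests

    API = "https://api.github.com"
    REPO = "w3c/web-platform-tests"   # hypothetical
    TOKEN = "..."
    HEADERS = {"Authorization": "token " + TOKEN}
    WAIT_DAYS = 7                     # the "N days" grace period; made up here

    def merge_expired_vendor_prs():
        # PRs appear in the issues listing, which can filter by label, so a
        # "vendor-reviewed" label (invented for this sketch) marks PRs whose
        # review was carried forward from elsewhere.
        issues = requests.get("%s/repos/%s/issues" % (API, REPO),
                              headers=HEADERS,
                              params={"labels": "vendor-reviewed", "state": "open"}).json()
        cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=WAIT_DAYS)
        for issue in issues:
            if "pull_request" not in issue:
                continue  # an ordinary issue, not a PR
            # Using the PR's creation time as a stand-in for when the
            # carried-forward review was recorded.
            opened = datetime.datetime.strptime(issue["created_at"],
                                                "%Y-%m-%dT%H:%M:%SZ")
            if opened < cutoff:
                requests.put("%s/repos/%s/pulls/%d/merge" % (API, REPO, issue["number"]),
                             headers=HEADERS)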