- From: James Graham <james@hoppipolla.co.uk>
- Date: Thu, 14 Nov 2013 11:17:25 +0700
- To: <public-test-infra@w3.org>
One of the biggest problems that we currently face with the test repository is the large number of tests that are currently stuck in code review. This causes a number of significant problems; most obviously, such tests aren't being run and so aren't helping to improve the web. Having a long review queue is also a significant turn-off to potential or existing contributors, who may feel that there is no point in contributing if their tests are going to sit in review for months at a time. By the time that tests are reviewed, such contributors have often moved on to work on other things, so they have limited or no time to deal with followup issues.

The fundamental reasons that code review seems to be slow are:

* It is hard work
* It is not super-rewarding
* No one is being paid to do it

It's not clear that any of these are solvable problems, so to fix the situation we are going to have to change something about the way that we do reviews, rather than merely hoping that they will happen sooner in the future.

In looking for a solution, we should keep in mind several goals of reviews:

* Check that tests are correct per spec
* Check that tests are written in a robust way
* Check that tests are using best practices (e.g. using testharness.js in the right way, conforming to style conventions, etc.)
* Ensure that tests are not using server-side script in a malicious way

The problem, then, is how to avoid front-loading the review burden whilst sacrificing as few of the above goals as possible.

One important observation is that once people are actually running tests, they have motivation to investigate tests that fail in their implementation and fix them if they are incorrect. So over long enough time periods incorrect fails should sort themselves out (although there might be incorrect passes that get missed, and so on).

One approach that was identified at TPAC was to allow test review to be forwarded from other public review systems. So, for example, if someone at Mozilla got r+ on a patch containing web-platform-tests changes, this would be allowed to automatically carry over to the web-platform-tests review by merging the PR and adding a comment to the original review, so such patches would land immediately. This, then, would effectively split the review burden into two parts: patches coming from implementers, which would typically need no further review, and patches coming from individual contributors, which would typically still require upfront review.

Compared to the above list of goals for review, allowing implementer review to carry forward directly would slightly reduce the effectiveness at ensuring tests are correct per spec; such tests would presumably be more likely to match the specific behaviour of one browser, even if it is based on a misunderstanding of the spec requirements. Assuming the implementer does competent review, it should not significantly affect robustness, since implementers running the tests have a significant motivation to make them robust. It would significantly reduce the chance that the tests had been reviewed correctly for best practices, since the original reviewer might have expertise in the browser code rather than in the web-platform-tests conventions. Again, assuming that the implementer is running the tests, it should be possible to avoid malicious code entering the system this way, since the vendor would be exposing themselves to any security problems.
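As a rough illustration of what the carry-forward step might involve (the repo name, token handling, and comment wording below are all invented, and a GitHub API client via requests is assumed), tooling along these lines could merge the PR and record the link back to the original review. Here the note is left on the PR side; posting back into the vendor's own review tool would depend on that tool's API:

    import requests

    API = "https://api.github.com"
    REPO = "w3c/web-platform-tests"   # hypothetical; whichever repo holds the tests
    TOKEN = "..."                     # credentials for a bot account
    HEADERS = {"Authorization": "token " + TOKEN}

    def land_vendor_reviewed(pr_number, vendor_review_url):
        # Record on the PR where the review actually happened, so the
        # provenance is visible to anyone looking at the merged change.
        requests.post("%s/repos/%s/issues/%d/comments" % (API, REPO, pr_number),
                      headers=HEADERS,
                      json={"body": "Review carried forward from " + vendor_review_url})
        # Land the change immediately instead of queueing it for re-review.
        r = requests.put("%s/repos/%s/pulls/%d/merge" % (API, REPO, pr_number),
                         headers=HEADERS,
                         json={"commit_message": "Vendor review carried forward"})
        r.raise_for_status()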
Given that there is a history of vendors submitting tests that match their implementations but do not match the spec, and tests that have significant style issues, it does seem that some mechanism is needed to allow external review of these tests. Two possibilities have been suggested so far: that the tests only get approved after some timeout (i.e. review can be carried forward, but we wait for N days before actually merging, to give people a chance to review), or that we keep an out-of-band record of which PRs were only reviewed within an organization and allow people to do explicit post-hoc review.

I don't know which of these options is better, or whether there are further options. I think the second has some advantages for integrating with UA import systems, since it keeps the code more in-sync between the UA repo and the external repo. Both also require some additional tooling (a very rough sketch of what the timeout variant might look like is below).

Comments?
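Sketch only: the label name, the waiting period, and the repo here are all invented, and a real version would want to track when the carried-forward review was recorded rather than when the PR was opened. But the timeout variant might amount to a small script run periodically against the GitHub API:

    import datetime
    import requests

    API = "https://api.github.com"
    REPO = "w3c/web-platform-tests"   # hypothetical
    TOKEN = "..."
    HEADERS = {"Authorization": "token " + TOKEN}
    WAIT_DAYS = 7                     # the "N days" grace period; made up here

    def merge_expired_vendor_prs():
        # PRs appear in the issues listing, which can filter by label, so a
        # "vendor-reviewed" label (invented for this sketch) marks PRs whose
        # review was carried forward from elsewhere.
        issues = requests.get("%s/repos/%s/issues" % (API, REPO),
                              headers=HEADERS,
                              params={"labels": "vendor-reviewed", "state": "open"}).json()
        cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=WAIT_DAYS)
        for issue in issues:
            if "pull_request" not in issue:
                continue  # an ordinary issue, not a PR
            # Using the PR's creation time as a stand-in for when the
            # carried-forward review was recorded.
            opened = datetime.datetime.strptime(issue["created_at"],
                                                "%Y-%m-%dT%H:%M:%SZ")
            if opened < cutoff:
                requests.put("%s/repos/%s/pulls/%d/merge" % (API, REPO, issue["number"]),
                             headers=HEADERS)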