Re: Test Review Tools from Tobie Langel on 2013-03-20 (public-test-infra@w3.org from January to March 2013)

From: Tobie Langel <tobie@w3.org>
Date: Wed, 20 Mar 2013 10:59:14 +0100
To: James Graham <jgraham@opera.com>
Cc: public-test-infra@w3.org
Message-ID: <28B73E7D421A4C5192360DEE1F606FBD@w3.org>
Hi James,

Thanks for your comments.

On Tuesday, March 19, 2013 at 4:44 PM, James Graham wrote:
> On Tue, 19 Mar 2013, Tobie Langel wrote:
> > This is jeopardized if the review tools are going to be specific to WGs 
> > or worse, individual reviewers. Review tools aren't like text editor. 
> > Unless there's a seamless bi-directional syncing of comments between GH 
> > and the review tool (that's *hard* and tends to defeat the purpose of 
> > using an external tool), the review tools is being forced onto the 
> > contributor (and the rest of the community that might that might want to 
> > follow along) by the reviewer. As I contributor, if I have to learn a 
> > completely different review tool whether I'm writing a test for the 
> > WebPerf WG (e.g. GH itself), the CSS WG (e.g. shepherd) or WebApps (e.g. 
> > Critic), I'm just going to give up.
> 
> I think that strongly depends on what type of contributer you are. If you 
> are a regular contributer there are significant enough benefits to 
> learning tools that make you more productive that it is well worth the 
> investment of your time to do it.

The point I'm trying to make is that while custom review tools makes the work of the reviewer easier (he chose the tools), it makes the work of the contributor harder (he has to adapt to different tools depending on what part of the platform his contributions relate to) as well as the work of anyone which needs to consult a PR for whatever reason (e.g. the chair of a WG, an implementor, a group member, etc.). 
> If you are an occasional contributer 
> that perhaps isn't the case. Happily we can deal with that by good sense 
> rather than forcing lowest-common-denominator policies on everyone. For 
> example we can use the high quality toolchain with regular contributers 
> and with irregular contributers whose contribution makes it worth while 
> and who opt-in to the better tools and who agree to do so.

Whether or not these are better tools depends on the context and the perspective.

If I quickly want to find out the reason why a particular test was rejected, having to go through a new interface (sign-up, etc.) is certainly not going to feel better.

> > While I understand the limitations of the GitHub review tools, they work 
> > fairly well for a number of projects with a lot more going on than ours. 
> > I feel like the reason this works is those projects adopt the "commit 
> > early, commit often" Git mindset, which guarantees small atomic commits 
> > and PRs focused on a single topic. Until we enforce that in the way we 
> > work with Git, it'll be difficult to assess faithfully whether the 
> > limitation of GH review tools are intrinsic or of our own making.
> 
> In the specific case of testcase development, the "lots of tiny pull
> requests that can be dealt with by one person in a few minutes" approach 
> isn't going to work. The reality is that many tests are written by 
> companies as they develop the corresponding feature. For policy reasons it 
> is, in many organisations, not acceptable to release the tests as they are 
> written, but instead they are batch-released at the end of the project. 
> This leads, inevitably, to contributions with thousands of tests in 
> hundreds of files being added as a single dump. It is unrealistic to 
> expect people to split such a contribution up into hundreds of one-file 
> pull requests.

A script that would do precisely this seams easy enough to write. Maybe we could even provide such a script.

That said, I'm hoping we can agree to lighten the process for implementor upstreaming[1] which would effectively make this point rather moot.
> > If we do end-up finding out better tooling is necessary, we have to 
> > carefully weigh the tradeoffs we're making by switching to a custom 
> > solution. At which point we'll also have to decide whether we keep the 
> > GitHub Issue system up (in which case we'll need to fully synchronize 
> > the comments on both systems) or close GitHub issues and switch the 
> > whole repository to using something different (which we'll need to have 
> > agreement on and also have to archive).
> 
> I don't understand why the GH issues system is relevant here.

PR requests automatically open an issue in GH.
> Rather than hope that we can get by with bad tooling by pushing 
> unreasonable demands out onto those submitting tests ("no pull requests 
> that can't be reviewed in a single short sitting")

I disagree that this is an unreasonable demand, especially if we solve the upstreaming issue through lightened process and provide scripts in the case where that's not possible.
> we should actively 
> embrace the ability to build on to of github to improve our experience.

I agree. And this is in the works. However, I strongly question your premise that multiple different review tools on the same repository are going to "improve our experience."  
> I don't think we are at the "can we live with substandard tools" stage; we
> clearly - both in theory and in practice - have significant enough needs 
> that we can't. The only question is how to mitigate the downsides that 
> have been identified.For example with critic, I could make it add a 
> comment to the PR indicating that there is a critic review and, once some 
> work happens on that review, place another comment indicating that the 
> review is in progress on critic. The initial comment could say something 
> like "if you would prefer to use github for this review please comment 
> here". We could further rely on people's common sense to not foist 
> external systems on new contributers making small contributions.

You'll notice that I'm not excluding that we might, as a group, come to the conclusion that relying on external tools for reviewing is actually the right thing to do.

I'm challenging the idea that we should outright reject the tool chosen by the entire community we're trying to get closer to, rather than reconsider our practices.

I'm also challenging the notion that having multiple side by side review tools within the same repository is a sound strategy going forward.

I'm simply advocating that we should:

1. embrace the Git and GitHub flow,
2. use GH tools only until we collectively agree that they're insufficient for our needs,
3. ponder whether the benefits of a custom review tools is worth the risk of alienating new contributors used to the GH way,
4. if we end up deciding we need a specific tool, agree collectively to use the same one, and modify our process and tooling accordingly.
> I also notice that you keep talking about archiving comments. I don't know 
> what the use case for this is, so I don't know why you think it's needed. 
> Knowing that v1 of some submission had a typo isn't a very interesting 
> fact, as long as it gets addressed. So the only interesting thing seems to 
> me to be that a code review has 0 outstanding issues at the point at which 
> it is merged.

Archiving comments are useful for the same decision as archiving mailing lists are: to retrospectively understand how we came to the conclusions we came to.
 
--tobie
---
[1]: http://lists.w3.org/Archives/Public/public-test-infra/2013JanMar/0063.html
Received on Wednesday, 20 March 2013 09:59:23 UTC