Identifying unstable tests (was: Review of tests upstreamed by implementors)

On Thursday, March 21, 2013 at 2:56 PM, Robin Berjon wrote:
> On 21/03/2013 14:11 , James Graham wrote:
> > One approach Opera have used with success is to implement a "quarantine"
> > system in which each test is run a large number of times, say 200, and
> > tests that don't give consistent results are sent back to a human for
> > further analysis. Any W3C test-running tool should have this ability so
> > that we can discover (some of) the tests that have problematically
> > random behaviour before they are merged into the testsuite. In addition
> > we should make a list of points for authors and reviewers to use so that
> > they avoid or reject known-unstable patterns (see e.g. [1]).
>  
> That doesn't sound too hard to do. At regular intervals, we could:
>  
> • List all pull requests through the GH API.
> • For each of those:
>   • Check out a fresh copy of the repo
>   • Apply the pull request locally
>   • Run all tests (ideally using something that has multiple browsers,
>     but since we're looking for breakage even just PhantomJS or something
>     like it would already weed out trouble).
>   • Report issues.
>  
> It's a bit of work, but it's doable.
Yes, such a system is planned and budgeted.
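Roughly, the loop you outline could look like the sketch below. To be clear, this is just a sketch: the repository slug, the GitHub API call, and the run-tests.py harness are placeholders, not what we'll actually use.

    import subprocess
    import requests

    REPO = "w3c/web-platform-tests"          # hypothetical repository slug
    CLONE_URL = "https://github.com/%s.git" % REPO

    def open_pull_requests():
        # List open pull requests through the GitHub API (first page only here).
        r = requests.get("https://api.github.com/repos/%s/pulls" % REPO)
        r.raise_for_status()
        return r.json()

    def check_pull_request(pr, workdir="pr-check"):
        # Fresh checkout, apply the pull request locally, then run the tests.
        subprocess.check_call(["rm", "-rf", workdir])
        subprocess.check_call(["git", "clone", "--depth", "1", CLONE_URL, workdir])
        branch = "pr-%d" % pr["number"]
        subprocess.check_call(
            ["git", "fetch", "origin", "pull/%d/head:%s" % (pr["number"], branch)],
            cwd=workdir)
        subprocess.check_call(["git", "checkout", branch], cwd=workdir)
        # Placeholder for whatever harness we end up using (PhantomJS or similar).
        return subprocess.call(["python", "run-tests.py"], cwd=workdir) == 0

    for pr in open_pull_requests():
        if not check_pull_request(pr):
            print("Pull request #%d needs a human look" % pr["number"])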

I hadn't thought about using it to find unstable tests, but that should be easy enough to set up. A cron job could go through the results, identify flaky tests, and file bugs.
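For instance, something along these lines could flag anything that doesn't produce the same result across repeated runs. Again just a sketch: the per-run results format and the file_bug hook are made up.

    import json
    from collections import defaultdict

    RUNS = 200   # per James's suggestion

    def load_results(run_index):
        # Assume each run dumps {"test name": "PASS" / "FAIL" / "TIMEOUT", ...}.
        with open("results-%d.json" % run_index) as f:
            return json.load(f)

    def find_unstable_tests(runs=RUNS):
        outcomes = defaultdict(set)
        for i in range(runs):
            for test, status in load_results(i).items():
                outcomes[test].add(status)
        # A test is unstable if it produced more than one distinct outcome.
        return sorted(t for t, statuses in outcomes.items() if len(statuses) > 1)

    def file_bug(test):
        # Placeholder: this would talk to the issue tracker.
        print("Would file a bug for unstable test: %s" % test)

    for test in find_unstable_tests():
        file_bug(test)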

The more complex question is what should be done with those tests from the time they are identified as problematic until they're fixed, and how this information should be conveyed downstream.

--tobie

Received on Thursday, 21 March 2013 14:43:21 UTC