Re: Identifying unstable tests from James Graham on 2013-03-22 (public-test-infra@w3.org from January to March 2013)

From: James Graham <jgraham@opera.com>
Date: Fri, 22 Mar 2013 09:49:54 +0100
To: Dirk Pranke <dpranke@chromium.org>
CC: Tobie Langel <tobie@w3.org>, Robin Berjon <robin@w3.org>, public-test-infra <public-test-infra@w3.org>
Message-ID: <514C1B32.9000200@opera.com>

On 03/21/2013 11:38 PM, Dirk Pranke wrote:
>
> On Thu, Mar 21, 2013 at 8:31 AM, James Graham <jgraham@opera.com> wrote:

>     So, I'm not exactly clear what you're proposing, but experience
>     suggests that the best way to identify flaky tests is upfront, by
>     running the test multiple (hundreds of) times before it is used as
>     part of a test run. Trying to use historical result data to identify
>     flaky tests sounds appealing, but it is much more complex since both
>     the test and the UA may change between runs. That doesn't mean it's
>     impossible, but I strongly recommend implementing the simple
>     approach first.
>
>
> FWIW, WebKit has invested a fair amount of time in tracking test
> flakiness over time. The initial up-front beating probably identifies
> many problems, but we often find cases where tests can be flaky on
> different machine configurations (e.g., runs fine on a big workstation,
> but not on a VM) or can be flaky when the tests accidentally introduce
> side effects into the environment or the test executables.

Yes, agreed. A quarantine system isn't a panacea by any stretch of the 
imagination. But in terms of bang-for-buck it is the best approach I 
have found for avoiding needless problems. Of course it can't identify 
everything, but I don't know of any other technique that can either.
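
To make the upfront approach concrete, here is a minimal sketch of a quarantine check. It is only an illustration, not any existing harness: the name `quarantine_check`, the default of 200 runs, and the idea that a test is a callable returning True on pass are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable


def quarantine_check(test: Callable[[], bool], runs: int = 200) -> str:
    """Run `test` repeatedly and classify it by how consistent its results are.

    Hypothetical helper: `test` is assumed to return True on pass and
    False on fail; an uncaught exception is counted as a failure.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    outcomes = Counter()
    for _ in range(runs):
        try:
            passed = bool(test())
        except Exception:
            passed = False
        outcomes[passed] += 1

    # Any mix of passes and failures means the result is not reproducible.
    if len(outcomes) > 1:
        return "flaky"
    return "stable-pass" if outcomes[True] else "stable-fail"
```

A test would only be admitted to the regular run if it comes back as "stable-pass" or "stable-fail". As noted above, this only catches instability that shows up on the machine doing the quarantine run; it says nothing about other configurations or about side effects between tests.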

> It's a significant source of pain, unfortunately.

Tell me about it :(

Received on Friday, 22 March 2013 08:50:25 UTC