- From: James Graham <jgraham@opera.com>
- Date: Fri, 22 Mar 2013 09:49:54 +0100
- To: Dirk Pranke <dpranke@chromium.org>
- CC: Tobie Langel <tobie@w3.org>, Robin Berjon <robin@w3.org>, public-test-infra <public-test-infra@w3.org>
On 03/21/2013 11:38 PM, Dirk Pranke wrote:
> On Thu, Mar 21, 2013 at 8:31 AM, James Graham <jgraham@opera.com> wrote:
>> So, I'm not exactly clear what you're proposing, but experience
>> suggests that the best way to identify flaky tests is upfront, by
>> running the test multiple (hundreds) of times before it is used as
>> part of a test run. Trying to use historical result data to identify
>> flaky tests sounds appealing, but it is much more complex since both
>> the test and the UA may change between runs. That doesn't mean it's
>> impossible, but I strongly recommend implementing the simple
>> approach first.
>
> FWIW, WebKit has invested a fair amount of time in tracking test
> flakiness over time. The initial up-front beating probably identifies
> many problems, but we often find cases where tests can be flaky on
> different machine configurations (e.g., runs fine on a big workstation,
> but not on a VM) or can be flaky when the tests accidentally introduce
> side effects into the environment or the test executables.

Yes, agreed. A quarantine system isn't a panacea by any stretch of the
imagination. But in terms of bang-for-buck it is the best approach I
have found for avoiding needless problems. Of course it can't identify
everything, but I don't know of any other technique that can either.

> It's a significant source of pain, unfortunately.

Tell me about it :(
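[Editor's note: for concreteness, the up-front check described above can be
sketched in a few lines of Python. This is a hypothetical illustration only;
the run_test helper, the command-line interface, and the 100-run threshold
are assumptions, not any project's actual harness.]

    #!/usr/bin/env python
    """Sketch of an up-front flakiness check: run a test many times
    before admitting it to the suite, and flag any disagreement."""
    import subprocess
    import sys

    def run_test(cmd):
        # Run one invocation; record pass/fail status and output so
        # that nondeterministic output also counts as a mismatch.
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return (proc.returncode == 0, proc.stdout)

    def is_flaky(cmd, iterations=100):
        # Compare every run against the first; any difference in
        # status or output marks the test as flaky.
        baseline = run_test(cmd)
        for _ in range(iterations - 1):
            if run_test(cmd) != baseline:
                return True
        return False

    if __name__ == "__main__":
        cmd = sys.argv[1:]
        if is_flaky(cmd):
            print("FLAKY: quarantine %r before using it in a run" % cmd)
        else:
            print("STABLE over repeated runs: %r" % cmd)

Comparing output as well as exit status catches tests that "pass" while
producing nondeterministic results. As the thread notes, a check like this
on one machine won't catch environment-specific flakiness (workstation vs.
VM), so in practice you would also repeat it across configurations.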
Received on Friday, 22 March 2013 08:50:25 UTC