
Re: Identifying unstable tests

From: Robin Berjon <robin@w3.org>
Date: Thu, 21 Mar 2013 16:58:21 +0100
Message-ID: <514B2E1D.1030704@w3.org>
To: James Graham <jgraham@opera.com>
CC: public-test-infra <public-test-infra@w3.org>
On 21/03/2013 16:31 , James Graham wrote:
> On Thu, 21 Mar 2013, Tobie Langel wrote:
>> The more complex question is what should be done with those tests from
>> the time they are identified as problematic until they're fixed. And
>> how should this information be conveyed downstream.
>
> In the case of files where all the tests are unstable it's easy; back
> out the test. In the case of files with multiple tests of which only
> some are unstable, things are more complicated. In the simplest case one
> might be able to apply a patch to back out that subtest (obviously that
> requires human work). I wonder if there are more complex cases where
> simply backing out the test is undesirable?

With the scheme I had in mind, testing for flakiness would happen when 
the pull request is triggered. Presumably, it would raise a red flag on 
the PR. Of course, this could take a while to run, so the PR might be 
integrated too fast (but that should be a marginal case, and we can back 
files out indeed).

In any case, I would think that a file containing multiple tests only 
one of which shows up as flaky would be suspicious as a whole. So I'd 
back it out completely until a human steps in to fix the issue.

> Robin is going to kill me for this, but if we had manifest files rather
> than trying to store all the test metadata in the test name, we could
> store a list of child test names for each parent test url that are known
> to be unstable so that vendors would know to skip those when looking at
> the results.

I'm going to kill you for this.

I am *not* suggesting that we store all the test metadata in the test 
name; I am rather suggesting that we make use of metadata practices that 
are conducive to the metadata staying correct over time.

For basic things, using the file system IMHO really works. But for stuff 
like listing tests found to be flaky, given that the flakiness is 
discovered automatically, I was thinking that the data would be stored 
automatically in a database (and set to green when it works).
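To make that concrete, here is a minimal sketch of what such a store 
could look like (the schema, function names, and example test URL are 
all hypothetical, not an actual proposal): flakiness discovered by the 
automated runs is recorded per (test file, subtest), cleared when the 
subtest goes green again, and vendors can query the skip list for a 
given test file.

```python
import sqlite3

# Hypothetical sketch: an in-memory table of currently-flaky subtests.
# A real system would persist this and hook it into the PR test runner.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flaky_subtests (
        test_url TEXT NOT NULL,
        subtest  TEXT NOT NULL,
        PRIMARY KEY (test_url, subtest)
    )
""")

def mark_flaky(test_url, subtest):
    # Recorded automatically when a run detects instability.
    conn.execute(
        "INSERT OR IGNORE INTO flaky_subtests VALUES (?, ?)",
        (test_url, subtest),
    )

def mark_green(test_url, subtest):
    # "Set to green": drop the entry once the subtest runs stably.
    conn.execute(
        "DELETE FROM flaky_subtests WHERE test_url = ? AND subtest = ?",
        (test_url, subtest),
    )

def subtests_to_skip(test_url):
    # What a vendor would consult before trusting results for a file.
    rows = conn.execute(
        "SELECT subtest FROM flaky_subtests WHERE test_url = ?",
        (test_url,),
    )
    return [r[0] for r in rows]

mark_flaky("/dom/nodes/Node-cloneNode.html", "cloneNode deep")
print(subtests_to_skip("/dom/nodes/Node-cloneNode.html"))  # ['cloneNode deep']
mark_green("/dom/nodes/Node-cloneNode.html", "cloneNode deep")
print(subtests_to_skip("/dom/nodes/Node-cloneNode.html"))  # []
```

The point of keeping this out of the test files themselves is that the 
data can churn as often as the runs do, without anyone having to patch 
manifests by hand.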

Manifests are just plain text RDF. Just, you know, don't.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon
Received on Thursday, 21 March 2013 15:58:34 UTC
