RE: "priority" of tests from Patrick Kettner on 2017-05-10 (public-test-infra@w3.org from April to June 2017)

From: Patrick Kettner <patket@microsoft.com>
Date: Wed, 10 May 2017 19:00:22 +0000
To: Philip Jägenstedt <foolip@google.com>, John Jansen <John.Jansen@microsoft.com>, "public-test-infra@w3.org" <public-test-infra@w3.org>, "jeffcarp@chromium.org" <jeffcarp@chromium.org>
Message-ID: <BN1PR0301MB0657DCB7A647B5A420C2E885DEEC0@BN1PR0301MB0657.namprd03.prod.outlook.>

I think the use case that we Microsofties are thinking about is “here is a list of N thousand failures. How can we tell the various teams which to tackle first?”

Individual tests can be investigated without problem, but finding the various needles within the haystack can be difficult. The recent updates Jeff made to the “interop risk” portion of the dashboard are very useful, and honestly get a lot closer to what we are looking for, but it feels like the new tests, at least, could carry a bit more metadata indicating whether something is edge-case or core functionality.

patrick

From: Philip Jägenstedt [mailto:foolip@google.com]
Sent: Wednesday, May 10, 2017 11:41 AM
To: John Jansen <John.Jansen@microsoft.com>; public-test-infra@w3.org; jeffcarp@chromium.org
Subject: Re: "priority" of tests

When looking at an individual test it's probably easy to place it on the real-world<->edge-case spectrum and make a judgement, but I assume you're looking for something that will scale better.

We did a one-off triage of tests that fail in Chrome but pass in Firefox and Edge <https://bugs.chromium.org/p/chromium/issues/detail?id=651572> back in Sep 2016. I would say the signal-to-noise ratio was a bit on the low side, but I'm still optimistic about this method of finding tests worth investigating.

We have an upcoming wpt dashboard, which +Jeff Carpenter <mailto:jeffcarp@chromium.org> is working on. A preview is public at https://wptdashboard.appspot.com but the data isn't entirely fresh. Once this is closer to done, filtering for tests which fail in some browser but pass in 2 or 3 others should hopefully be a guide to which things to investigate first. In particular, if only one browser is failing and the test is on the real-world side of the spectrum, then that browser alone is holding back interop and can solve the problem by fixing just one bug.
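
As a rough illustration of that filter, here is a minimal Python sketch. Everything in it is an assumption made for the example: the result layout (test path mapped to per-browser "PASS"/"FAIL" strings), the `lone_failures` helper, the `min_passing` threshold and the sample test paths are all hypothetical and do not reflect the dashboard's actual data format or code.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: test path -> {browser name -> "PASS" or "FAIL"}.
Results = Dict[str, Dict[str, str]]


def lone_failures(results: Results, min_passing: int = 2) -> List[Tuple[str, str]]:
    """Return (test, browser) pairs where exactly one browser fails
    and at least `min_passing` other browsers pass."""
    found = []
    for test, by_browser in sorted(results.items()):
        failing = [b for b, status in by_browser.items() if status == "FAIL"]
        passing = [b for b, status in by_browser.items() if status == "PASS"]
        if len(failing) == 1 and len(passing) >= min_passing:
            found.append((test, failing[0]))
    return found


# Made-up sample data, for illustration only.
example: Results = {
    "dom/a.html": {"chrome": "FAIL", "firefox": "PASS", "edge": "PASS"},
    "dom/b.html": {"chrome": "FAIL", "firefox": "FAIL", "edge": "PASS"},
    "dom/c.html": {"chrome": "PASS", "firefox": "PASS", "edge": "PASS"},
    "dom/d.html": {"edge": "FAIL", "firefox": "PASS"},
}

for test, browser in lone_failures(example):
    print(f"{test}: only {browser} fails")
```

A filter like this only says where a single browser is the outlier; whether the test sits on the real-world or the edge-case side of the spectrum would still need a human judgement or extra metadata.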

(I think one could learn something by inspecting just the repository and its history, but guess that something based on the test results is going to be much more useful.)

Have you explored any other ideas?

On Wed, May 10, 2017 at 7:31 PM John Jansen <John.Jansen@microsoft.com> wrote:
Good morning,

I'm trying to work out how to prioritize test failures seen with Web Platform Tests.

We've had this discussion in the past, but I'm wondering if anyone on this list has had any inspired discovery or realization that might make things a bit better...

I know for browser vendors this is incredibly challenging. Say we see 100 failures in one test file: currently there is no way for me to know if those 100 failures are more or less important to the web than a single failure in some other test file. Of course, the priority for Edge cannot be determined by Chrome, so I am not asking for browser vendors to somehow dictate this. I'm wondering instead if there is a way we could have the people who write the tests or the people who write the specs (or both) come to some type of ranking.

I am not sure what that would look like. Perhaps "We've seen this construct actually used on sites" means it's HIGH priority. Or maybe, "No web dev would ever try to pass this invalid value in" means it's LOW priority.

Maybe people have already had this conversation and I'm not in the loop.

Anyone?

-John

Received on Wednesday, 10 May 2017 19:01:03 UTC