Re: "priority" of tests

I think that the "interop risk" view might be even more useful if done at
the granularity of individual tests instead of files, although that's not
entirely trivial because there's no guarantee that the number and names of
tests match up. Any ideas for other ways to interrogate the data would be
appreciated.
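
Something like this is roughly what I have in mind for comparing results
at the subtest level (just a sketch in Python; it assumes wptreport-style
JSON output per browser, and the report file names are made up):

    import json
    from collections import defaultdict

    def load_subtests(path):
        """Map (test, subtest name) -> status for one browser's report."""
        with open(path) as f:
            report = json.load(f)
        results = {}
        for entry in report["results"]:
            for sub in entry.get("subtests", []):
                results[(entry["test"], sub["name"])] = sub["status"]
        return results

    browsers = {
        "chrome": load_subtests("chrome-report.json"),
        "firefox": load_subtests("firefox-report.json"),
        "edge": load_subtests("edge-report.json"),
    }

    all_keys = set().union(*(r.keys() for r in browsers.values()))

    # Subtests missing from some report: the name/count mismatch problem.
    mismatched = [k for k in sorted(all_keys)
                  if any(k not in r for r in browsers.values())]

    # Subtests failing in exactly one browser: interop-risk candidates.
    # (A subtest missing from a report is treated as passing here.)
    lone_failures = defaultdict(list)
    for key in all_keys:
        failing = [b for b, r in browsers.items()
                   if r.get(key, "PASS") != "PASS"]
        if len(failing) == 1:
            lone_failures[failing[0]].append(key)

The matching of subtests by name is exactly where this gets fuzzy, of
course; anything not present in every report ends up in the "mismatched"
bucket rather than being compared.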

As for some kind of metadata, would that be at the test level? I tend to
write one file that tests (for example) everything about a constructor,
which would mix the serious with the trivial in the same file. But we have
no mechanism for annotating the individual tests.

One thing that might help filter out a few edge-case tests would be
checking the failure message. If a test failed because an exception
*wasn't* thrown, that's probably less of a problem in the wild than the
tests where an exception *is* thrown unexpectedly. But that would still
leave a huge chunk of tests in need of categorization by other means.
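
As a rough first pass, something like this could bucket failures by
message (again just a sketch; the substring pattern is a guess that would
need checking against real failure messages, and the report file name is
made up):

    import json
    import re

    # Heuristic: assert_throws-style failures where nothing was thrown.
    MISSING_EXCEPTION = re.compile(r"did not throw", re.IGNORECASE)

    def classify(message):
        if message and MISSING_EXCEPTION.search(message):
            return "missing-exception"  # likely edge case around bad input
        return "other"                  # needs categorization by other means

    with open("edge-report.json") as f:
        report = json.load(f)

    buckets = {"missing-exception": [], "other": []}
    for entry in report["results"]:
        for sub in entry.get("subtests", []):
            if sub["status"] != "PASS":
                buckets[classify(sub.get("message"))].append(
                    (entry["test"], sub["name"]))

    for kind, items in buckets.items():
        print(kind, len(items))

Everything that lands in the "other" bucket is the huge chunk that would
still need some other kind of categorization.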

On Wed, May 10, 2017 at 9:00 PM Patrick Kettner <patket@microsoft.com>
wrote:

> I think the use case that us Microsofties are thinking about is “here is a
> list of N thousand failures. How can we tell the various teams which to
> tackle first”.
>
>
>
> Individual tests can be investigated without problem, but finding the
> various needles within the haystack can be difficult. The recent updates
> Jeff did for the “interop risk” portion of the dashboard are very useful,
> and honestly get a lot closer to what we are looking for, but it feels
> like there could be a bit more metadata, at least on newly added tests,
> to indicate whether something is edge-case or core functionality.
>
>
>
> patrick
>
> *From:* Philip Jägenstedt [mailto:foolip@google.com]
> *Sent:* Wednesday, May 10, 2017 11:41 AM
> *To:* John Jansen <John.Jansen@microsoft.com>; public-test-infra@w3.org;
> jeffcarp@chromium.org
> *Subject:* Re: "priority" of tests
>
>
>
> When looking at an individual test it's probably easy to place it on the
> real-world<->edge-case spectrum and make a judgement, but I assume you're
> looking for something that will scale better.
>
>
>
> We did a one-off triage of tests that fail in Chrome but pass in Firefox
> and Edge <https://bugs.chromium.org/p/chromium/issues/detail?id=651572> back
> in Sep 2016. I would say the signal:noise ratio was a bit on the low side,
> but I'm still optimistic about this method of finding tests worth
> investigating.
>
>
>
> We have an upcoming wpt dashboard, which +Jeff Carpenter
> <jeffcarp@chromium.org> is working on. A preview is public at
> https://wptdashboard.appspot.com but the data isn't entirely fresh. Once
> this is closer to done, filtering for tests which fail in some browser but
> pass in 2 or 3 others should hopefully be a guide to which things to
> investigate first. Especially if only one browser is failing and the test
> is on the real-world side of the spectrum, then that browser alone is
> holding back interop and can solve the problem by fixing just one bug.
>
>
>
> (I think one could learn something by inspecting just the repository and
> its history, but guess that something based on the test results is going to
> be much more useful.)
>
>
>
> Have you explored any other ideas?
>
>
>
> On Wed, May 10, 2017 at 7:31 PM John Jansen <John.Jansen@microsoft.com>
> wrote:
>
> Good morning,
>
> I'm trying to work out how to prioritize test failures seen with Web
> Platform Tests.
>
> We've had this discussion in the past, but I'm wondering if anyone on this
> list has had any inspired discovery or realization that might make things a
> bit better...
>
> I know for browser vendors this is incredibly challenging. Say we see 100
> failures in one test file; currently there is no way for me to know if
> those 100 failures are more or less important to the web than a single
> failure in some other test file. Of course, the priority for Edge cannot be
> determined by Chrome, so I am not asking for browser vendors to somehow
> dictate this. I'm wondering instead if there is a way we could have the
> people who write the tests or the people who write the specs (or both) come
> to some type of ranking.
>
> I am not sure what it would look like. Perhaps "We've seen this construct
> actually used on sites" means it's HIGH priority. Or maybe "No web dev
> would ever try to pass this invalid value in" means it's LOW priority.
>
> Maybe people have already had this conversation and I'm not in the loop.
>
> Anyone?
>
> -John
>
>

Received on Wednesday, 10 May 2017 19:44:52 UTC