Re: "priority" of tests

Sorry I'm coming to this late, but I wanted to add a slightly different
perspective.

My biggest concern is stopping the bleeding: ensuring that all new engine
work is done so that interop is the natural outcome of our processes and
culture, rather than something we reach only after the fact through great
individual heroics.  In that vein, I haven't considered going through
existing WPT failures to be a very high priority.  Skimming through the
tests that pass in 2-3 other engines but fail in Chrome is an interesting
exercise for validating the effort, but before investing heavily to pay
back the debt from the past, I believe we first need to really fix the
path forward.  So what has been top priority for me, at least, is:

1. Interop issues reported from the real world are handled well, with spec
and test updates and bugs filed on other engines.  This, IMHO, is the
easiest way to prioritize interop issues: if even one developer complains
via a Chromium bug, there are probably dozens suffering in silence.  And
when we hit issues where dozens of sites are affected, ensuring there are
good tests is almost certainly worthwhile.

2. All new feature and spec work comes with WPT tests, and we don't ship
unless there's a good suite with a high pass rate.  This is driving most of
our work, because we can't reasonably ask for this until, for every sort of
feature, writing WPT tests is not much harder than our traditional testing.
(There's a minimal sketch of what such a test looks like after this list.)

3. Teams who already find WPT useful can use it easily instead of our
LayoutTests (consume, contribute, rely on it for regression prevention, get
notified of new failing tests in their area, easily visualize status in all
engines, etc.).  Until this is true, we'll never convince the teams who
don't really want to use it.

4. Strategic areas currently under active development in other engines
(nearing shipping where Chrome has already shipped) have a high-quality
test suite in place of Chromium LayoutTests, and bugs filed for failures.
This is why we're investing in the ServiceWorker and WebRTC suites right
now.
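
As promised under #2, here's a minimal sketch of the kind of testharness.js
test we'd want every new feature to ship with.  The choice of the URL
constructor and the specific assertions are purely illustrative, not tied
to anything we're actually tracking:

  <!DOCTYPE html>
  <meta charset="utf-8">
  <title>Illustrative WPT test for the URL constructor</title>
  <script src="/resources/testharness.js"></script>
  <script src="/resources/testharnessreport.js"></script>
  <script>
  // One test() per observable behaviour, so each failure shows up
  // separately in results dashboards and in other engines' runs.
  test(() => {
    const url = new URL("https://example.com/path?q=1");
    assert_equals(url.pathname, "/path");
    assert_equals(url.searchParams.get("q"), "1");
  }, "URL constructor exposes pathname and search params");

  test(() => {
    const url = new URL("/path", "https://example.com");
    assert_equals(url.href, "https://example.com/path");
  }, "URL constructor resolves relative input against a base");
  </script>

The point is that each test() reports its result separately, so one file
can cover a whole constructor while dashboards and other vendors still see
exactly which behaviours fail.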

I look forward to the day when our top priority is just working through
the years of debt around existing tests and missing coverage.  But from my
perspective, we still have a long way to go before that would be highly
effective.

On May 18, 2017 12:54 PM, "Geoffrey Sneddon" <me@gsnedders.com> wrote:

> On Wed, May 10, 2017 at 9:50 PM, James Graham <james@hoppipolla.co.uk>
> wrote:
> >
> > On 10/05/17 20:44, Philip Jägenstedt wrote:
> >>
> >> For some kind of metadata, would that be at the test level? At least I
> >> tend to write one file to test (for example) everything about a
> >> constructor, and that would mix the serious with the trivial in the same
> >> file. But we have no mechanism for annotating the individual tests.
> >
> >
> > So that's technically untrue. But nevertheless I don't think a
> > metadata-based system will work. Historically we have never managed to
> > get developers as a group to add metadata at all — even getting something
> > as basic as useful commit messages is hard — and even where individuals
> > have been motivated to add it, it has always bitrotted rather quickly.
> >
> > I believe a plan based around getting people to add vague value
> > judgements about the importance of tests would be doomed to failure. Even
> > if we could get people to add this data at all, it would mostly be wrong
> > when added and then later be even more wrong (because an "unimportant
> > test" can be "important" when it turns out that specific condition is
> > triggered in Facebook).
> >
> > I wish this wasn't true, but I think the reality is that there just
> > isn't a simple solution to figuring out which tests are important. Often
> > it's possible to tell that some class of failures isn't urgent (because
> > e.g. as Philip says you recognise that a specific failure message relates
> > to a known error in your WebIDL implementation), but otherwise you need
> > someone with expertise to make a judgement call.
>
> Even without any metadata, I think there are various points of interest:
> obviously if something crashes it is likely a higher priority than any
> other failure, but similarly if something doesn't throw an exception then
> that's probably higher priority than throwing the wrong type of
> exception.
>
> The other thing that could be used is if the dashboard tracked browser
> bugs for failing tests: you could then see how other vendors have
> prioritised a failing test if they also fail it (or, if you keep some
> priority data around, have previously failed it).
>
> I'm not opposed to adding metadata, but I think we need buy-in from all
> vendors to actually add it to future tests: and that, really, has always
> been the sticking point; people have always wanted to use certain bits of
> metadata, but have never wanted to add it to new tests they write.
>
> /g
>
>

Received on Saturday, 27 May 2017 05:20:32 UTC