Re: Towards a better testsuite from Lars Bergstrom on 2016-04-04 (www-style@w3.org from April 2016)

From: Lars Bergstrom <larsberg@mozilla.com>
Date: Mon, 4 Apr 2016 11:32:21 -0500
To: W3C www-style mailing list <www-style@w3.org>
Message-ID: <CABO6m47u0GhF3epG4dx+DteRYG66wzeMzhBK1X_crFE1PFFgMQ@mail.gmail.com>

On Sun, Apr 3, 2016 at 5:33 PM, Geoffrey Sneddon <me@gsnedders.com> wrote:
>
> At the moment there is almost no ongoing review of tests (because
> nobody looks at them much between their initial review and the spec
> trying to leave CR), which doesn't help with the quality. If we have
> browsers actively running tests, then at least incorrect tests should
> be caught relatively quickly because people will notice them and look
> at them. As for the other categories, I'd hope that most of them got
> caught by the review. Certainly my experience is that w-p-t probably
> has fewer bad tests despite its comparatively lax processes.

As Servo is a new browser engine implementation, I can definitely
agree with this point. Because the workflow is a fairly smooth
bi-directional sync script, we update our w-p-t tests quite regularly
- at least twice a month, based on a completely ad-hoc scan of our PR
logs (https://github.com/servo/servo/search?o=desc&p=1&q=%22Update+web+platform+tests+to+revision%22&s=created&type=Issues&utf8=%E2%9C%93).

I'd say that we're better at looking into new failures that we
encounter and investigating/fixing tests or specifications than we are
at verifying new passes are legitimate. We *do* catch unexpected
passes when they are in obviously strange places (e.g., where we don't
even implement a feature yet). James Graham or ms2ger could comment
more here.

That experience stands in contrast with our experiences with the CSS
WG tests, which we attempted to pick up and run but had trouble even
getting the build system to run on anybody's machine. Once we got
something working, when we tried to upstream a fix, the person who was
figuring out how to walk it backwards through the build system gave up
after a few days of poking around and asked ms2ger to do it. More
recently, we ended up contracting with Geoffrey to get more of the CSS
2.1 tests automated and updated, because I'm pretty sure if we had
tried to do that ourselves we would have given up partway through and
just bulk-imported & shared the ad-hoc CSS test suite from Gecko.

I believe that we need to have a set of automated cross-browser tests
in this space that are shared and contributed to by all the major
engines, and I'm willing to continue to fund work in support of it
(caveat: Mozilla is *not* a wealthy organization!). But, we need to
have a system that will encourage the other engine developers to run
and contribute or we're not getting any leverage from this investment
over doing something way cheaper, such as just importing the ad-hoc
tests that exist today in our sibling organization.
- Lars

Received on Monday, 4 April 2016 19:22:58 UTC