Re: Towards a better testsuite: Review/Commit Policy from Geoffrey Sneddon on 2016-04-19 (public-css-testsuite@w3.org from April 2016)

From: Geoffrey Sneddon <me@gsnedders.com>
Date: Tue, 19 Apr 2016 03:56:32 +0100
To: fantasai <fantasai.lists@inkedblade.net>
Cc: www-style list <www-style@w3.org>, "public-css-testsuite@w3.org" <public-css-testsuite@w3.org>
Message-ID: <CAHKdfMjsLPZm+1uf_AfLEMjhms9cr6XCDbiW=cMa45_Rg7SNDQ@mail.gmail.com>
On Fri, Apr 8, 2016 at 6:00 PM, fantasai <fantasai.lists@inkedblade.net> wrote:
> On 03/24/2016 01:00 PM, Geoffrey Sneddon wrote:
>>
>>
>> The other notable difference is in tooling: Mercurial is used with a
>> git mirror, and then reviews are done split across Shepherd and
>> public-css-testsuite and some issues filed on GitHub, and with some
>> people expecting some pre-landing review through GitHub PRs and with
>> some people pushing directly to Mercurial… Really everything would be
>> simpler if we had *one* single way to do things. I'd much rather have
>> everything on GitHub, review happening on PR submission, and nits and
>> such like be reported as GitHub issues. This keeps everything in one
>> place with tools most people are used to.
>>
>> To outline what I'd like to see happen:
>>
>> - Get rid of the build system, replacing many of it's old errors with
>> a lint tool that tests for them.
>> - Policy changes to get rid of all metadata in the common case.
>> - Change the commit policy. (Require review first, require no new lint
>> errors.)
>
>
> I agree with having a lint tool and changing the commit policy
> to be consistent between repositories. I'm not sure "require
> review first" is a good policy though; it's easier to review
> tests that have landed, and it also means we can easily take
> over fixing the test if the submitter is MIA. Also in many cases
> (e.g. for web dev contributors who aren't planning to stick around,
> but who want a particular issue tested) it's just more expedient
> to just fix the test, and explain what was fixed, than cycling
> through feedback.

I don't think that's true at all, in a world with distributed version
control systems, where one can easily just take the existing commit
history and work from that. I don't see why it's easier to review
tests that have landed, and I don't think it makes any real difference
when taking over fixing the tests (okay, maybe we have to open a new
PR because of not having push perms on the original branch, but I
don't think that's really significant).

FWIW, I think the experience of web-platform-tests is that the
submitter tends to only go MIA if the initial review takes forever.
(Indeed, most of the PRs that are stale awaiting the submitter are
submitted by some of the most frequent contirbutors, and stale because
the review picked up complex hard-to-fix issues and the contributor
simply hasn't bothered fixing them.)

> Once a test has been reviewed it would be considered part of
> the test suite; but unreviewed tests would be available for
> running as well, and as part of that process are more likely
> to get reviewed.

Is anybody actually going to run only the reviewed testsuite? I
suspect Mozilla would run the whole testsuite for both Gecko and Servo
in the future, if that were the policy, rather than the reviewed
subset, simply because the majority of unreviewed tests will be
correct and therefore add some value, however small.

I think there's also a risk if we continue down the commit-then-review
model of ending up in a situation like we did with 2.1 again, having
to essentially rubber-stamp the entire testsuite because nobody is
ever going to sit down and review thousands of unreviewed tests. If we
want to do commit-then-review, I think we need a real means to ensure
that we don't end up with an ever-growing backlog. I think
review-then-commit sufficiently changes the cost/reward trade-off such
that it incentivises reviewing tests (because otherwise we don't have
a testsuite to run), which is undoubtedly a gain.

> Changes to a test should get review. I agree though that having
> the original test writer do that review is not workable. It's
> nice to flag that person for the review if they are active (e.g.
> gtalbot), but not if they're going to ignore the pull request
> (e.g. contractors who have moved on to other things).

It's quite possibly worthwhile always pinging the author, but I don't
think it makes sense to make much in the way of rules as to who can
review tests. I don't think there's any evidence from
web-platform-tests that allowing anyone who believes themself
competent leads to any less stringent reviews.

> One thing the current system does track, which would be useful
> not to lose, is which tests have been competently reviewed vs
> those which have merely been rubber-stamped. We have a lot of
> tests that were merely rubber-stamped during the CSS2.1 cycle,
> and are therefore of questionable correctness / utility. It
> might make sense to simply mark the rubber-stamped tests as
> such using the <link> tag.

In the review-then-commit world, what's the benefit of having that
metadata in the test? What's the use-case?

I'd suggest that if we want deeper in-depth reviews of those
historically rubber-stamped we simply opened a metabug (because, as
mentioned before, a real issue tracker for the testsuite would be
good) of "go through and check these more thoroughly and open bugs on
anything that needs fixed".

/Geoffrey
Received on Tuesday, 19 April 2016 02:57:03 UTC