Towards a better testsuite

Hi,

A bunch of us got together at TPAC last year to talk about how we
could make the CSS testsuite more useful to more people.
<https://www.w3.org/2015/10/28-testing-minutes.html> has minutes of
the breakout session we had on Plenary day, though plenty more talk
happened at other times. There's been some discussion on
public-css-testsuite too, for those of you who don't subscribe there
(you totally should!). Having spoken to Alan, we thought it was best
to take much of this to the whole group. I'll probably start
reappearing on telecons soon, too, all with ulterior motives, of
course.

Fundamentally, the situation with testing the web platform at the
moment is that we have three large repositories of tests: test262
(from ECMA TC-39, focused on their specs), csswg-test (from us), and
web-platform-tests (covering everything else). From the point of view
of all the browser vendors, it'd be great if we were all sharing the
vast majority of the tests we write, and we're not anywhere near
there yet.

The current status, as I understand it: test262 is mostly being run
as old versions, and little is being contributed to it; csswg-test is
run by Microsoft in a weekly-updated form, while Gecko runs a
several-year-old version with no realistic plan to update it, and
nobody contributes that much (a tiny subset of Gecko's tests is
automatically synced, but the vast majority is not);
web-platform-tests is run by Microsoft semi-regularly, is run with
two-way syncing from Gecko and Servo (with plans by Blink and
Microsoft to get there, AIUI), and gets more in the way of
contributions than either of the other two repositories. WebKit just
aren't running anything, as far as I'm aware. The only other group
I'm aware of running anything is Prince, which runs a small subset of
an old version of csswg-test.

Speaking to people across all the browser vendors about why they
generally contribute more to web-platform-tests than to csswg-test,
I've had a more or less uniform answer: there's too much friction.
Metadata and review are the two parts held up as reasons: metadata
because it means that existing tests can't simply be added to the
testsuite (and you then have to justify spending far more time to
release the tests), and review because comments often come back
months later, by which time everyone working on the tests has moved
on and doesn't have time to address minor nits.

I went through all of the metadata in
<https://lists.w3.org/Archives/Public/public-css-testsuite/2015Oct/0004.html>;
there's rough agreement on removing much of what we currently have.
That said, I think it's worthwhile to reiterate that requiring *any*
metadata causes friction. Tests written by browser vendors are rarely
just a file or two to which it's quick to add metadata. I know that
in general people seem interested in using the same infrastructure to
run both web-platform-tests and csswg-test, which essentially
requires that the metadata needed to run the tests be identical
across the two.
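
For context, this is roughly the shape of the per-file metadata we're
talking about; a csswg-test reftest currently wants a header along
these lines (the spec link, author, and values below are made up for
illustration, and the exact required fields vary by test type):

  <!DOCTYPE html>
  <title>CSS Test: one-sentence description of what is tested</title>
  <link rel="author" title="Author Name" href="mailto:author@example.com">
  <link rel="help" href="https://www.w3.org/TR/some-css-spec/#some-section">
  <link rel="match" href="reference/some-test-ref.html">
  <meta name="flags" content="">
  <meta name="assert" content="What specifically the test asserts.">

That's quick enough for a single file, but as noted above vendor
submissions are rarely just a file or two, which is where the
friction comes in.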

The other significant complication when it comes to csswg-test is
the build system. Because the testsuite has to be built before it can
be run, fixing a failing test is more complicated: you have to know
how to find the source file, and then be able to rebuild the
testsuite after fixing it. Historically the build system existed to
deal with the variety in what UAs support, i.e. whether they
supported HTML and/or XHTML; this is a far smaller deal nowadays. Of
the UAs running the CSS testsuite, or likely to do so in the future,
the only one I'm aware of that doesn't support both well is Servo
(and that's likely to change, so it can probably be ignored here).

This is, of course, all complicated by the need to be able to
demonstrate that CR exit criteria have been met. For things in
web-platform-tests, as I understand it, a copy of a single directory
has normally been used to create the CR-exit testsuite. Beyond that,
it's assumed that tests don't need to be run on a per-spec basis:
yes, some tests will get missed when building the CR-exit testsuite
because they live in another directory (typically because they test
how multiple specs interact), but it's not worth the extra complexity
to handle this, given the loss is normally small.

The other notable difference is in tooling: Mercurial is used with a
git mirror; reviews are split across Shepherd and
public-css-testsuite, with some issues filed on GitHub; some people
expect pre-landing review through GitHub PRs, while others push
directly to Mercurial… Everything really would be simpler if we had
*one* way of doing things. I'd much rather have everything on GitHub,
with review happening on PR submission, and nits and suchlike
reported as GitHub issues. That keeps everything in one place, using
tools most people are used to.

To outline what I'd like to see happen:

- Get rid of the build system, replacing many of the errors it used
to catch with a lint tool that tests for them.
- Policy changes to get rid of all metadata in the common case.
- Change the commit policy. (Require review first, require no new lint errors.)

Long-term it's probably worth considering merging the whole thing into
web-platform-tests, so we have all the W3C tests in one place.

I realise this omits a lot of detail in a fair few places: I'm just
trying to start off with something not *too* ginormous. :)

/gsnedders
