Test result storage (alternatively: replacing the CSS WG's Test Harness)

(Bcc'd www-style; can we try and keep responses in a single place, in
this case public-test-infra?)

We currently have a few systems for viewing results:

 * https://github.com/w3c/test-results
 * https://test.csswg.org/harness/

Both of these have their origins in tools designed to meet CR exit
criteria for specs, though the two are otherwise very different. We
want the data to be useful for more than that: interoperability
matters at all times, not just when a spec leaves CR. It's also
useful to know when a test fails in every implementation, since that
suggests the test itself is wrong.

At the moment, the former relies on someone running the tests with a
tool (wptrunner), storing the JSON output, and processing it before
uploading. The latter relies on someone using the built-in test
runner or (as far as I'm aware, far more occasionally) uploading a
text-based results format (and, from memory, only a small whitelist
of users can do that).
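
For the unfamiliar, the "processing" step is essentially collapsing
wptrunner's structured log into a per-test summary. A rough sketch in
Python, assuming mozlog's raw newline-delimited JSON output (where
"test_end" entries carry each test's final harness status):

  import json
  import sys

  def summarize(log_path):
      """Collapse a wptrunner raw log into a {test: status} mapping."""
      results = {}
      with open(log_path) as f:
          for line in f:
              entry = json.loads(line)
              # "test_end" carries a test's overall harness status;
              # subtests arrive as separate "test_status" entries.
              if entry.get("action") == "test_end":
                  results[entry["test"]] = entry["status"]
      return results

  if __name__ == "__main__":
      print(json.dumps(summarize(sys.argv[1]), indent=2))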

Going forward, as more and more browsers run all the tests in their
CI systems, we can essentially leave running the tests to the
vendors (though this does require knowing what local changes they
carry relative to upstream) and just get them to push results to us,
which should give us more up-to-date results than we can
realistically expect any other way. (Of course, this doesn't quite
work for all vendors; Microsoft, I'm looking at you! I suspect in
that case we can get them to push results once per release?)
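
To make "push results to us" concrete, a vendor's CI job could end
with something like the following. To be clear, the endpoint and all
of the field names here are invented for illustration; no such
service exists yet:

  import requests

  # Hypothetical submission payload; every field name is made up.
  payload = {
      "product": "firefox",
      "build": "nightly-20160812",   # vendor's own build identifier
      "platform": "linux64",
      "wpt_revision": "1a2b3c4",     # upstream commit the tests came from
      "vendor_diff_url": None,       # pointer to local changes, if any
      "results": {"/dom/interfaces.html": "OK"},
  }
  requests.post("https://results.example.org/api/submit", json=payload)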

This does, however, bring up a couple of problems:

 * Versioning of test files and results, and when results get
invalidated by changes. (Noting, of course, that the invalidating
change could be in some support file, which we cannot generally
detect.) This is an issue the CSS Test Harness already has, but with
results coming in more frequently, and with web-platform-tests
changing far more frequently than csswg-test, it becomes a larger
problem, especially combined with vendors typically being a week or
so behind the upstream test repos. (A rough sketch of one approach
follows the list.)
 * How to store all the data. An easy compromise is to simply limit
it to one build per week per platform.
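
On the versioning point, the obvious (if incomplete) approach is to
key every stored result to the commit of the test repo it was run
against, and invalidate a result whenever anything under the test's
directory changes between that commit and the current one. A rough
sketch, shelling out to git; note it still misses support files that
live elsewhere (e.g. under common/ or resources/), per the
parenthetical above:

  import subprocess

  def changed_paths(repo, old_rev, new_rev):
      """Paths touched in the test repo between two revisions."""
      out = subprocess.check_output(
          ["git", "-C", repo, "diff", "--name-only", old_rev, new_rev],
          universal_newlines=True,
      )
      return set(out.split())

  def result_still_valid(test_path, result_rev, current_rev, repo):
      # Crude heuristic: keep the result only if nothing in the
      # test's own directory changed between the two revisions.
      test_dir = test_path.lstrip("/").rsplit("/", 1)[0]
      return not any(p.startswith(test_dir + "/")
                     for p in changed_paths(repo, result_rev, current_rev))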

To leave this without any real conclusion: does anyone have opinions
on other things that *need* to be solved, or that such a system
needs to solve? (Or, to the contrary, is anyone going to argue that
such a system need not exist at all?)

/Geoffrey
