Simplifying metadata

Yo!

Can we revisit all the metadata we have in the tests? The metadata we
*need* is what is sufficient to be able to run the tests, probably within
CI systems.

I'm going to go by the assumption (which I think has shown itself to be
true on numerous occasions) that the more metadata we require in tests the
more hoops people have to jump through to release tests, which discourages
submitting tests. And this WG has a real problem with getting good,
high-quality test suites such that we're able to advance specs beyond CR.

Disclaimer: this is all based on <
http://testthewebforward.org/docs/css-metadata.html>; I'm not totally sure
this actually reflects the status-quo of the tests.

a) "CSS 2.1 Reference" as a <title> for potentially hundreds of references
is utterly useless; I'd rather do something more descriptive of what the
reference actually is. Presto-testo has titles like:

 * Reference rendering - this should be green (green text)
 * Reference rendering - There should be no red below
 * Reference rendering - pass if F in Filler Text is upper-case

This isn't perfect either, but it's more useful than "CSS 2.1 Reference",
IMO.

b) We don't need author metadata on any new tests, because that metadata is
stored in git/hg. (It's essentially been entirely redundant since we moved
away from SVN, as git/hg can store arbitrary authorship data regardless of
whether the author has source-tree access.)

c) We haven't actively been adding reviewer metadata for quite a while. I
suggest if we *really* want reviewer metadata (which I'm not at all sure we
do—a single file may be reviewed by multiple people, especially in the
testharness.js case), we do it in the commit description (along the lines
of Signed-Off-By in the Linux repo). On the whole, I suggest we just go by
the assumption that anything in the repo has been reviewed (at the current
time outwith work-in-progress and vendor-imports), and don't bother storing
the metadata. It doesn't really matter: when do we ever need to know who
reviewed a test? The current model can also be misleading: when a test is
changed there's still a "reviewed" link, but that person hasn't
necessarily reviewed the edited test.
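For instance, a Signed-off-by-style trailer in the commit description
might look like this (the names here are made up for illustration):

```
css-flexbox: make reference renderings green rather than blue

Reviewed-by: A. Reviewer <reviewer@example.org>
```

Tooling can then extract reviewers with plain `git log`, no in-file
metadata needed.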

d) Specification links I'm kinda unconvinced by, but I don't care enough
to argue over them. I know Shepherd uses them.

e) Requirement flags I feel we should really revisit. We want enough
flags to be able to run the tests, especially in CI. I'm not so
interested in running tests through Shepherd's UI, because I simply don't
think it's valuable: it's almost never done, because it's pointlessly
slow. For browsers, we should aim at getting the tests run in CI systems
(probably with some way to upload results to Shepherd so we can have the
CR-exit-criteria views there), and minor UAs can likely also run the
tests in a more efficient way (as you want to determine pass/fail by
unique screenshot, not by looking at a thousand tests all of which say,
"This text should be green" identically).
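The "unique screenshot" idea above could look something like the
following in a minor UA's test runner. This is just a sketch; the
function name and data shapes are my own:

```python
import hashlib

def group_by_screenshot(shots):
    """Group test names by identical rendering, so a human only has
    to eyeball each unique screenshot once instead of a thousand
    near-identical "This text should be green" pages.

    `shots` maps test name -> raw screenshot bytes.
    """
    groups = {}
    for name, image_bytes in shots.items():
        key = hashlib.sha1(image_bytes).hexdigest()
        groups.setdefault(key, []).append(name)
    return groups
```

A runner would then present one representative per group, and apply the
verdict to every test in that group.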

So:

* ahem — we should simply state "the CSS test suite requires the Ahem
font to be available" and get rid of this flag
* animated — this is good because it has a real use (excluding tests from
automated testing)
* asis — this makes sense with the current build system
* combo — do we actually care? is anyone doing anything with this? In CI
systems you likely want to run all the files, combo and not.
* dom — sure, I suppose. not very useful for actual browsers, to be fair,
so just extra overhead to release tests.
* font — we should simply state "the CSS test suite requires these fonts
to be installed" and get rid of this flag
* history — is there any UA that *doesn't* support session history? Yes, in
*theory* one could exist, but if we don't know of one, we shouldn't
optimise for it. The cost of metadata is too high (yes, even a flag!).
* HTMLonly — why do we have this rather than just using asis and HTML
source files?
* http — we should just move to using the same mechanism as
web-platform-tests for HTTP headers (rather than .htaccess), and then this
can statically be determined by whether test.html.headers exists for a
given test.html, leaving less metadata.
* image — like history, is this not just for a hypothetical UA?
* interact — sure.
* invalid — do we need an actual flag for this? I presume we want it for
lint tools, in which case we should probably have a better (generic) way
to silence bogus lint rules for a given test.
* may — on the whole reasonable
* namespace — is this not just for a hypothetical UA?
* nonHTML — can we not just use asis?
* paged — sure
* scroll — sure
* should — same as may
* speech — um, I guess
* svg — do we still want to treat SVG as something not universal?
* userstyle — sure
* 32bit — is this not just for a hypothetical UA at this point?
* 96dpi — is this not required by CSS 2.1 now, and hence redundant? (96px =
1in per CSS 2.1)
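To illustrate the http point above: under the web-platform-tests
convention, per-test HTTP headers live in a sidecar test.html.headers
file, so whether a test needs its own headers can be determined
statically from the filesystem. A minimal sketch (the helper name is
mine):

```python
import os

def needs_custom_headers(test_path):
    """True iff a wpt-style sidecar headers file exists for the test,
    e.g. test.html -> test.html.headers. No in-file flag required."""
    return os.path.exists(test_path + ".headers")
```

The build system or CI tool can call this while walking the tree,
replacing the explicit "http" flag entirely.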

I feel like we shouldn't add metadata for hypothetical UAs that may or
may not exist in the future. It adds overhead to contributing to the test
suite, and we're likely to end up with the flags missing all over the
place. (We already end up with the flags CI needs (animated, interact,
userstyle, paged, speech) missing all over the place, never mind the ones
needed only to run tests in existing browsers, which support all the
optional features we have flags for!)

Also, I wonder if we should just merge "animated", "interact", and
"userstyle" into one? Keeping "userstyle" separate can probably be
justified, as CI tools can run those tests in an automated manner given
some metadata within the tool to set the stylesheet (do we want to add a
<link rel="user-stylesheet"> so that we can have required user
stylesheets in testinfo.data?). "animated" only makes sense to split out
from "interact" if anyone is ever going to verify animated content
automatically.

For the sake of CI, we essentially have a few categories of tests:

 * reftests
 * visual tests that can be verified by screenshots
 * testharness.js tests
 * any of the three above that need special setup in the CI tool (a
non-standard browser window size, a user stylesheet needing to be set,
potentially paged media, though I'm not sure if anyone actually runs
paged media tests in CI tools?)
 * manual tests that cannot be run in CI

Really that gives us seven types of tests, six of which can run in CI. The
first two above (so four of the six) can be distinguished by the presence
of link[@rel='match' or @rel='mismatch']. We could distinguish
testharness.js tests (two more of the six) by the presence of
script[@src='/resources/testharness.js']. This means the only things we
actually need to be able to filter out by explicit metadata are "needs
special CI setup" and "entirely manual". We probably want enough
granularity in the metadata such that people can trivially get a list of
tests that need special CI setup based upon what their CI supports (e.g.,
if it can set a user stylesheet but can't run paged media).
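The detection described above needn't be anything fancy; a rough sketch
of the idea (the function name and the substring matching are my own
simplification — a real tool would parse the markup properly):

```python
def classify_test(source):
    """Roughly classify a CSS test by sniffing its markup.

    Returns "reftest" if it links to a (mis)match reference,
    "testharness" if it pulls in testharness.js, and "visual"
    otherwise (screenshot-verified or manual).
    """
    if 'rel="match"' in source or 'rel="mismatch"' in source:
        return "reftest"
    if "/resources/testharness.js" in source:
        return "testharness"
    return "visual"
```

Only the remaining "needs special CI setup" / "entirely manual" split
would then need explicit metadata.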

We should discuss what we actually need as a result of this and get rid of
the rest.

f) Test assertions… are we doing anything with these, beyond what we use
specification links for? If not, we should stop requiring them.

Now, that's all done. I think I've forgotten some of what I meant to say,
but there's always future emails for that! I hope that gives some idea of
what I want to do!

/gsnedders

Received on Tuesday, 27 October 2015 07:32:02 UTC