Re: Simplifying metadata from Linss, Peter on 2015-10-27 (public-css-testsuite@w3.org from October 2015)

From: Linss, Peter <peter.linss@hp.com>
Date: Tue, 27 Oct 2015 23:50:22 +0000
To: Geoffrey Sneddon <me@gsnedders.com>
CC: "public-css-testsuite@w3.org" <public-css-testsuite@w3.org>
Message-ID: <8B40E4CE-6BEA-4CF9-89E0-C447A95E17EB@hp.com>
> On Oct 27, 2015, at 5:50 AM, Geoffrey Sneddon <me@gsnedders.com> wrote:
> 
> [All inline.]
> 
> On Tue, Oct 27, 2015 at 5:10 PM, Linss, Peter <peter.linss@hp.com <mailto:peter.linss@hp.com>> wrote:
> The only metadata that Shepherd complains about if missing in tests is: title, at least one spec link, and the author. In reference files it only complains about missing author metadata (and will complain about spec links or assertions if present, because they shouldn’t be there).
> 
> (and for the record, Shepherd is the test suite manager tool, the test harness is a separate tool, as is the build system, though they do interact)
> 
> More comments inline.
> 
>> On Oct 27, 2015, at 12:31 AM, Geoffrey Sneddon <me@gsnedders.com <mailto:me@gsnedders.com>> wrote:
>> 
>> Yo!
>> 
>> Can we revisit all the metadata we have in the tests? The metadata we *need* is what is sufficient to be able to run the tests, probably within CI systems.
>> 
>> I'm going to go by the assumption (which I think has shown itself to be true on numerous occasions) that the more metadata we require in tests the more hoops people have to jump through to release tests, which discourages submitting tests. And this WG has a real problem with getting good, high-quality test suites such that we're able to advance tests beyond CR.
>> 
>> Disclaimer: this is all based on <http://testthewebforward.org/docs/css-metadata.html <http://testthewebforward.org/docs/css-metadata.html>>; I'm not totally sure this actually reflects the status-quo of the tests.
>> 
>> a) "CSS 2.1 Reference" as a <title> for potentially hundreds of references is utterly useless—I'd rather doing something more descriptive as to what it is. Presto-testo has titles like:
>> 
>>  * Reference rendering - this should be green (green text)
>>  * Reference rendering - There should be no red below
>>  * Reference rendering - pass if F in Filler Text is upper-case
>> 
>> This isn't perfect either, but it's more useful than "CSS 2.1 Reference", IMO.
>> 
>> b) We don't need author metadata on any new tests, because that metadata is stored in git/hg. (It's essentially been entirely redundant since we moved away from SVN, as git/hg can store arbitrary authorship data regardless of whether the author has source-tree access.)
> 
> The author metadata was originally provided because authors didn’t necessarily have access to the repo and sometimes submitted tests via email or other means. So the first committer was often not the author. I’m fine with only requiring author metadata in that case.
> 
> Oh, what I said was based on mistaken belief about hg. In git there are separate "Author" and "Committer" fields—and the committer can set the Author field to whatever they want, hence in git there's no reason to ever require author metadata in the file. This further makes me feel we should drop hg! As far as I'm aware the only reason why we have the hg/git mirroring is because much of the Shepard tooling integrates with hg—someone (probably you or me!) should investigate how much work it would be to migrate purely to git. (I heard something about contributors before. Will we actually lose any contributors if we move to git exclusively?)

Both because of the Shepherd tooling and because some contributors prefer hg (and several don’t know git). I can’t speak to how many we’d lose by switching to git exclusively, but I’ve been told by more than one that they don’t want to learn yet another source control system. Whether or not that burden would merely be an inconvenience or would drive some of them away I can’t say, but our contributors are so few and so valuable, that I’d rather not lose a single one.

> 
>> c) We haven't actively been adding reviewer metadata for quite a while. I suggest if we *really* want reviewer metadata (which I'm not at all sure we do—a single file may be reviewed by multiple people, especially in the testharness.js case), we do it in the commit description (along the lines of Signed-Off-By in the Linux repo). On the whole, I suggest we just go by the assumption that anything in the repo has been reviewed (at the current time outwith work-in-progress and vendor-imports), and don't bother storing the metadata. It doesn't really matter—when do we need to know who reviewed the test? The current model can be misleading, when the test is changed there's still a "reviewed" link, but that person hasn't necessarily reviewed the edited test.
> 
> This was always optional and was only used when reviewing tests outside the tooling (or before the tooling existed). Tests can be reviewed and approved directly in Shepherd (and Shepherd approvals are also triggered by reviewer metadata or simply merging a GitHub pull request by those with approval authority).
> 
> I was just going by what the documentation says, which says it should be used!
> 
>> d) Specification links I'm kinda unconvinced by, but I don't really care enough to argue over. I know Shepard uses it.
> 
> Shepherd uses it for tracking test to spec mapping, but it’s also required by the build system. This is how the build system determines which tests go in which test suites. The test harness also relies on the spec links to generate the spec annotations (visible in the CSSWG drafts). So this one is most definitely necessary (and is fact is the only metadata our tooling really relies on).
> 
> As I said, I don't care to argue over this! I'm surprised we determine test suite membership based on it, though—why don't we just base this on source directory?

Because many tests are linked to more than one spec so don’t live in a single test suite. (and I wasn’t arguing, just explaining)

> 
>> e) Requirement flags I feel we should really revisit. We want to have enough flags to be able to run the tests, especially in CI. I'm not so interested in running tests through Shepard's UI, because I simply think it's not valuable—it's almost never done, because it's pointlessly slow. Browsers we should aim at getting them run in CI systems (probably with some way to upload results to Shepard so we can have the CR-exit-criteria views there), and minor UAs also likely can run the tests in a more efficient way (as you want to determine pass/fail by unique screenshot, not looking at a thousand tests all of which say, "This text should be green" identically).
>> 
>> So:
>> 
>> * ahem — we should simply just state "the CSS test suite requires the Ahem font to be available" and get rid of this flag
>> * animated — this is good because it has a real use (excluding tests from automated testing)
>> * asis — this makes sense with the current build system
>> * combo — do we actually care? is anyone doing anything with this? In CI systems you likely want to run all the files, combo and not.
> 
> This flag was used in the CSS 2.1 test suite to good effect (for historical process reasons I won’t go in to here). It may be useful in the future to help meet CR exit criteria for other specs.
> 
> I'm not disagreeing with having such patterns—I'm disagreeing with the need for a flag for it!
>> * dom — sure, I suppose. not very useful for actual browsers, to be fair, so just extra overhead to release tests.
>> * font — we should simply just state "the CSS test suite requires these fonts to be installed" and get rid of this flag
>> * history — is there any UA that *doesn't* support session history? Yes, in *theory* one could exist, but if we don't know of one, we shouldn't optimise for it. The cost of metadata is too high (yes, even a flag!).
>> * HTMLonly — why do we have this rather than just using asis and HTML source files?
> 
> This is an indicator to the build system to not generate an XHTML version of the test (as it wouldn’t make sense)
> 
> Right, I get that. But isn't this equivalent to having an HTML source file and using the asis flag in it? What's the benefit of it going through the build system?

All the files go through the build system in one way or another.

As it’s currently implemented, the HTMLonly/nonHTML flags currently don’t imply asis, they still can get modified by the build system, as in updating paths to reference files, etc, they just avoid the conversion to to the other format. The asis flag prevents any modification of the file (and would be used for files with bad markup or other features that would not survive being parsed and re-serialized).

> 
>> * http — we should just move to using the same mechanism as web-platform-tests for HTTP headers (rather than .htaccess), and then this can statically be determined by whether test.html.headers exists for a given test.html, leaving less metadata.
> 
> Sure.
> 
>> * image — like history, is this not just for a hypothetical UA?
>> * interact — sure.
>> * invalid — do we need an actual flag for this? I presume we want this for lint tools, in which case we should probably have a better (generic) way to silent bogus lint rules for a given test.
> 
> In addition to the lint tools, the build code might use this (don’t recall if it does at the moment).
> 
> The build tools don't.
>> * may — on the whole reasonable
>> * namespace — is this not just for a hypothetical UA?
>> * nonHTML — can we not just use asis?
> 
> Like HTMLonly, it prevents the build system from generating an HTML version of the test
> 
> See what I said about HTMLonly above.
> 
>> * paged — sure
>> * scroll — sure
>> * should — same as may
>> * speech — um, I guess
>> * svg — do we still want to treat SVG as something not universal?
>> * userstyle — sure
>> * 32bit — is this not just for a hypothetical UA at this point?
> 
> Opera cared about this at one point, Presto didn’t use full 32 bit ints for some values.
> 
> While I can't actually speak for Opera anymore, I know they never going to run the test suites against Presto.

Sure, just explaining where this came from.

> 
>> * 96dpi — is this not required by CSS 2.1 now, and hence redundant? (96px = 1in per CSS 2.1)
>> 
>> I feel like we shouldn't add metadata for hypothetical UAs that may or may not exist in the future. It adds overhead to contributing to the test suite, we're likely to end up with the flags being missing all over the place. (We end up with flags needed for CI (animated, interact, userstyle, paged, speech) missing all over the place of the ones needed to run tests in existing browsers, when they support all the optional features we have flags for!)
>> 
>> Also, I wonder if we should just merge "animated", "interact", and "userstyle" into one? "userstyle" can probably be justified as CI tools can run those tests in an automated manner by having some metadata within the tool to set the stylesheet (do we want to add a <link rel="user-stylesheet"> so that we can have required user stylesheets in testinfo.data?). "animated" only makes sense to split out from "interact" if anyone is ever going to verify animated content automatically.
> 
> Animated and interact should remain separate, an animated test could (at least in theory) be tested automatically. One requiring user interaction would potentially require a different system for automatic testing (and some way of scripting the interaction).
> 
> Animated could *in theory* be tested automatically. As it is, I feel like it's there for some theoretical CI system. I think we should worry about such a CI system when anyone has an interest in creating one. I'm not interested in adding metadata (that will end up out of date if it doesn't matter) for theoretical cases which we have no reason to believe will exist in the near future and will run the tests.

But on the other hand, removing existing metadata that may actually be useful in the future is doing work now just to create more potential work later.

I’m all for simplifying the system for new authors, but I don’t see much benefit from dropping flags, there’s not a lot of burden here.

> 
>> For the sake of CI, we essentially have a few categories of tests:
>> 
>>  * reftests
>>  * visual tests that can be verified by screenshots
>>  * testharness.js tests
>>  * any of the three above that need special setup in the CI tool (a non-standard browser window size, a user stylesheet needing set, potentially paged media though I'm not sure if anyone actually runs paged media tests in CI tools?)
>>  * manual tests that cannot be run in CI
>> 
>> Really that gives us seven types of tests, six of which can run in CI. The first two above (so four of the six) can be distinguished by the presence of link[@rel='match' or @rel='mismatch']. We could distinguish testharness.js tests (two more of the six) by the presence of script[@src='/resources/testharness.js']. This means the only things we actually need to be able to filter out by explicit metadata are "needs special CI setup" and "entirely manual". We probably want enough granularity in the metadata such that people can trivially get a list of tests that need special CI setup based upon what their CI supports (e.g., if it can set a user stylesheet but can't run paged media).
>> 
>> We should discuss what we actually need as a result of this and get rid of the rest.
>> 
>> f) Test assertions… are we doing anything we this, beyond what we use specification links for? If not, we should stop requiring them.
> 
> These are for test reviewers to help understand what the test is trying to do. It’s often not obvious and can be helpful.
> 
> In my experience I don't find the test assertions help much—a few comments places at the relevant places in the test would be far more helpful to understand what's going on IMO.

Others have had different experiences and have explicitly asked for assertions to help with reviews. I accept that the same job can be done with comments, but having it in an extractable format is useful too.

Again, we’ve never rejected a test for missing this metadata, it’s at the author’s discretion.
Received on Tuesday, 27 October 2015 23:50:55 UTC