Re: Simplifying metadata from Gérard Talbot on 2015-10-28 (public-css-testsuite@w3.org from October 2015)

From: Gérard Talbot <css21testsuite@gtalbot.org>
Date: Wed, 28 Oct 2015 17:09:08 -0400
To: Geoffrey Sneddon <me@gsnedders.com>
Cc: Public CSS Test suite mailing list <public-css-testsuite@w3.org>
Message-ID: <9dabd8e62be910f44d096e918c252890@gtalbot.org>
Le 2015-10-27 03:31, Geoffrey Sneddon a écrit :
> Yo!
> 
> Can we revisit all the metadata we have in the tests? The metadata we
> *need* is what is sufficient to be able to run the tests, probably 
> within
> CI systems.

What about other needs like searching for tests? reviewing tests? test 
coverageability?

What is a "CI" system?

Some existing web-aware softwares with good CSS competence do not 
support printing or svg or some other features (scrollability, 32bit, 
XHTML, transitions, etc..). So those flags can be useful.

> I'm going to go by the assumption (which I think has shown itself to be
> true on numerous occasions) that the more metadata we require in tests 
> the
> more hoops people have to jump through to release tests, which 
> discourages
> submitting tests.

I'd like to believe this discouragement isn't as bad as you believe...

8 months ago (feb. 15th 2015), I've asked one major browser manufacturer 
why they did not submit tests and never got a response.

> And this WG has a real problem with getting good,
> high-quality test suites such that we're able to advance tests beyond 
> CR.

I'm all for a discussion on metadata and documentation ... but what 
about bad or wrong tests?
I wish
a) incorrect tests,
b) imprecise tests,
c) tests that can not fail (unreliable tests, non-trustworthy tests) and
d) tests that do not check what they claim to be checking
would be removed or rehabilitated or dealt with to start with. I've been 
asking for this in the last 4 years (june 28th 2011 and even before 
that) and it still has not been dealt with. And I'm not talking about a 
few dozen tests here...

> Disclaimer: this is all based on <
> http://testthewebforward.org/docs/css-metadata.html>; I'm not totally 
> sure
> this actually reflects the status-quo of the tests.
> 
> a) "CSS 2.1 Reference" as a <title> for potentially hundreds of 
> references
> is utterly useless—I'd rather doing something more descriptive as to 
> what
> it is. Presto-testo has titles like:
> 
>  * Reference rendering - this should be green (green text)

Not the shortest IMO but that is another issue.

Isn't that redundant? Test title says: "this should be green (green 
text)" and the pass-fail-conditions sentence says the same thing.

Where's the best location to describe the reference rendering? In the 
<title> text or in the filename?

>  * Reference rendering - There should be no red below

"below" can safely be dropped IMO.

>  * Reference rendering - pass if F in Filler Text is upper-case

Personally, I think about 30% to 40% of all existing tests could be 
re-engineered so that they would be associated with already available, 
already created and very frequently reused reference files. When I 
create a test, I always try to do this myself. That way,
a) I no longer have to think about creating a reference file,
b) this reduces server load when "doing" a test suite with the test 
harness and
c) this reduces the growth of N reference files to be referenced

Examples given:

ref-if-there-is-no-red  : referenced by 290 tests (2 changesets!)
http://test.csswg.org/shepherd/search/reference/name/ref-if-there-is-no-red/
http://test.csswg.org/source/css21/reference/ref-if-there-is-no-red.xht

ref-this-text-should-be-green : referenced by 43 tests
test.csswg.org/shepherd/reference/ref-this-text-should-be-green/
http://test.csswg.org/source/css21/reference/ref-this-text-should-be-green.xht

So, why re-create 2 reference files that already exist?

> This isn't perfect either, but it's more useful than "CSS 2.1 
> Reference",
> IMO.
> 
> b) We don't need author metadata on any new tests, because that 
> metadata is
> stored in git/hg. (It's essentially been entirely redundant since we 
> moved
> away from SVN, as git/hg can store arbitrary authorship data regardless 
> of
> whether the author has source-tree access.)

There is an owner and an author field in Shepherd...

> c) We haven't actively been adding reviewer metadata for quite a while. 
> I
> suggest if we *really* want reviewer metadata (which I'm not at all 
> sure we
> do—a single file may be reviewed by multiple people, especially in the
> testharness.js case), we do it in the commit description (along the 
> lines
> of Signed-Off-By in the Linux repo). On the whole, I suggest we just go 
> by
> the assumption that anything in the repo has been reviewed (at the 
> current
> time outwith work-in-progress and vendor-imports),

I strongly disagree. Tests not identified as reviewed should not be 
considered reviewed.

You can not presume or postulate that any/all test authors have the same 
CSS knowledge, depth of CSS experience. Same thing with test reviewers.

> and don't bother storing
> the metadata. It doesn't really matter—when do we need to know who 
> reviewed
> the test?

When a test involving a CSS module has been reviewed by one of its CSS 
spec editors, then such test becomes more trustworthy. Reviewing a test 
by a competent and independent party is important.

There is such a thing as Quality control, Quality assurance, 
de-subjectivizing the evaluation of your own work. It's even a 
fundamental principle in justice. I would need more time to explain all 
this.

> The current model can be misleading, when the test is changed
> there's still a "reviewed" link, but that person hasn't necessarily
> reviewed the edited test.
> 
> d) Specification links I'm kinda unconvinced by, but I don't really 
> care
> enough to argue over. I know Shepard uses it.

How well can we know if a test suite coverage is good ... if 
specification links are not edited in tests?

> e) Requirement flags I feel we should really revisit. We want to have
> enough flags to be able to run the tests, especially in CI. I'm not so
> interested in running tests through Shepard's UI, because I simply 
> think
> it's not valuable—it's almost never done, because it's pointlessly 
> slow.
> Browsers we should aim at getting them run in CI systems (probably with
> some way to upload results to Shepard so we can have the 
> CR-exit-criteria
> views there), and minor UAs also likely can run the tests in a more
> efficient way (as you want to determine pass/fail by unique screenshot, 
> not
> looking at a thousand tests all of which say, "This text should be 
> green"
> identically).
> 
> So:
> 
> * ahem — we should simply just state "the CSS test suite requires the 
> Ahem
> font to be available" and get rid of this flag
> * animated — this is good because it has a real use (excluding tests 
> from
> automated testing)
> * asis — this makes sense with the current build system
> * combo — do we actually care? is anyone doing anything with this? In 
> CI
> systems you likely want to run all the files, combo and not.

Personally, I would drop the combo flag. If I recall correctly, this was 
introduced in relation to this CSS2.1 test:
http://test.csswg.org/suites/css2.1/20110323/html4/dynamic-top-change-005.htm

> * dom — sure, I suppose. not very useful for actual browsers, to be 
> fair,
> so just extra overhead to release tests.
> * font — we should simply just state "the CSS test suite requires these
> fonts to be installed" and get rid of this flag

I disagree. I am convinced more fonts should be added into the special 
fonts package for CSS testing: this is already certainly true for CSS 
Writing Modes and probably for tests involving text writing with various 
dominant baseline (hanging, central, etc).
Not every web-aware softwares support embedded fonts and I believe we 
should not rely on embedded fonts anyway.

> * history — is there any UA that *doesn't* support session history? 
> Yes, in
> *theory* one could exist, but if we don't know of one, we shouldn't
> optimise for it. The cost of metadata is too high (yes, even a flag!).
> * HTMLonly — why do we have this rather than just using asis and HTML
> source files?
> * http — we should just move to using the same mechanism as
> web-platform-tests for HTTP headers (rather than .htaccess), and then 
> this
> can statically be determined by whether test.html.headers exists for a
> given test.html, leaving less metadata.
> * image — like history, is this not just for a hypothetical UA?
> * interact — sure.
> * invalid — do we need an actual flag for this? I presume we want this 
> for
> lint tools, in which case we should probably have a better (generic) 
> way to
> silent bogus lint rules for a given test.
> * may — on the whole reasonable
> * namespace — is this not just for a hypothetical UA?
> * nonHTML — can we not just use asis?

Yes. nonHTML can be merged with asis.

> * paged — sure
> * scroll — sure
> * should — same as may
> * speech — um, I guess
> * svg — do we still want to treat SVG as something not universal?
> * userstyle — sure
> * 32bit — is this not just for a hypothetical UA at this point?

Some browsers only support 16bit counter and z-index values (65536); 
some others support 32bit counter. So, 32bit flag was introduced for 
that reason.

http://test.csswg.org/suites/css2.1/latest/html4/chapter-12.html#s12.4

http://test.csswg.org/suites/css2.1/latest/html4/chapter-9.html#s9.9.1


> * 96dpi — is this not required by CSS 2.1 now, and hence redundant? 
> (96px =
> 1in per CSS 2.1)
> 
> I feel like we shouldn't add metadata for hypothetical UAs that may or 
> may
> not exist in the future. It adds overhead to contributing to the test
> suite, we're likely to end up with the flags being missing all over the
> place. (We end up with flags needed for CI (animated, interact, 
> userstyle,
> paged, speech) missing all over the place of the ones needed to run 
> tests
> in existing browsers, when they support all the optional features we 
> have
> flags for!)
> 
> Also, I wonder if we should just merge "animated", "interact", and
> "userstyle" into one? "userstyle" can probably be justified as CI tools 
> can
> run those tests in an automated manner by having some metadata within 
> the
> tool to set the stylesheet (do we want to add a <link
> rel="user-stylesheet"> so that we can have required user stylesheets in
> testinfo.data?). "animated" only makes sense to split out from 
> "interact"
> if anyone is ever going to verify animated content automatically.
> 
> For the sake of CI, we essentially have a few categories of tests:
> 
>  * reftests
>  * visual tests that can be verified by screenshots
>  * testharness.js tests
>  * any of the three above that need special setup in the CI tool (a
> non-standard browser window size, a user stylesheet needing set,
> potentially paged media though I'm not sure if anyone actually runs 
> paged
> media tests in CI tools?)
>  * manual tests that cannot be run in CI
> 
> Really that gives us seven types of tests, six of which can run in CI. 
> The
> first two above (so four of the six) can be distinguished by the 
> presence
> of link[@rel='match' or @rel='mismatch']. We could distinguish
> testharness.js tests (two more of the six) by the presence of
> script[@src='skins/classic/resources/testharness.js']. This means the 
> only things we
> actually need to be able to filter out by explicit metadata are "needs
> special CI setup" and "entirely manual". We probably want enough
> granularity in the metadata such that people can trivially get a list 
> of
> tests that need special CI setup based upon what their CI supports 
> (e.g.,
> if it can set a user stylesheet but can't run paged media).
> 
> We should discuss what we actually need as a result of this and get rid 
> of
> the rest.
> 
> f) Test assertions… are we doing anything we this, beyond what we use
> specification links for? If not, we should stop requiring them.

Test assertions are often difficult for test authors. But it is valuable 
for reviewers, test understanding (when you revisit 6 months later one 
of your own test which could be a complex test) and test coverage. The 
need for test assertions would be reduced if test authors would at least 
avoid meaningless, unhelpful, unsemantic id, class, function identifiers 
and undifferentiated characters in their tests.

Gérard

> 
> Now, that all done—I think I've forgotten some of what I meant to say, 
> but
> there's always future emails for that! I hope that sets up some idea of
> what I want to do!
> 
> /gsnedders

-- 
Test Format Guidelines
http://testthewebforward.org/docs/test-format-guidelines.html

Test Style Guidelines
http://testthewebforward.org/docs/test-style-guidelines.html

Test Templates
http://testthewebforward.org/docs/test-templates.html

CSS Naming Guidelines
http://testthewebforward.org/docs/css-naming.html

Test Review Checklist
http://testthewebforward.org/docs/review-checklist.html

CSS Metadata
http://testthewebforward.org/docs/css-metadata.html
Received on Wednesday, 28 October 2015 21:09:43 UTC