Re: Towards a better testsuite from Gérard Talbot on 2016-03-29 (www-style@w3.org from March 2016)

From: Gérard Talbot <www-style@gtalbot.org>
Date: Tue, 29 Mar 2016 14:04:41 -0400
To: Geoffrey Sneddon <me@gsnedders.com>
Cc: W3C www-style mailing list <www-style@w3.org>
Message-ID: <e94c547bb1886fc4c1bc3f55ee07a92c@gtalbot.org>
On 2016-03-29 05:04, Geoffrey Sneddon wrote:
> On 29/03/16 03:39, Gérard Talbot wrote:
>> On 2016-03-24 13:00, Geoffrey Sneddon wrote:
>>> Speaking to people across all browsers about why they're generally
>>> contributing more to web-platform-tests than csswg-test, there's a
>>> more or less uniform answer: there's too much friction. Metadata

Geoffrey,

I am not against reducing the number of metadata flags.

>>> and review are the two parts held up as reasons; metadata because it
>>> means that existing tests can't simply be added to the testsuite (and
>>> you're then having to justify far more time to release the tests),
>>> review because often enough comments come back months later by which
>>> time everyone working on it has moved on and doesn't have time to
>>> address minor nits.
>> 
>> 
>> minor nits?
> 
> Really I'm counting anything that isn't strictly a correctness issue as
> a minor nit.
> 
>>> - Get rid of the build system, replacing many of its old errors with
>>> a lint tool that tests for them.
>> 
>> 
>> I do not understand what you mean by get rid of the build system and
>> replace it with a lint tool.
> 
> The build system exists today primarily for the sake of producing
> copies of the tests in multiple formats, and that's something that's
> not really needed any more (as everything nowadays supports HTML and
> XHTML). A side effect of the build system is that we have a lint tool
> that checks for a number of errors in tests and warns for them.

What kind of errors?

> Given we no longer need the multiple formats, we should move to just
> running something that picks up the errors in the tests.

Can you specify the type of errors that such a lint tool should report?

>>> - Policy changes to get rid of all metadata in the common case.
>>> - Change the commit policy. (Require review first,

If review is required when submitting tests, then someone or
(preferably) a consensus among test contributors (notably browser
manufacturers) will have to define (or redefine) what the reviewer has
to report and what is expected (or required) from the reviewing process.

If review of submitted tests is required, then test authors should (and
are expected to) act accordingly.


>>> require no new lint
>>> errors.)
>>> 
>>> Long-term it's probably worth considering merging the whole thing
>>> into web-platform-tests, so we have all the W3C tests in one place.
>>> 
>>> I realise this omits a lot of detail in a fair few places: I'm just
>>> trying to start off with something not *too* ginormous. :)
>>> 
>>> /gsnedders
>> 
>> To me, any serious discussion about good tests, good testing and
>> better testsuite has to start with fixing known incorrect tests.
>> Incorrect tests documented as such. Reported as such. And then also
>> address a) tests that can not fail, b) tests that do not test what
>> they claim (or believe) to be testing.
> 
> I think a good testsuite that nobody runs and sees little contribution
> is a pretty worthless testsuite.

I am not against browser manufacturers importing testsuites from the
CSSWG test server, quite the contrary. But that is a single, separate
issue by itself.

I am not against browser manufacturers submitting their tests to the
CSSWG test server, quite the contrary. But that is a single, separate
issue by itself.

I am not against reducing the number of metadata flags to only needed 
ones: there is now, I believe, a wide consensus on this. But I disagree 
on getting rid of all of them.

> I'd much rather a testsuite with a hundred tests with two bad than a
> perfect testsuite with only ten tests, as the former likely has far
> more value and coverage, and its failings can be addressed.

The number of tests imported from the CSSWG test server and the number
submitted to the CSSWG test server are two (admittedly big, important,
and relevant for interoperability) issues.

What you do with a) incorrect tests once discovered as such, b) tests
that can not fail, c) tests that do not test what they claim (or
believe) to be testing, and d) imprecise tests are other issues.

If you reduce overall requirements and quality control (review) for test
submissions, you can expect the percentage of incorrect, can-not-fail,
not-relevant and imprecise tests to increase, regardless of the number
of tests. How do you propose that a new system address those failings?


> My order of priority is roughly:
> 
> a) Getting the tests actually run by browsers.

I assume you mean getting browser manufacturers to submit their tests to
the CSSWG test server, correct?

> b=) Getting tests submitted.
> b=) Getting tests correct. (Corrections are really just submissions.)

I do not understand what you mean by "Corrections are really just
submissions."

> c) Getting specs out of CR. (I realise this is a long way down, but
> it's something that happens relatively infrequently, and really
> shouldn't be the only time we care about implementations following the
> spec.)

Specs can be validated with/after/thanks to testing. Tests can identify 
parts of specs that may/can/could cause or have problems.

Gérard

> I don't think it's worthwhile focusing *too* much on documenting
> incorrect tests as such—if they're incorrect, they should be fixed or
> deleted. We shouldn't have them lying around forever. (If they're
> incorrect only insofar as it affects a hypothetical implementation of
> CSS, then I'd really consider that a more minor issue.) If tests are
> actually getting run, then the vendors have an inherent interest in
> their correctness.
> 
> The fact that historically we've expected the original test author to
> be the only person to act on feedback is nothing but an impediment to
> fixing issues: there's no good reason why we should prohibit anyone
> from fixing them. And, I believe, nowadays that is the policy. That
> said, we probably still have too much feedback on tests for easily
> fixed issues without the person giving the feedback taking any action
> to fix it themselves. I don't know if that's down to review processes
> being seen as too slow or what.
> 
> I can't claim to know what the most practical way of dealing with the
> large number of low-quality tests we have today is; what I can claim
> is that we should endeavour to avoid increasing that number. This may
> seem contradictory to what I've said before about priorities, but if
> we have a real review process for everything getting committed (i.e.,
> we move to review-then-commit as opposed to our current mix of both
> commit-then-review and review-then-commit) we can avoid any more
> low-quality tests getting added, and should over time drive up the
> quality of the entire testsuite.
> 
> /Geoffrey
Received on Tuesday, 29 March 2016 18:05:17 UTC