
Re: Towards a better testsuite

From: Geoffrey Sneddon <me@gsnedders.com>
Date: Tue, 29 Mar 2016 10:04:20 +0100
To: Gérard Talbot <www-style@gtalbot.org>
Cc: www-style list <www-style@w3.org>
Message-ID: <56FA4514.6060409@gsnedders.com>
On 29/03/16 03:39, Gérard Talbot wrote:
> Le 2016-03-24 13:00, Geoffrey Sneddon a écrit :
>> Speaking to people across all browsers about why they're generally
>> contributing more to web-platform-tests than csswg-test, there's a
>> more or less uniform answer: there's too much friction. Metadata and
>> review are the two parts held up as reasons; metadata because it means
>> that existing tests can't simply be added to the testsuite (and you're
>> then having to justify far more time to release the tests), review
>> because often enough comments come back months later by which time
>> everyone working on it has moved on and doesn't have time to address
>> minor nits.
> 
> 
> minor nits?

Really, I'm counting anything that isn't strictly a correctness issue as
a minor nit.

>> - Get rid of the build system, replacing many of its old errors with
>> a lint tool that tests for them.
> 
> 
> I do not understand what you mean by get rid of the build system and
> replace it with a lint tool.

The build system exists today primarily for the sake of producing copies
of the tests in multiple formats, and that's something that's not really
needed any more (as everything nowadays supports HTML and XHTML). A side
effect of the build system is that we have a lint tool that checks tests
for a number of errors and warns about them.

Given we no longer need the multiple formats, we should move to just
running something that picks up the errors in the tests.
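To make that concrete, here's a minimal sketch of the kind of standalone lint check meant here. The specific rules below (CR line endings, tabs, missing doctype) and the function names are illustrative assumptions, not the actual build system's checks:

```python
#!/usr/bin/env python
# Minimal sketch of a standalone lint tool for test files.
# The rule names and checks here are illustrative examples only,
# not the actual csswg-test build system's error checks.

def lint_source(source):
    """Return a list of (rule, message) errors found in a test's source."""
    errors = []
    if "\r" in source:
        errors.append(("CR-AT-EOL",
                       "carriage returns found; use LF line endings"))
    if "\t" in source:
        errors.append(("TAB",
                       "tab characters found; indent with spaces"))
    if not source.lstrip().lower().startswith("<!doctype html"):
        errors.append(("DOCTYPE",
                       "file does not start with <!DOCTYPE html>"))
    return errors

def lint_file(path):
    # newline="" preserves raw line endings so the CR check can see them.
    with open(path, "r", encoding="utf-8", newline="") as f:
        return lint_source(f.read())
```

A tool like this can run over every changed file in a commit hook or CI job, so errors are caught at submission time rather than during a build step.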

>> - Policy changes to get rid of all metadata in the common case.
>> - Change the commit policy. (Require review first, require no new lint
>> errors.)
>>
>> Long-term it's probably worth considering merging the whole thing into
>> web-platform-tests, so we have all the W3C tests in one place.
>>
>> I realise this omits a lot of detail in a fair few places: I'm just
>> trying to start off with something not *too* ginormous. :)
>>
>> /gsnedders
> 
> To me, any serious discussion about good tests, good testing and better
> testsuite has to start with fixing known incorrect tests. Incorrect
> tests documented as such. Reported as such. And then also address a)
> tests that can not fail, b) tests that do not test what they claim (or
> believe) to be testing.

I think a good testsuite that nobody runs and that sees little
contribution is a pretty worthless testsuite. I'd much rather have a
testsuite of a hundred tests with two bad ones than a perfect testsuite
with only ten tests, as the former likely has far more value and
coverage, and its failings can be addressed.

My order of priority is roughly:

a) Getting the tests actually run by browsers.
b=) Getting tests submitted.
b=) Getting tests correct. (Corrections are really just submissions.)
c) Getting specs out of CR. (I realise this is a long way down, but it's
something that happens relatively infrequently, and really shouldn't be
the only time we care about implementations following the spec.)

I don't think it's worthwhile focusing *too* much on documenting
incorrect tests as such—if they're incorrect, they should be fixed or
deleted. We shouldn't have them lying around forever. (If they're
incorrect only insofar as it affects a hypothetical implementation of
CSS, then I'd really consider that a more minor issue.) If tests are
actually getting run, then the vendors have an inherent interest in
their correctness.

The fact that historically we've expected the original test author to be
the only person to act on feedback is nothing but an impediment to
fixing issues: there's no good reason why we should prohibit anyone from
fixing them. And, I believe, nowadays that is the policy. That said, we
probably still have too much feedback on tests for easily fixed issues
without the person giving the feedback taking any action to fix it
themselves. I don't know if that's down to review processes being seen
as too slow or what.

I can't claim to know the most practical way of dealing with the
large number of low-quality tests we have today; what I can claim is
that we should endeavour to avoid increasing that number. This may seem
contradictory to what I've said before about priorities, but if we have
a real review process for everything getting committed (i.e., we move to
review-then-commit as opposed to our current mix of both
commit-then-review and review-then-commit) we can avoid any more
low-quality tests getting added, and should over time drive up the quality
of the entire testsuite.

/Geoffrey
Received on Tuesday, 29 March 2016 09:04:56 UTC
