Re: Towards a better testsuite from Geoffrey Sneddon on 2016-04-03 (www-style@w3.org from April 2016)

From: Geoffrey Sneddon <me@gsnedders.com>
Date: Sun, 3 Apr 2016 23:33:14 +0100
To: Gérard Talbot <www-style@gtalbot.org>
Cc: W3C www-style mailing list <www-style@w3.org>
Message-ID: <CAHKdfMh7aU8qtj2iJ_X0Oew0M_CejwP4k+0WYochyqjwA34+bA@mail.gmail.com>
On Tue, Mar 29, 2016 at 7:04 PM, Gérard Talbot <www-style@gtalbot.org> wrote:
> On 2016-03-29 05:04, Geoffrey Sneddon wrote:
>> On 29/03/16 03:39, Gérard Talbot wrote:
>>> On 2016-03-24 13:00, Geoffrey Sneddon wrote:
>>>> - Get rid of the build system, replacing many of its old errors with
>>>> a lint tool that tests for them.
>>>
>>>
>>>
>>> I do not understand what you mean by get rid of the build system and
>>> replace it with a lint tool.
>>
>>
>> The build system exists today primarily for the sake of producing copies
>> of the tests in multiple formats, and that's something that's not really
>> needed any more (as everything nowadays supports HTML and XHTML). A side
>> effect of the build system is that we have a lint tool that checks for a
>> number of errors in tests and warns about them.
>
>
> What kind of errors?
>
>> Given we no longer need the multiple formats, we should move to just
>> running something that picks up the errors in the tests.
>
> Can you specify the types of errors that such a lint tool should report?

Initially I suspect we'd start off with issues that the build system
currently picks up, for example reftests whose reference link points
somewhere invalid. Beyond that, I presume we'd want much the same
checks as wpt's lint, which include: trailing whitespace, CRLF line
endings, references to test.csswg.org, testharness.js without
testharnessreport.js, etc.
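For illustration only, checks like these could look roughly like the
following Python sketch. It is hypothetical: `lint_test` and the rule
labels are made up for this example and are not wpt's actual lint code,
and it scans line by line with regular expressions rather than parsing
the HTML properly.

```python
import os
import re
from typing import Callable, List, Tuple

# Hypothetical rule labels for illustration; not wpt's real identifiers.
LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
REFTEST_REL = re.compile(r"""\brel\s*=\s*["']?(?:match|mismatch)\b""", re.IGNORECASE)
HREF = re.compile(r"""\bhref\s*=\s*["']([^"']*)["']""", re.IGNORECASE)


def lint_test(path: str, text: str,
              file_exists: Callable[[str], bool] = os.path.exists) -> List[Tuple[str, int]]:
    """Return (rule, line number) pairs; line 0 means the whole file."""
    errors: List[Tuple[str, int]] = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        if line.endswith("\r"):  # a CRLF line ending leaves a trailing "\r"
            errors.append(("CR AT EOL", lineno))
            line = line[:-1]
        if line != line.rstrip(" \t"):
            errors.append(("TRAILING WHITESPACE", lineno))
        if "test.csswg.org" in line:
            errors.append(("CSSWG TEST SERVER URL", lineno))
        # Simplification: only <link> tags contained on a single line are seen.
        for tag in LINK_TAG.findall(line):
            href = HREF.search(tag)
            if REFTEST_REL.search(tag) and href:
                # Resolve the reference relative to the test's own directory.
                target = os.path.normpath(
                    os.path.join(os.path.dirname(path), href.group(1)))
                if not file_exists(target):
                    errors.append(("INVALID REFTEST LINK", lineno))
    if "testharness.js" in text and "testharnessreport.js" not in text:
        errors.append(("MISSING TESTHARNESSREPORT", 0))
    return errors
```

Taking `file_exists` as a parameter keeps the reftest-link check testable
without touching the filesystem; by default it just uses `os.path.exists`.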

>> I think a good testsuite that nobody runs and sees little contribution
>> is a pretty worthless testsuite.
>
>
> I am not against browser manufacturers importing testsuites from the CSSWG
> test server; quite the contrary. But that is a single, separate issue by
> itself.
>
> I am not against browser manufacturers submitting their tests to the CSSWG
> test server; quite the contrary. But that is a single, separate issue by
> itself.
>
> I am not against reducing the metadata flags to only the needed ones:
> there is now, I believe, a wide consensus on this. But I disagree with
> getting rid of all of them.

I don't think one can really claim they're single, separate issues:
the browser vendors have on a number of occasions over the past five
years made it quite clear what's needed for them to contribute more
(and this includes reviewing tests). We can, of course, try to
convince them to compromise, but I'm dubious whether they'll give up
much.

>> I'd much rather have a testsuite of a hundred tests, two of them bad,
>> than a perfect testsuite of only ten tests, as the former likely has
>> far more value and coverage, and its failings can be addressed.
>
> The number of tests imported from the CSSWG test server and the number
> submitted to it are two (admittedly big, important, interoperability-
> relevant) issues.
>
> What you do with a) incorrect tests once discovered as such, b) tests that
> cannot fail, c) tests that do not test what they claim (or believe) to be
> testing and d) imprecise tests are other issues.
>
> If you reduce overall requirements and quality control (review) for test
> submissions, you can expect the percentage of incorrect, cannot-fail,
> irrelevant and imprecise tests to increase, regardless of the number of
> tests. How do you propose that a new system address those failings?

At the moment there is almost no ongoing review of tests (because
nobody looks at them much between their initial review and the spec
trying to leave CR), which doesn't help with the quality. If we have
browsers actively running tests, then at least incorrect tests should
be caught relatively quickly because people will notice them and look
at them. As for the other categories, I'd hope that most of them would
get caught by the review. Certainly my experience is that w-p-t has
fewer bad tests despite its comparatively lax processes.

>> My order of priority is roughly:
>>
>> a) Getting the tests actually run by browsers.
>
>
> I assume you mean getting browser manufacturers to submit their tests to
> the CSSWG test server, correct?
>
>> b=) Getting tests submitted.
>> b=) Getting tests correct. (Corrections are really just submissions.)
>
>
> I do not understand what you mean by "Corrections are really just
> submissions."

Both corrections and new tests being submitted are, ultimately,
patches. It doesn't really make sense to have a different process for
them.

>> c) Getting specs out of CR. (I realise this is a long way down, but it's
>> something that happens relatively infrequently, and really shouldn't be
>> the only time we care about implementations following the spec.)
>
>
> Specs can be validated through testing. Tests can identify parts of
> specs that have, or could cause, problems.

While this is certainly true, my experience is that testing at that
stage finds very few issues beyond those already found during
implementation and the testing associated with it.

/Geoffrey
Received on Sunday, 3 April 2016 22:33:43 UTC