Re: CSS Test Suite Management System Now Live

On 09/22/2011 04:40 PM, Linss, Peter wrote:
> On Sep 22, 2011, at 1:51 PM, fantasai wrote:
>
>> On 09/07/2011 07:22 PM, Linss, Peter wrote:
>>> Having survived the initial beta test, I'm pleased to announce the new CSS Test Suite Management System (code named 'Shepherd') is now online and ready for use.
>>>
>>> As always, if you find any bugs, please email me immediately.
>>
>> I think it would be useful to split "Needs Work" into more severe (the test is invalid)
>> and less severe cases (the test is valid, but could be better). In the latter case, we'd
>> want to unhook the test from the test results and reporting harness. From Gérard's wiki
>> list of issues, I see these major groups of Needs Work:
>>
>>    Needs Work - Incorrect  /* The test is wrong and should not be passed or doesn't test what's claimed. */
>>    Needs Work - Metadata   /* The test metadata needs correction or improvement. */
>>    Needs Work - Usability  /* The test is confusing or hard to judge. */
>>    Needs Work - Precision  /* The test is imprecise and may give false positives. */
>>    Needs Work - Format     /* Syntax errors, format violations, etc. */
>
> I initially had two levels of 'Needs Work' but decided to keep it to one so that it's easier to search for tests that need work.
>
> My thoughts were that the reason why it needs work should simply be stated in the comment.
>
> There is a status level for tests that shouldn't be part of the build (and therefore
> removed from the harness) and that's 'Rejected', meaning that the test should be removed
> rather than fixed.

I see only two reasons for a test to be rejected:
   1. It's a duplicate, so should be merged with an existing test.
   2. It's out-of-scope.

I think it'd be clearer to be able to label things as Duplicate or Out-of-Scope rather
than marking them as Rejected -- to me Rejected often means "invalid" or "needs work,
please fix and resubmit".

> While the harness and Shepherd don't talk to each other (yet), the harness does have a
> notion of tests reported as invalid: they're still presented as part of the suite and
> listed in results, but they get de-prioritized in testing order and counted separately
> in the reports. I would think a test that needs work for any of the reasons listed
> above should fall into that category as the results shouldn't be trusted (except for
> really minor issues like typos in the metadata).

I disagree; if the test's metadata is wrong, or it has a validation issue that doesn't
affect its results, or it's just awkward to use, that's no reason to distrust its
recorded pass/fail results.

> I do see the usefulness of having a "this test is ok, but could use a little tweak" vs
> "this test is broken and needs work before relying on the result".

Yes, I think this distinction is important, and if we have it in the results reporting
interface, we should have it in the test status tracker. For me at least, it helps when
prioritizing work on tests, since I know to fix broken tests first, and then work on
usability fixes etc.
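To make the distinction concrete, here's a minimal sketch (purely hypothetical, not
Shepherd's or the harness's actual data model) of keeping a single "Needs Work" status
with a reason code, so one search still finds every such test while reporting can tell
blocking reasons (results untrustworthy) from minor ones:

```python
# Hypothetical sketch only: all names (Reason, Test, BLOCKING) are invented
# for illustration and do not reflect Shepherd's real schema.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Reason(Enum):
    INCORRECT = "incorrect"   # wrong result, or doesn't test what's claimed
    METADATA = "metadata"     # metadata needs correction or improvement
    USABILITY = "usability"   # confusing or hard to judge
    PRECISION = "precision"   # imprecise, may give false positives
    FORMAT = "format"         # syntax errors, format violations

# Reasons that mean recorded pass/fail results shouldn't be trusted.
BLOCKING = {Reason.INCORRECT, Reason.PRECISION}

@dataclass
class Test:
    name: str
    needs_work: Optional[Reason] = None  # None means the test is fine

    def results_trustworthy(self) -> bool:
        # Metadata/usability/format issues don't invalidate recorded results.
        return self.needs_work not in BLOCKING

tests = [
    Test("t1"),
    Test("t2", Reason.METADATA),
    Test("t3", Reason.INCORRECT),
]
# A single filter still finds everything that needs work...
needs_work = [t.name for t in tests if t.needs_work is not None]
# ...while reporting can single out the broken-and-untrustworthy ones.
untrusted = [t.name for t in tests if not t.results_trustworthy()]
```

Under this scheme a fixer would triage the `untrusted` list first, then sweep the
remaining `needs_work` entries for usability and metadata tweaks.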

~fantasai

Received on Friday, 23 September 2011 00:41:26 UTC