Evaluation process for test samples: first proposal

Hi,

I had an action item to propose an evaluation process
for test samples. Below is a proposal based on input from Shadi.

* 1: A test sample is uploaded to the CVS repository.
     Its status is set to "unconfirmed" (if we use the terminology from
     the Conformance Test Process For WCAG 2.0 [1]) or something 
     similar (e.g. "unreviewed").
     Test samples with this status are queued for review by 
     the task force.

* 2: A task force member pre-reviews the test sample. 
     This review includes:
     - confirming that all the necessary files are available;
     - confirming that all the necessary files are valid [2];
     - proofreading the title, description and other text in
       the metadata;
     - making sure the links and the date are correct;
     - making sure that the location pointers are consistent
       with each other [3];
     - checking that file names and the ID in the metadata follow 
       our naming convention;
     - checking that the 'rule' ID actually exists in rulesets.xml;
     - checking that the referenced technique or failure is really
       a technique or failure for the referenced 'rule' ID;
     - anything else I missed?
     If the test sample passes this "administrative" check,
     its status is set to "new" (as in [1]) or "in review"
     (if we choose other terms)
     and it is queued for the next step in the process.
     If the test sample does not pass this check, its status is set
     to "pending bugfix" (or something similar) until it passes
     all the above checks. To fix these bugs, the test sample can either
     be sent back to the submitter or, if the fix is obvious,
     be fixed by a task force member.

* 3: The test sample goes to a second review, possibly 
     (preferably?) by the same person who did the pre-review.
     This review is a content review where the reviewer
     evaluates how well the test sample addresses the technique.
     During this review, the test procedure in the referenced 
     technique is also reviewed "to ensure that [it is] 
     unambiguous, easy to read by humans and easy to implement
     in software" [4].
     If the reviewer finds no issues with the test procedure, 
     he/she proposes to accept or reject the test sample.
     If the reviewer finds an issue with the test procedure,
     he/she proposes an alternative procedure and proposes
     to accept or reject the test sample based on
     this new procedure.
     These comments and evaluations are recorded somewhere public.
     For the status, we could use a value such as
     "accepted pending TF decision".

* 4: The task force reviews the test sample and the evaluation 
     and decides whether to accept or reject the test sample.
     If the test sample is accepted, the status becomes 
     "accepted by task force" or "pending WCAG WG decision" (or ...).
     This means that the test sample is ready from the perspective
     of the task force but needs review by the WCAG WG for a 
     final decision.
     If the test sample is rejected, the status changes to 
     "pending bugfix" (or "unconfirmed"?). The reviewer must then
     contact the submitter and provide a rationale for the
     rejection. The submitter can refine and resubmit the
     test sample; it then goes through the same process
     again, starting at step 2.

* 5: The WCAG WG reviews the test sample and accepts it or 
     sends it back to the task force, possibly with comments.
     If the test sample is accepted, the status is changed to
     "accepted" and it does not need to be reviewed again until
     the WCAG WG publishes a new draft.
     If the test sample is rejected, it is sent back to the task
     force and the status changes to "pending bugfix"
     (or "unconfirmed"?); it then goes through the same 
     process again, starting at step 2.
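To make the status flow in steps 1-5 concrete, here is a minimal Python sketch of the process as a state machine. It is only an illustration under a few assumptions: the status names are the ones suggested above, choosing "pending bugfix" (not "unconfirmed") for rejected samples and a neutral "pending TF decision" for step 3, since the reviewer may propose either acceptance or rejection. The event names such as `pre_review_passed` are hypothetical labels, not part of any existing tool.

```python
from enum import Enum


class Status(Enum):
    """Statuses suggested in the proposal (names are not final)."""
    UNCONFIRMED = "unconfirmed"              # step 1: uploaded, awaiting pre-review
    NEW = "new"                              # step 2 passed, awaiting content review
    PENDING_TF = "pending TF decision"       # step 3 done (assumed neutral name)
    PENDING_WG = "pending WCAG WG decision"  # step 4: accepted by the task force
    ACCEPTED = "accepted"                    # step 5: accepted by the WCAG WG
    PENDING_BUGFIX = "pending bugfix"        # failed a check or was rejected


# (current status, event) -> next status. Event names are hypothetical.
TRANSITIONS = {
    # Step 2: administrative pre-review, re-entered after every bugfix.
    (Status.UNCONFIRMED, "pre_review_passed"): Status.NEW,
    (Status.UNCONFIRMED, "pre_review_failed"): Status.PENDING_BUGFIX,
    (Status.PENDING_BUGFIX, "pre_review_passed"): Status.NEW,
    (Status.PENDING_BUGFIX, "pre_review_failed"): Status.PENDING_BUGFIX,
    # Step 3: content review; the reviewer's proposal is recorded publicly.
    (Status.NEW, "content_reviewed"): Status.PENDING_TF,
    # Step 4: task force decision.
    (Status.PENDING_TF, "tf_accepted"): Status.PENDING_WG,
    (Status.PENDING_TF, "tf_rejected"): Status.PENDING_BUGFIX,
    # Step 5: WCAG WG decision.
    (Status.PENDING_WG, "wg_accepted"): Status.ACCEPTED,
    (Status.PENDING_WG, "wg_rejected"): Status.PENDING_BUGFIX,
}


def advance(status: Status, event: str) -> Status:
    """Return the next status, or raise ValueError if the event is not allowed."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} not allowed in status {status.value!r}"
        ) from None


# Example: a sample that fails pre-review once, is fixed, and is then accepted.
status = Status.UNCONFIRMED
for event in ["pre_review_failed", "pre_review_passed", "content_reviewed",
              "tf_accepted", "wg_accepted"]:
    status = advance(status, event)
    print(f"{event:>18} -> {status.value}")
```

Keeping all transitions in one table makes it easy to check that every status has a way out, and to rename a status once we agree on the terminology. What happens to an "accepted" sample when the WCAG WG publishes a new draft is left open here, as in the description above.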


The above description focuses on the entry and exit conditions
of each step in the process, so I have left out a few details,
for example, that we review test samples in batches and that
the task force decides on acceptance during a teleconference.
I have also left out how we may send our work to the WCAG WG,
for example through the mailing list or a questionnaire.
(Questionnaires can have timeouts, which may be handy.)


[1] http://www.w3.org/WAI/GL/WCAG20/tests/ctprocess.html
[2] Valid in the context of the test sample: for example,
the technique may require an invalid HTML document, but the
metadata etc. must still be complete and valid.
[3] All pointers within the same 'location' element point
to the same location in the test sample.
[4] WCAG 2.0 Test Samples Development Task Force (TSD TF) 
Work Statement: http://www.w3.org/WAI/ER/2006/tests/tests-tf

Best regards,

Christophe

-- 
Christophe Strobbe
K.U.Leuven - Department of Electrical Engineering - Research Group on
Document Architectures
Kasteelpark Arenberg 10 - 3001 Leuven-Heverlee - BELGIUM
tel: +32 16 32 85 51
http://www.docarch.be/ 

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

Received on Monday, 20 November 2006 16:06:01 UTC