Evaluation process for test samples (2nd version)

Hi,

I'm sending a slightly revised version of my first proposal,
based on comments by Shadi and Chris.
Some of the steps below contain alternatives for status values,
but the list I would like to propose is:
* unconfirmed,
* new,
* pending bugfix,
* accepted pending TF decision,
* rejected pending TF decision,
* accepted by task force,
* accepted (i.e. by the WCAG WG),
* rejected.
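For what it's worth, the proposed list could be captured as a simple
enumeration (just a sketch; the Python identifiers are mine, only the
string labels come from the list above):

```python
from enum import Enum

# Sketch of the proposed status values. The identifier names are
# hypothetical; only the string labels come from the proposal.
class Status(Enum):
    UNCONFIRMED = "unconfirmed"
    NEW = "new"
    PENDING_BUGFIX = "pending bugfix"
    ACCEPTED_PENDING_TF = "accepted pending TF decision"
    REJECTED_PENDING_TF = "rejected pending TF decision"
    ACCEPTED_BY_TF = "accepted by task force"
    ACCEPTED = "accepted"  # i.e. by the WCAG WG
    REJECTED = "rejected"
```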

The next thing to do is to check whether the process works
with a real test sample.



* 1: A test sample is uploaded to the CVS repository.
     Status is set to "unconfirmed".
     Test samples with this status are queued for review by
     the task force.

* 2: A task force member pre-reviews the test sample. 
     This review includes:
     - confirming that all the necessary files are available;
     - confirming that all the necessary files are valid [2];
     - proofreading the title, description and other text in
       the metadata;
     - making sure the links and the date are correct;
     - making sure that the location pointers are consistent
       with each other [3];
     - checking that file names and the ID in the metadata follow 
       our naming convention;
     - checking that the 'rule' ID actually exists in rulesets.xml;
     - checking that the referenced technique or failure is really
       a technique or failure for the referenced 'rule' ID;
     - anything else I missed?

     * If the test sample passes this "administrative" check,
     its status is set to "new" and it is queued for the next
     step in the process.
     * If the test sample does not pass this check, its status is set
     to "pending bugfix" (or something similar) until it passes
     all the above checks. To fix these bugs, it can either
     be sent back to the submitter or, if the fix is obvious,
     it can be fixed by a tast force member.
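The administrative pre-review boils down to "all checks pass -> new,
otherwise pending bugfix". A minimal sketch, assuming each check result
is recorded as a boolean (the check names and the dict representation
are mine; the real checks are done by hand on the files in CVS):

```python
# Hypothetical sketch of the step-2 decision rule. Each key stands for
# one pre-review check and its recorded result.
def pre_review(check_results):
    """Return the status after the administrative pre-review."""
    return "new" if all(check_results.values()) else "pending bugfix"

sample = {
    "files available": True,
    "files valid": True,  # valid in the sense of note [2]
    "metadata proofread": True,
    "links and date correct": True,
    "location pointers consistent": True,
    "naming convention followed": True,
    "rule ID exists in rulesets.xml": True,
    "technique/failure matches rule ID": False,
}
# pre_review(sample) -> "pending bugfix"
```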

* 3: The test sample goes through a second review, possibly
     (preferably?) by the same person who did the pre-review.
     This review is a content review where the reviewer
     evaluates how well the test sample addresses the technique.
     During this review, the test procedure in the referenced
     technique is also reviewed "to ensure that [it is]
     unambiguous, easy to read by humans and easy to implement
     in software" [4].

     * If the reviewer finds no issues with the test procedure,
     he/she proposes to accept or reject the test sample.
     * If the reviewer finds an issue with the test procedure,
     he/she proposes an alternative procedure and proposes
     to accept or reject the test sample based on this
     new procedure.
     These comments and evaluations are recorded somewhere public,
     possibly in a Wiki.
     For the status, we could use values such as 
     "accepted pending TF decision" and 
     "rejected pending TF decision".
     (The Conformance Test Process For WCAG 2.0 [1] uses
     "assigned" as the only possible result of this step,
     but it seems useful to record the proposal to accept
     or reject in the metadata.)

* 4: The task force reviews the test sample and the evaluation 
     and decides whether to accept or reject the test sample [5].
 
     * If the test sample is accepted, the status becomes 
     "accepted by task force" (or "pending WCAG WG decision" or ...).
     This means that the test sample is ready from the perspective
     of the task force but needs review by the WCAG WG for a 
     final decision.
     * If the test sample is rejected, the status changes to 
     "pending bugfix" (or "unconfirmed"?). The reviewer must then
     contact the submitter and provide a rationale for the
     rejection. The submitter can refine and resubmit the
     test sample; it then goes through the same process
     again, starting at step 2.

* 5: The WCAG WG reviews the test sample and accepts it or 
     sends it back to the task force, possibly with comments.

     * If the test sample is accepted, the status is changed to
     "accepted" and it does not need to be reviewed again until
     the WCAG WG publishes a new draft.
     * If the test sample is rejected, it is sent back to the task
     force and the status changes to "rejected";
     it then goes through the same process again, starting
     at step 3. [This part changed since my previous mail.]
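To check that the status values and the steps fit together, here is a
sketch of the allowed transitions as I read steps 1 to 5 (the status
strings follow the proposal; the transition table itself is my
interpretation and open for discussion):

```python
# Hypothetical transition table derived from steps 1-5 above.
TRANSITIONS = {
    "unconfirmed": {"new", "pending bugfix"},              # step 2
    "pending bugfix": {"new", "pending bugfix"},           # back to step 2
    "new": {"accepted pending TF decision",
            "rejected pending TF decision"},               # step 3
    "accepted pending TF decision": {"accepted by task force",
                                     "pending bugfix"},    # step 4
    "rejected pending TF decision": {"accepted by task force",
                                     "pending bugfix"},    # step 4
    "accepted by task force": {"accepted", "rejected"},    # step 5
    "rejected": {"accepted pending TF decision",
                 "rejected pending TF decision"},          # back to step 3
}

def may_move(old, new):
    """True if the process allows moving from status old to status new."""
    return new in TRANSITIONS.get(old, set())
```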


The above description focuses on the entry and exit conditions
of each step in the process, so I have left out a few details,
for example, that we review test samples in batches and how
the task force decides on acceptance of a test sample.
Shadi suggested that we could use a Wiki for recording reviews and
TF decisions. I think this is a good idea (the WCAG WG made good use
of a Wiki when writing "Understanding WCAG 2.0" and the Techniques doc),
but I also think that the metadata should reflect the status of the 
test sample, and that's why I made a distinction between 
"accepted by task force" and "accepted [by the WCAG WG]". I think this
kind of metadata should be in TCDL, not in a Wiki, otherwise we
have status information in two places.

I have also left out how we may send our work to the WCAG WG,
for example through the mailing list or a questionnaire.
(Questionnaires can have time outs, which may be handy.)
Shadi pointed out that we can leave out this last step for now [6].


[1] http://www.w3.org/WAI/GL/WCAG20/tests/ctprocess.html
[2] Valid in the context of the test sample, so for example 
the technique may require an invalid HTML document but the 
metadata etcetera must still be complete and valid.
[3] All pointers within the same 'location' element point
to the same location in the test sample.
[4] WCAG 2.0 Test Samples Development Task Force (TSD TF) 
Work Statement: http://www.w3.org/WAI/ER/2006/tests/tests-tf
[5] As Chris pointed out, we need to decide how the task force makes
this decision: through a straw poll web page where people can accept or
reject the test sample and attach comments, or just during the weekly
calls, where the decisions are recorded in the minutes.
http://lists.w3.org/Archives/Public/public-wai-ert-tsdtf/2006Nov/0030.html
Straw polls can speed up discussions, especially when people address
comments made by others in the same straw poll.
[6] http://lists.w3.org/Archives/Public/public-wai-ert-tsdtf/2006Nov/0026.html

Best regards,

Christophe Strobbe

-- 
Christophe Strobbe
K.U.Leuven - Department of Electrical Engineering - Research Group on
Document Architectures
Kasteelpark Arenberg 10 - 3001 Leuven-Heverlee - BELGIUM
tel: +32 16 32 85 51
http://www.docarch.be/ 

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

Received on Tuesday, 21 November 2006 15:20:42 UTC