Re: Test samples with multiple techniques

Hi Christophe,

Christophe Strobbe wrote:
> 
> At 19:34 17/02/2009, Christophe Strobbe wrote:
>>
>> I did a quick search to find test samples with more than one technique.
>>
>> In the 3rd BenToWeb test suite, we had at least 27 of these, for example:

>>  [...]

>> All fail/pass statements were based on the success criterion, not the 
>> test procedure in the technique/failure.
> 
> And in the TSD TF repository:

> [...]

> (Note that not all BenToWeb test cases have been migrated to the TSD TF 
> repository, so there could be more of these.)

This is a problem :(


> It is possible that some of the BenToWeb test case authors were somewhat 
> too generous with relevant techniques.
> However, this does not explain why we have 27 test samples with more 
> than one technique.

I also do not know how this misunderstanding crept in...


> E.g. sc3.3.1_l1_026 is about a mandatory text input field with error 
> correction. SC 3.3.1 (Error Identification) says: If an input error is 
> automatically detected, the item that is in error is identified and the 
> error is described to the user in text. sc3.3.1_l1_026 references the 
> following techniques:
> * G83: Providing text descriptions to identify required fields that were 
> not completed
> * G85: Providing a text description when user input falls outside the 
> required format or values
> * SCR18: Providing client-side validation and alert
> * G139: Creating a mechanism that allows users to jump to errors

Note that the "How To Meet" document references these techniques for a 
specific situation: "If information provided by the user is required to 
be in a specific data format or of certain values". These techniques 
are therefore not the only way to meet the Success Criterion.


> If we changed this test sample in order to map to only one technique, 
> e.g., SCR18: Providing client-side validation and alert, would we then 
> still make sure that the test sample meets SC 3.3.1?

No. As far as I understand, test samples only refer to the Techniques 
and do not make any claims about whether Success Criteria are met. We 
would need a whole layer of logic to combine the outputs of these 
Techniques to determine whether a Success Criterion is met.
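
Just to illustrate what such a layer would involve (this is a sketch 
of my own, not an agreed model): the "How To Meet" documents list 
sufficient techniques per situation, so the logic would roughly check 
that, for at least one applicable situation, all techniques in one of 
the sufficient combinations pass and none of the documented failures 
apply. A minimal sketch in TypeScript, with made-up type and function 
names:

  // Rough sketch, assuming hypothetical types (not an existing API).
  // Note: not matching any combination does not prove the Success
  // Criterion is unmet, only that it is not met via the documented
  // techniques.

  interface Situation {
    // each inner array is one sufficient combination of technique
    // IDs, e.g. [["G83"], ["G85"], ["SCR18"], ["G139"]]
    sufficientCombinations: string[][];
  }

  function successCriterionMet(
    situations: Situation[],
    techniquesPassed: Set<string>,  // techniques whose tests passed
    failuresDetected: Set<string>   // documented failures found
  ): boolean {
    // any documented failure means the Success Criterion is not met
    if (failuresDetected.size > 0) return false;
    // otherwise one passing sufficient combination is enough
    return situations.some((s) =>
      s.sufficientCombinations.some((combo) =>
        combo.every((t) => techniquesPassed.has(t))
      )
    );
  }

Even then, the result would only mean "met via the documented 
techniques"; content can satisfy a Success Criterion in other ways, 
which is exactly why this logic does not belong at the level of 
individual test samples.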


> If the answer is yes, there may be no problem. But if the answer is no, 
> how does this affect the test sample?

In the worst case, we would copy the test files four times, once for 
each of the referenced Techniques. The main thing is that the test 
samples demonstrate correct and incorrect implementations of the 
Techniques.


> If the test sample only passes the test procedure in SCR18, we can no 
> longer state that it passes SC 3.3.1, 

Correct. In fact, we should not make any statements about passing or 
failing Success Criteria on the level of the test samples.


> and we will need to build in a kind of disclaimer about this. (This 
> would be a lot of extra work, unless we do this systematically for every 
> test case, in which case we can automate it.)

Maybe we need to be clearer in the descriptions on the repository pages 
but I do not see why we need to add a disclaimer. We should not refer to 
Success Criteria at all. Test samples are for tool developers to improve 
the way they implement the Techniques (automatically or manually).


> Such test samples would not be very useful as examples of good practice.

Why not? Let's take SCR18 as suggested. Here is the test procedure:
  - <http://www.w3.org/TR/WCAG20-TECHS/SCR18#SCR18-tests>

Ideally we would have at least two test samples for this. In one of 
them, an alert describes the error; in the other, it does not (see the 
sketch after the list below). The learning effect of these examples 
could be:

Automated tool developers
- develop heuristics that detect error messages, and help evaluators 
judge if they are good or bad (as they may do for alt attributes)

Manual tool developers
- develop mechanisms to simulate a form submission so that evaluators 
can judge if alerts are triggered, and if they describe the errors

Authoring tool developers
- develop tools that generate code that behaves like the good example, 
and learn about (and avoid) the mistakes made in the bad example

Web content developers
- learn how form alerts should ideally behave, and what to avoid

....
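
To make the pair concrete, here is a sketch of what the script behind 
such two samples might look like (my own illustration, not an existing 
sample from the repository; the field name, format, and messages are 
made up):

  // Hypothetical pair of SCR18 validation handlers. The "good" one
  // identifies the field and describes the error in text; the "bad"
  // one triggers an alert that does neither.

  function validateGood(input: HTMLInputElement): boolean {
    // assumed sample constraint: a date in DD/MM/YYYY format
    if (!/^\d{2}\/\d{2}\/\d{4}$/.test(input.value)) {
      alert('The "Date of birth" field must use the format ' +
            'DD/MM/YYYY, for example 17/02/2009.');
      return false; // cancel the submission until corrected
    }
    return true;
  }

  function validateBad(input: HTMLInputElement): boolean {
    if (!/^\d{2}\/\d{2}\/\d{4}$/.test(input.value)) {
      alert('Invalid input.'); // neither identifies nor describes
      return false;
    }
    return true;
  }

An evaluation tool that simulates the form submission would see an 
alert in both cases; judging whether the alert text actually describes 
the error is exactly the human (or heuristic) step described above.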

This was the initial objective for developing these test samples:
  - <http://www.w3.org/WAI/ER/2006/tests/tests-tf>

Again, I don't know how this confusion crept in but I think it is still 
correctable...

Best,
   Shadi

-- 
Shadi Abou-Zahra - http://www.w3.org/People/shadi/ |
   WAI International Program Office Activity Lead   |
  W3C Evaluation & Repair Tools Working Group Chair |
