
expectedResult: SC or test procedure - was: Test samples with multiple techniques

From: Christophe Strobbe <christophe.strobbe@esat.kuleuven.be>
Date: Wed, 25 Feb 2009 19:26:27 +0100
Message-Id: <6.2.5.6.2.20090225185012.03070160@esat.kuleuven.be>
To: TSDTF <public-wai-ert-tsdtf@w3.org>
Hi Shadi, All,


At 17:52 24/02/2009, Shadi Abou-Zahra wrote:
>Hi Christophe,
>
>Christophe Strobbe wrote:
>>At 19:34 17/02/2009, Christophe Strobbe wrote:
>>>
>>>I did a quick search to find test samples with more than one technique.
>>>
>>>In the 3rd BenToWeb test suite, we had at least 27 of these, for example:
>
>>>  [...]
>
>>>All fail/pass statements were based on the success criterion, not 
>>>the test procedure in the technique/failure.
>>And in the TSD TF repository:
>
>>[...]
>
>>(Note that not all BenToWeb test cases have been migrated to the 
>>TSD TF repository, so there could be more of these.)
>
>This is a problem :(

If we insist on having only one technique (or failure) per test 
sample, the test cases with multiple techniques will not get migrated 
to the TSD TF repository (or they would need to be redone).

This still leaves the question of what should be done with BenToWeb 
test cases that don't reference techniques: migrate and upload them 
first and look for techniques/failures during the review process, or 
look for techniques/failures before migrating?



>>It is possible that some of the BenToWeb test case authors were 
>>somewhat too generous with relevant techniques.
>>However, this does not explain why we have 27 test samples 
>>with more than one technique.
>
>I also do not know how this misunderstanding crept in...
>
>
>>E.g. sc3.3.1_l1_026 is about a mandatory text input field with 
>>error correction. SC 3.3.1 (Error Identification) says: If an input 
>>error is automatically detected, the item that is in error is 
>>identified and the error is described to the user in text. 
>>sc3.3.1_l1_026 references the following techniques:
>>* G83: Providing text descriptions to identify required fields that 
>>were not completed
>>* G85: Providing a text description when user input falls outside 
>>the required format or values
>>* SCR18: Providing client-side validation and alert
>>* G139: Creating a mechanism that allows users to jump to errors
>
>Note that the "How To Meet" document references these techniques for 
>a specific situation "If information provided by the user is 
>required to be in a specific data format or of certain values". It 
>is therefore not the only way to decide if the Success Criterion is met or not.

This confirms that pass statements from the referenced techniques 
don't automatically lead to a pass statement with regard to the 
Success Criterion.
However, it does not mean that we can't choose to evaluate test 
samples directly against the success criteria and base the 
expectedResult on that.



>>If we changed this test sample in order to map to only one 
>>technique, e.g., SCR18: Providing client-side validation and alert, 
>>would we then still make sure that the test sample meets SC 3.3.1?
>
>No. As far as I understand, test samples only refer to the 
>Techniques and do not make any claims about meeting Success Criteria 
>or not. We would need a whole layer of logics to combine the output 
>of each of these Techniques to determine if a Success Criterion is met or not.

As it is not possible to base pass/fail with regard to a success 
criterion on the pass/fail statements from the techniques, I would 
not be in favour of developing a layer of logics to combine the 
output of test procedures etc. However, as I stated above, I think it 
is perfectly possible to base the expectedResult directly on the 
success criteria. (After all, the success criteria are meant to be 
testable statements. I also assume we're smart enough to understand 
WCAG 2.0 success criteria.)
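
To make concrete why I would not want to build such a layer: even a 
toy version of it would have to encode which situation each technique 
addresses before any technique result could count towards the success 
criterion. (The sketch below is my own hypothetical simplification; 
the situation labels and technique groupings are not the actual "How 
to Meet SC 3.3.1" content.)

```python
# Hypothetical sketch of the "layer of logics" Shadi mentions.
# Situation labels and sufficient-technique sets are invented
# simplifications, NOT the real "How to Meet" mapping.
SUFFICIENT = {
    "required field left empty": [{"G83"}],
    "input outside required format or values": [{"G85"}, {"SCR18"}],
}

def sc_met(situation, technique_results):
    """True only if, for the situation at hand, every technique in at
    least one sufficient set passed its test procedure."""
    sets_for_situation = SUFFICIENT.get(situation, [])
    return any(
        all(technique_results.get(t) == "pass" for t in s)
        for s in sets_for_situation
    )
```

The point of the sketch is that a bare SCR18 pass is inconclusive: it 
only counts in the situation that SCR18 addresses, so the logic needs 
situational knowledge that the technique results themselves don't carry.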



>>If the answer is yes, there may be no problem. But if the answer is 
>>no, how does this affect the test sample?
>
>In the worst case, we would copy the test files four times, one for 
>each of the referenced Techniques. The main thing is that the test 
>samples demonstrate correct and incorrect implementations of the Techniques.
>
>
>>If the test sample only passes the test procedure in SCR18, we can 
>>no longer state that it passes SC 3.3.1,
>
>Correct. In fact, we should not make any statements about passing or 
>failing Success Criteria on the level of the test samples.

This is the crux of the matter. Some of us think that expectedResult 
should be based purely on the test procedures in techniques/failures 
(as stated at <http://www.w3.org/WAI/ER/tests/>), while others think 
that expectedResult should be based on the success criteria (which is 
reflected by TCDL 2.0 
<http://bentoweb.org/refs/TCDL2.0.html#edef-locations>, by the HTML 
view of the metadata - even in the mockups of the web interface - and 
possibly, indirectly, by the scope criterion in the checklist for 
content review at <http://www.w3.org/WAI/ER/tests/process#content>).

I defend the latter choice because accessibility evaluators - whether 
humans or tools - aim to decide whether content passes or fails WCAG 
2.0, so they would try to find out whether content passes or fails 
such and such a success criterion. How they decide that is up to 
them, but my question is: What use is the test procedure of a 
technique to them if it is inconclusive with regard to meeting a 
success criterion? Hence, why is it useful for us to base 
expectedResult on these test procedures?



>>and we will need to build in a kind of disclaimer about this. (This 
>>would be a lot of extra work, unless we do this systematically for 
>>every test case, in which case we can automate it.)
>
>Maybe we need to be clearer in the descriptions on the repository 
>pages but I do not see why we need to add a disclaimer. We should 
>not refer to Success Criteria at all. Test samples are for tool 
>developers to improve the way they implement the Techniques 
>(automatically or manually).

The HTML view of the metadata has always stated: "The test case 
passes/fails success criterion xxx..." (even in the current Web 
interface mockups), and nobody (as far as I can remember) commented 
on this... We also discussed naming conventions based on SC numbers 
or IDs, i.e. not based on technique/failure IDs...
The pass/fail statement can easily be changed to: "The test sample 
passes/fails the test procedure in technique/failure yyy..."



>>Such test samples would not be very useful as examples of good practice.
>
>Why not?

Because they don't implement sufficient techniques to meet an SC.


>Let's take SCR18 as suggested. Here is the test procedure:
>  - <http://www.w3.org/TR/WCAG20-TECHS/SCR18#SCR18-tests>
>
>Ideally we would have at least two test samples for this. On one of 
>them, an alert describes the error; on the other, it does not. The
>learning effect of these examples could be:
>
>Automated tool developers
>- develop heuristics that detect error messages, and help evaluators 
>to judge if they are good or bad (like they may do for ALT-attributes)
>
>Manual tool developers
>- develop mechanisms to simulate a form submission so that 
>evaluators can judge if alerts are triggered, and if they describe the errors
>
>Authoring tool developers
>- develop tools that generate code that behaves like the good 
>example, and learn about (and to avoid) the mistakes done in the bad example
>
>Web content developers
>- learn how form alerts should ideally behave and how not to do it
>
>....
>
>This was the initial objective for developing these test samples:
>  - <http://www.w3.org/WAI/ER/2006/tests/tests-tf>

I don't see a contradiction between developing test samples for WCAG 
2.0 techniques and basing expectedResult directly on success criteria.

Best regards,

Christophe




>Again, I don't know how this confusion crept in but I think it is 
>still correctable...
>
>Best,
>   Shadi
>
>--
>Shadi Abou-Zahra - http://www.w3.org/People/shadi/ |
>   WAI International Program Office Activity Lead   |
>  W3C Evaluation & Repair Tools Working Group Chair |

-- 
Christophe Strobbe
K.U.Leuven - Dept. of Electrical Engineering - SCD
Research Group on Document Architectures
Kasteelpark Arenberg 10 bus 2442
B-3001 Leuven-Heverlee
BELGIUM
tel: +32 16 32 85 51
http://www.docarch.be/
---
Please don't invite me to LinkedIn, Facebook, Quechup or other 
"social networks". You may have agreed to their "privacy policy", but 
I haven't.


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
Received on Wednesday, 25 February 2009 18:27:10 GMT
