Re: acceptance criteria for new success criteria

From my experience, I doubt that "likely agreement by most reasonably informed evaluators" is easy to obtain.

During the development of WCAG-EM I set up a little experiment on SC 1.1.1 for the EVAL Task Force to prove this point. I presented an image (President Clinton) with a range of different alt values and asked TF participants whether they would rate each one "pass" or "fail". There were clear cases where judgments converged, but on many alt options (alt text with some deficiencies: too terse, too verbose, partly incorrect, spelling errors etc.) there was no agreement at all.

There are many other SC where there is wiggle room regarding assessment: When is a heading or label sufficiently descriptive (SC 2.4.6)? Does skipping heading levels constitute a failure of SC 1.3.1, and if not, how bad must a heading hierarchy get before the tester calls it a fail? Etc. etc. You get the idea.

The other issue is quantitative - there can be very minor omissions on a page that some testers would tolerate (pass the page) while others cling to the least important instance (say, a supporter logo in the footer with unclear alt) and let the SC fail for that page. Similar things easily happen for other very comprehensive SC like 1.3.1.

It is nice to believe that consensus is obtainable but even among testers working to the same set of checkpoints with detailed rating instructions, we frequently experience disagreement - mostly because the issue context or the mapping of issues to SCs makes it hard to agree on a fair rating.

I think rating uncertainty in the face of complex web content is the 'elephant in the room' (if that's the correct metaphor). The insistence on accepting only SC that are clearly testable in a pass/fail fashion may be a good principle, but we should acknowledge the reality that in actual applications of WCAG, there is frequently no consensus to be had if you put it to the test (i.e. test with different evaluators and compare results).
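
For anyone who wants to put a number on "test with different evaluators and compare results", here is a minimal sketch using Fleiss' kappa, one common chance-corrected agreement measure for several raters. This is my own illustration, not something the EVAL TF or WCAG-EM prescribes; the function name and the ratings below are made up and are not the data from the experiment mentioned above.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings: list of lists; ratings[i] holds one label per rater for item i,
    e.g. ["pass", "fail", "pass"]. Assumes at least two raters per item.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})

    # counts[i][j] = how many raters put item i into category j
    counts = [[item.count(c) for c in categories] for item in ratings]

    # Observed agreement per item, averaged over all items
    p_items = [
        (sum(k * k for k in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Agreement expected by chance, from the overall share of each category
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        # Every rating falls into one category: agreement is trivially complete
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical data: four alt texts, each judged by four evaluators
ratings = [
    ["pass", "pass", "pass", "pass"],  # clearly fine
    ["fail", "fail", "fail", "fail"],  # clearly wrong
    ["pass", "pass", "fail", "fail"],  # too terse?
    ["pass", "fail", "fail", "pass"],  # spelling errors?
]
kappa = fleiss_kappa(ratings)
```

With this made-up data, two unanimous items and two evenly split ones, kappa comes out at about 0.33, far below the 1.0 of full agreement. What value would count as "high inter-rater reliability" is a separate question that the sketch does not answer.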

Best,
Detlev

--
Detlev Fischer
testkreis c/o feld.wald.wiese
Thedestr. 2, 22767 Hamburg

Mobile +49 (0)157 57 57 57 45
Fax +49 (0)40 439 10 68-5

http://www.testkreis.de
Consulting, testing and training for accessible websites

White, Jason J wrote on 31.05.2016 17:00:

> From: Andrew Kirkpatrick [mailto:akirkpat@adobe.com]
> Sent: Tuesday, May 31, 2016 10:50 AM
> 
> 
> Great, thanks for the clarification.
> 
> To clarify my point, I don’t believe that saying “most experts” is fine.  
> 
> The term used during the development of WCAG 2.0 was “high inter-rater reliability”. I don’t recall our discussion of exactly what the requirements were, but my general recollection is that it entailed likely agreement by most reasonably informed evaluators (not the same as agreement by most “experts”, which, to my mind, is a lower standard that is easier to meet).

Received on Tuesday, 31 May 2016 15:37:09 UTC