- From: Charles McCathieNevile <charles@sidar.org>
- Date: Tue, 05 Apr 2005 23:06:14 +1000
- To: "Paul Walsh" <paulwalsh@segalamtest.com>, shadi@w3.org, public-wai-ert@w3.org
On Tue, 05 Apr 2005 03:12:01 +1000, Paul Walsh <paulwalsh@segalamtest.com> wrote:

> Perhaps I missed something or misinterpreted the thread. I thought the
> idea was to use EARL as a click through from a WAI logo to demonstrate
> compliance of an accessible website. If this is the case, then it won't
> necessarily add much more value to its current use, as it will not
> actually prove anything. It will only describe a list of test case
> results as collated from various sources, which will still not include
> every test case performed by an auditor.

Actually, the basic working idea is that it will include every test case performed by an auditor. While it is true that this is very, very difficult to achieve in an absolute sense, the framework in which EARL is expected to work makes it "functionally possible". Let me explain in more detail:

A normal testing process consists of a large series of individual tests. Where these are done by an automated tool, it is trivial to record the results of each of them separately, and they can be known completely.

In many areas of testing (accessibility is just one, and for that matter just one use case for EARL), it is necessary to test things which require human judgement. These are the cases where it is difficult to note every aspect of every test that is done. However, it is generally possible to divide this testing into a number of smaller tests which can be answered separately. "Did you like using this tool?" is a very global test, whose results will always be based on a number of factors. But "does this text make sense as a functional replacement for this picture, yes or no?" is reasonably straightforward, and you can expect a high degree (more than 75%) of consistency in the answers. Particularly when these answers are recorded by tools (as is the case with things like AccVerify, Aprompt, Hera, SWAP), it is easy to note each one in EARL - there is a sketch of what one such recorded result might look like below.

So it really comes down to the specifications being tested against. I believe it is a commonly held opinion that the best way to test accessibility is to get disabled users to actually use the thing being tested. While I agree in principle, I do not think this is feasible in practice. Given the range of disabilities that different people have, the consequent massive number of tests that need to be done to ensure reasonable coverage, and the fact that most people have better things to do than spend their entire life testing things, there are simply not enough people in the world to do the testing.

This is why specifications like WCAG are developed - to try to record everything we can about what happens when people with disabilities do the testing, and to turn it into a set of tests or procedures that can, as much as possible, replicate what you learn without having the people involved. The success or failure of something like WCAG depends in part on how much it achieves that goal.

It is impossible to replicate the experience of having real live disabled testers, but it is also practically impossible to provide that experience to the extent required, and it is certainly beyond the reach of the creators of most websites. (Consider the budget implications of finding a group of testers with the various permutations of tunnel vision, colour blindness (any one type), complete blindness, spotted vision, complete deafness, quadriplegia, RSI in the hands, and a speech impediment making computer voice recognition impractical. Let alone actually employing all those people to do some testing...)
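To make the "itemised results" idea concrete, here is a rough sketch of how a checking tool could write down one such result. It uses the rdflib Python library; the EARL namespace and the class and property names (earl:Assertion, earl:outcome and so on), as well as the reviewer, page and checkpoint URIs, are illustrative assumptions rather than the actual output of any of the tools mentioned above.

  # Minimal sketch: recording a single itemised test result in an EARL-like graph.
  # Assumptions: rdflib is available, and the EARL terms used here stand in for
  # whichever EARL schema a tool actually targets. All URIs are illustrative.
  from rdflib import Graph, Namespace, URIRef, Literal, BNode, RDF

  EARL = Namespace("http://www.w3.org/ns/earl#")       # assumed EARL namespace
  WCAG = Namespace("http://www.w3.org/TR/WCAG10/#")    # illustrative test identifiers

  g = Graph()
  g.bind("earl", EARL)

  assertion = BNode()
  result = BNode()

  # "Does this text make sense as a functional replacement for this picture?"
  # answered yes by a human reviewer, recorded by the tool as one assertion.
  g.add((assertion, RDF.type, EARL.Assertion))
  g.add((assertion, EARL.assertedBy, URIRef("http://example.org/reviewer#jane")))
  g.add((assertion, EARL.subject, URIRef("http://example.org/page.html#logo-img")))
  g.add((assertion, EARL.test, WCAG["text-equivalent"]))   # hypothetical test URI
  g.add((assertion, EARL.mode, EARL.manual))               # human judgement, tool-recorded
  g.add((assertion, EARL.result, result))

  g.add((result, RDF.type, EARL.TestResult))
  g.add((result, EARL.outcome, EARL.passed))
  g.add((result, EARL.info, Literal("Alt text conveys the same function as the image.")))

  print(g.serialize(format="turtle"))

A complete report is then just a collection of assertions like this one, one per individual check, which is what lets results be collated from various sources while still accounting for every test performed.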
So to the extent that a testing specification manages to itemise its requirements, it should be possible for testing tools to itemise the responses (and the tests done, although in some cases it is possible to consolidate several tests for efficiency, just collecting the responses appropriate to the different aspects).

This is not about perfection, but about "good enough" - however we define that. I am not arguing that we have license to just do something that seems cool and then leave it at that, either. One part of "good enough" is the "enough" - if it isn't sufficient, then it might be good, but it is not good enough. That is something I have seen trip up a lot of "accessibility" work when it comes to testing in the real world.

cheers

Chaals

--
Charles McCathieNevile      Fundacion Sidar
charles@sidar.org           +61 409 134 136
http://www.sidar.org
Received on Tuesday, 5 April 2005 13:06:21 UTC