RE: Definition of human testability from Gregg Vanderheiden on 2004-05-03 (w3c-wai-gl@w3.org from April to June 2004)

From: Gregg Vanderheiden <gv@trace.wisc.edu>
Date: Mon, 3 May 2004 10:51:20 -0500
To: <w3c-wai-gl@w3.org>
Message-Id: <200405031551.i43FpKMP017162@jalopy.cae.wisc.edu>
Hi John, all, 

    Two points.   

The question isn't whether it is hard to get agreement at the far ends.   It
would be easy to get people to agree on really bad and really good text.
But the majority of content falls into the category in the middle.  And
there I think we would get lots of disagreement.   

Also, it is hard to determine for 'whom' it is supposed to be easy to read.
It would be hard to get agreement even on this I'm afraid.   If this is
accessibility - one would presume it meant easy to read for people with
disabilities.    But which ones do we include and which do we exclude....

This is, I think, part of our problem. It is so hard for any of us to draw a
line -- so we want to say - "Do as much as you can" but that isn't testable.
The best we have come up with is "Consider all these strategies" which is
basically the same thing. 

We need to talk in specifics I think - since most of us want to endorse
generalities that end up being contradictory when we get to the specifics. 
 
Gregg

 -- ------------------------------ 
Gregg C Vanderheiden Ph.D. 
Professor - Ind. Engr. & BioMed Engr.
Director - Trace R & D Center 
University of Wisconsin-Madison 


-----Original Message-----
From: w3c-wai-gl-request@w3.org [mailto:w3c-wai-gl-request@w3.org] On Behalf
Of John M Slatin
Sent: Monday, May 03, 2004 8:57 AM
To: Charles McCathieNevile; Gregg Vanderheiden
Cc: w3c-wai-gl@w3.org
Subject: RE: Definition of human testability


I think Gregg's assertion that it would be impossible for a group of users
to agree whether a particular text was "written clearly" or not is
incorrect.

It might be difficult to get 100% agreement among a large group of
reviewers.  But it might well be possible to get 80% or even 90%
agreement in some situations.

There are various ways of doing such things.  For example, a group of
readers might be brought together and asked to rate a series of
documents for clarity, on a scale of 1 (very unclear) to 5 (very clear).
In the first round, I would expect substantial disagreement, especially
concerning longer and more complex documents.  Over time, and after
discussion in which the raters talked with each other about what they
were looking for, I would expect increasing agreement to emerge, even
about the longer and more complex documents.

There are a number of situations in which this sort of rating takes
place.  Portfolio-based assessment of student learning, for example,
depends heavily on the ability of multiple readers to agree on evidence
of learning demonstrated across a portfolio of student work. My
colleague, Peg Syverson, maintains an excellent Web site about the
Learning Record Online [1] that includes the complex scales used to
represent learning across five dimensions, as well as a very
comprehensive bibliography.

In the US, students seeking admission to university take a standardized
examination that includes a written component. The essays that students
produce in response to exam prompts are read by people who've been
trained in the "holistic" scoring techniques I mentioned.

John
[1] http://www.cwrl.utexas.edu/~syverson/olr/evaluation.html



"Good design is accessible design." 
Please note our new name and URL!
John Slatin, Ph.D.
Director, Accessibility Institute
University of Texas at Austin
FAC 248C
1 University Station G9600
Austin, TX 78712
ph 512-495-4288, f 512-495-4524
email jslatin@mail.utexas.edu
web http://www.utexas.edu/research/accessibility/


 



-----Original Message-----
From: w3c-wai-gl-request@w3.org [mailto:w3c-wai-gl-request@w3.org] On
Behalf Of Charles McCathieNevile
Sent: Sunday, May 02, 2004 1:56 am
To: Gregg Vanderheiden
Cc: w3c-wai-gl@w3.org
Subject: RE: Definition of human testability



Sorry, my point was more subtle. I understand the general principle of
agreeing on the results of tests. (This is why I participate in
EuroAccessibility under my Sidar hat - Sidar thinks it is very important
that we clarify and agree on how to test WCAG 1, at least across
Europe).

But we often seem to be saying "for things that aren't machine testable,
we want to get consistent results from people". And I just wanted to
clarify that we expect all tests to be testable by people, with
consistent results, including those that are often automated.

For example, validation of HTML and XHTML code is something that
machines do more efficiently than people, and on average more
accurately. But there are bugs from time to time in validators, which a
group of people who know the relevant specification can all identify. In
such a case, I believe we want to say that the people are right and the
machine test is wrong. Otherwise it will be necessary to identify the
particular machine tests we trust, which I think will add about 2 years
to the timeline...

cheers

Chaals

On Sat, 1 May 2004, Gregg Vanderheiden wrote:

>I'm not following you Charles.
>
>What this says - is that all success criteria must be reliably
>testable. That is, we can't have success criteria like "Write clearly"
>since 10 users would differ on what constituted 'clearly'.  The test
>cannot be more specific than the guideline, so all the testers could go
>on was their own training for what constituted 'clearly'.
>
>NOTE: that it is not yet clear whether all of the SC we have are
>specific enough to be reliably testable without referring to technology
>specific checklists.  But that is another discussion.  Hopefully we can
>make them specific enough in the doc.  What the consensus is though -
>is just that we should not have anything listed in the SC category that
>an author cannot reliably determine (or have determined) that they have
>met.
>
>Make sense now?  If not - then we need to figure out how to word it 
>better.
>
>
>Gregg
>
> -- ------------------------------
>Gregg C Vanderheiden Ph.D.
>Professor - Ind. Engr. & BioMed Engr.
>Director - Trace R & D Center
>University of Wisconsin-Madison
>
>
>-----Original Message-----
>From: Charles McCathieNevile [mailto:charles@w3.org]
>Sent: Saturday, May 01, 2004 12:41 PM
>To: Gregg Vanderheiden
>Cc: w3c-wai-gl@w3.org
>Subject: RE: Definition of human testability
>
>This seems backwards. Presumably we believe that all tests will produce
>consistent results when done by reasonably knowledgeable people, with
>some of them also being sufficiently simple to automate completely.
>
>Otherwise we have no basis for deciding whether a particular test that 
>a tool does is in fact a valid one or not, and in the case of two 
>conflicting results from tools we would not have any way of declaring 
>which was accurate...
>
>cheers
>
>Chaals
>
>On Thu, 29 Apr 2004, Gregg Vanderheiden wrote:
>
>>Yes
>>
>>That is what is intended I believe.
>>
>>Your alternative wording #1 is closest.   The word "certain" isn't quite
>>right since it would apply to all of the non-machine testable items so it
>>would become
>>
>>
>>
>>1. In the judgment of the working group members, the success criteria
>>that are not machine testable can be tested by humans in a manner that
>>is capable of yielding consistent results among multiple knowledgeable
>>testers.
>>
>>
>>
>>
>>Gregg
>>
>> -- ------------------------------
>>Gregg C Vanderheiden Ph.D.
>>Professor - Ind. Engr. & BioMed Engr.
>>Director - Trace R & D Center
>>University of Wisconsin-Madison
>>
>>  _____
>>
>>From: w3c-wai-gl-request@w3.org [mailto:w3c-wai-gl-request@w3.org] On Behalf
>>Of Sailesh Panchang
>>Sent: Thursday, April 29, 2004 11:22 AM
>>To: w3c-wai-gl@w3.org
>>Subject: Definition of human testability
>>
>>
>>
>>    Present draft: "Success criteria for all levels would be testable.
>>Some success criteria may be machine-testable. Others may require
>>human judgment. Success criteria that require human testing would, in
>>the judgment of the working group members,  yield consistent results
>>among multiple knowledgeable testers."
>>Comment:
>>Wording of the last sentence is confusing. I believe what is meant is:
>>"Judgment of the working group members" applies to identification of
>>criteria that can be tested with consistency and reliability by humans.
>>Right?
>>Do we intend to list these tests?
>>Consider following alternatives:
>>1. In the judgment of the working group members, certain success criteria
>>can be tested by humans in a manner that is capable of yielding consistent
>>results among multiple knowledgeable testers.
>>
>>
>>
>>2. Claims of conformance to success criteria can be based on human
>>testing if such testing has yielded or is capable of yielding
>>consistent results among multiple knowledgeable testers.
>>
>>Sailesh Panchang
>>
>>Senior Accessibility Engineer
>>Deque Systems,11180  Sunrise Valley Drive,
>>4th Floor, Reston VA 20191
>>Tel: 703-225-0380 Extension 105
>>E-mail: sailesh.panchang@deque.com
>>Fax: 703-225-0387
>>* Look up <http://www.deque.com> *
>>
>>
>>
>>
>>
>>
>>
>
>Charles McCathieNevile  http://www.w3.org/People/Charles  tel: +61 409 134 136
>SWAD-E http://www.w3.org/2001/sw/Europe         fax(france): +33 4 92 38 78 22
> Post:   21 Mitchell street, FOOTSCRAY Vic 3011, Australia    or
> W3C, 2004 Route des Lucioles, 06902 Sophia Antipolis Cedex, France
>
>

Charles McCathieNevile  http://www.w3.org/People/Charles  tel: +61 409 134 136
SWAD-E http://www.w3.org/2001/sw/Europe         fax(france): +33 4 92 38 78 22
 Post:   21 Mitchell street, FOOTSCRAY Vic 3011, Australia    or
 W3C, 2004 Route des Lucioles, 06902 Sophia Antipolis Cedex, France
Received on Monday, 3 May 2004 11:54:19 UTC