Re: Test Cases

As we discussed at the September face-to-face, Josh Krieger and Chris Ridpath have a set of at least 268 test pages.  We discussed going through those and then creating more.
They are available for download in a zip file at: 
http://www.aprompt.ca/ATR/TestFilesHtml.zip

Also at that face-to-face, Charles Munat and I agreed to write a tool to help people go through the test files.  It was to serve two functions:
1. make the process easier for WCAG members to go through the test files
2. store results in EARL, which would then help us create reports as well as provide a valuable implementation of EARL.

By saving the results of people's evaluations in a markup language, we can automatically generate reports to see where we agree and disagree.
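
For illustration only (a rough sketch; the EARL namespace, test URI, reviewer address, and the use of Python with rdflib below are my placeholders, not anything the group has agreed on), one person's verdict on one test page might be recorded as a small set of RDF statements like this:

from rdflib import Graph, Namespace, URIRef, BNode
from rdflib.namespace import RDF

# Placeholder namespace and URIs -- illustrative only, not the agreed vocabulary.
EARL = Namespace("http://www.w3.org/ns/earl#")

g = Graph()
g.bind("earl", EARL)

assertion = BNode()
result = BNode()

# Who evaluated which test file, against which checkpoint, with what outcome.
g.add((assertion, RDF.type, EARL.Assertion))
g.add((assertion, EARL.assertedBy, URIRef("mailto:reviewer@example.org")))
g.add((assertion, EARL.subject, URIRef("http://www.aprompt.ca/ATR/example-test.html")))
g.add((assertion, EARL.test, URIRef("http://example.org/wcag#checkpoint-1-1")))
g.add((assertion, EARL.result, result))
g.add((result, RDF.type, EARL.TestResult))
g.add((result, EARL.outcome, EARL.passed))

print(g.serialize(format="turtle"))

Merging everyone's graphs and then querying for assertions whose outcomes differ on the same subject and test would give us the agreement/disagreement reports more or less for free.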

Unfortunately for us (but happily for Charles), he has been busy with graduate work, and I've been busy with my regular workload.  Still, I am very keen to see this through.  Charles McCathieNevile was also interested in this.  Since then, some new faces have joined the ERT WG, so I might be able to get help from them as well, although, as always, everyone is busy with their own things.

As always...volunteers will be appreciated.
--wendy

At 06:21 PM 12/4/01, Gregg Vanderheiden wrote:
>Absolutely.   Test cases (both selected and random) need to be a key
>part of our evaluation process.  In fact, the procedure I think you are
>suggesting is just what has been discussed, though not formalized. 
>
>So let's take this opportunity to begin that process.  
>
>
>Let me pose the following to begin discussion.
>
>
>1  -  create a collection of representative (as much as there is such a
>thing) pages or sites that sample the RANGE of different pages,
>approaches and technologies on the Web.
>2 - look at the items (particularly success criteria) - identify any
>additional sample pages or sites needed to explore the item (if the
>sample is not good enough to do so)
>3 - run quick tests by team members with these stimuli to see if there
>is agreement.  If the team agrees that it fails, work on it.  If it
>passes the team review or is ambiguous, move on to testing with an
>external sample of people while fixing any problems identified in the
>internal screening test. 
>4 -  proceed in this manner to keep improving items and learning about
>objectivity or agreement as we move toward the final version and final
>testing.
>5 -  in parallel with the above, keep looking at the items with the
>knowledge we acquire and work to make items stronger
>
>
>The key to this is the Test Case Page Collection.  We have talked about
>this.  But no one has stepped forward to help build it.   Can we form a
>side team to work on this?
>
>
>
>NOTE: the above is a VERY rough description of a procedure as I run to a
>meeting.   But I would like to see if we can get this ball rolling.
>Comments and suggestions welcome.    
>
>Gregg
>
>-- ------------------------------ 
>Gregg C Vanderheiden Ph.D. 
>Professor - Human Factors 
>Dept of Ind. Engr. - U of Wis. 
>Director - Trace R & D Center 
>Gv@trace.wisc.edu <mailto:Gv@trace.wisc.edu>, <http://trace.wisc.edu/> 
>FAX 608/262-8848  
>For a list of our listserves send “lists” to listproc@trace.wisc.edu
><mailto:listproc@trace.wisc.edu> 
>
>
>-----Original Message-----
>From: Charles McCathieNevile [mailto:charles@w3.org] 
> Subject: Re: "objective" clarified
>
><snip>
>
>I think that for an initial assessment the threshold of 80% is fine, and I
>think that as we get closer to making this a final version we should be
>lifting that requirement to about 90 or 95%. However, I don't think that it
>is very useful to think about whether people would agree in the absence of
>test cases. There are some things where it is easy to describe the test in
>operational terms. There are others where it is difficult to describe the
>test in operational terms, but it is easy to get substantial agreement. (The
>famous "I don't know how to define illustration, but I recognise it when I
>see it" explanation.)
>
>It seems to me that the time spent in trying to imagine whether we would
>agree on a test would be more usefully spent in generating test cases, which
>we can then use to very quickly find out if we agree or not. The added value
>is that we then have those available as examples to show people - when it
>comes to people being knowledgeable of the tests and techniques, they will
>have the head start of having seen real examples and what the working group
>thought about them as an extra guide.
>  
>
><snip> 

-- 
wendy a chisholm
world wide web consortium 
web accessibility initiative
seattle, wa usa
/--

Received on Tuesday, 4 December 2001 19:52:25 UTC