Test Cases

Absolutely.  Test cases (both selected and random) need to be a key
part of our evaluation process.  In fact, the procedure I think you are
suggesting is just what has been discussed, though not formalized.

So let's take this opportunity to begin that process.  


Let me pose the following to begin discussion.


1 - Create a collection of representative (as much as there is such a
thing) pages or sites that sample the RANGE of different pages,
approaches, and technologies on the Web.
2 - Look at the items (particularly success criteria) and identify any
additional sample pages or sites needed to explore each item (if the
existing sample is not sufficient).
3 - Run quick tests by team members with these stimuli to see if there
is agreement (a rough sketch of this tally follows the list).  If the
team agrees that an item fails, work on it.  If it passes the team, or
is ambiguous, then move on to testing with an external sample of
people, while fixing any problems identified in the internal screening
test.
4 - Proceed in this manner to keep improving items and learning about
objectivity and agreement as we move toward the final version and
final testing.
5 - In parallel with the above, keep looking at the items with the
knowledge we acquire, and work to make the items stronger.
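
To make the screening in step 3 concrete, here is a very rough sketch
(in Python; the item names, pages, and verdicts are invented for
illustration, not real working-group data) of how team agreement on an
item could be tallied against the 80% threshold Charles suggests below:

# Rough sketch only: all data below is hypothetical.
from collections import Counter

THRESHOLD = 0.80  # initial screen; tightened toward 90-95% later

def agreement_rate(verdicts):
    """Fraction of raters giving the most common verdict."""
    top_count = Counter(verdicts).most_common(1)[0][1]
    return top_count / len(verdicts)

# (item, sample page) -> pass/fail judgments from the team screen
screen = {
    ("item-1.1", "sample-page-A"): ["pass", "pass", "pass", "pass", "fail"],
    ("item-1.1", "sample-page-B"): ["pass", "fail", "fail", "pass", "pass"],
}

for (item, page), verdicts in screen.items():
    rate = agreement_rate(verdicts)
    status = "clears the screen" if rate >= THRESHOLD else "needs work"
    print(f"{item} on {page}: {rate:.0%} agreement -> {status}")

Here 4 of 5 matching verdicts (80%) clears the initial screen, while
3 of 5 (60%) flags the item or the sample page for rework.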


The key to this is the Test Case Page Collection.  We have talked about
this, but no one has stepped forward to help build it.  Can we form a
side team to work on this?
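
As a discussion aid only, here is one possible shape for a collection
entry (all field names and the sample entry are invented), including a
check for step 2 above, finding items that still lack a sample page:

# All field names and example data are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class TestCasePage:
    url: str                  # where the sample page lives
    description: str          # what makes it representative
    technologies: list[str] = field(default_factory=list)
    items_exercised: list[str] = field(default_factory=list)

collection = [
    TestCasePage(
        url="http://example.org/sample-form",
        description="data-entry form with images and client-side script",
        technologies=["HTML", "CSS", "JavaScript"],
        items_exercised=["item-1.1", "item-3.4"],
    ),
]

# Which items under review have no sample page yet?
items_under_review = ["item-1.1", "item-3.4", "item-5.2"]
covered = {i for page in collection for i in page.items_exercised}
missing = [i for i in items_under_review if i not in covered]
print("Items still needing sample pages:", missing)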



NOTE: the above is a VERY rough description of a procedure, jotted down
as I run to a meeting.  But I would like to see if we can get this ball
rolling.  Comments and suggestions welcome.

Gregg

-- ------------------------------ 
Gregg C Vanderheiden Ph.D. 
Professor - Human Factors 
Dept of Ind. Engr. - U of Wis. 
Director - Trace R & D Center 
Gv@trace.wisc.edu, http://trace.wisc.edu/ 
FAX 608/262-8848  
For a list of our listserves send “lists” to listproc@trace.wisc.edu


-----Original Message-----
From: Charles McCathieNevile [mailto:charles@w3.org] 
Subject: Re: "objective" clarified

<snip>

I think that for an initial assessment the threshold of 80% is fine,
and I think that as we get closer to making this a final version we
should be lifting that requirement to about 90 or 95%. However, I don't
think that it is very useful to think about whether people would agree
in the absence of test cases. There are some things where it is easy to
describe the test in operational terms. There are others where it is
difficult to describe the test in operational terms, but it is easy to
get substantial agreement. (The famous "I don't know how to define
illustration, but I recognise it when I see it" explanation.)

It seems to me that the time spent in trying to imagine whether we
would agree on a test would be more usefully spent in generating test
cases, which we can then use to very quickly find out if we agree or
not. The added value is that we then have those available as examples
to show people - when it comes to people being knowledgeable of the
tests and techniques, they will have the head start of having seen real
examples and what the working group thought about them as an extra
guide.
  

<snip>
