RE: Another comment about confidence value. from Paul Walsh on 2005-04-19 (public-wai-ert@w3.org from April 2005)

From: Paul Walsh <paul.walsh@segalamtest.com>
Date: Tue, 19 Apr 2005 19:16:19 +0100
To: "'Charles McCathieNevile'" <charles@sidar.org>, "'Nils Ulltveit-Moe'" <nils@u-moe.no>
Cc: "'Giorgio Brajnik'" <giorgio@dimi.uniud.it>, <public-wai-ert@w3.org>
Message-ID: <013e01c5450b$e29af4d0$0200a8c0@PaulLaptop>

You're correct, it's no clearer :)

You have provided examples of where I believe this process should be
used so we're in total agreement. Perhaps you can provide examples
surrounding web site accessibility?

Cheers
Paul

-----Original Message-----
From: Charles McCathieNevile [mailto:charles@sidar.org] 
Sent: 19 April 2005 18:54
To: Paul Walsh; 'Nils Ulltveit-Moe'
Cc: 'Giorgio Brajnik'; public-wai-ert@w3.org
Subject: Re: Another comment about confidence value.

On Tue, 19 Apr 2005 11:32:01 +0200, Paul Walsh  
<paul.walsh@segalamtest.com> wrote:

(I think this bit was Nils - CMN)
> I appreciate that. With such a profile your testers would most probably
> be quite confident in their decisions, and if you are 100% confident
> that an accessibility issue is real, then the extra confidence value is
> not needed. (i.e. the default value for confidence, if it is left out,
> is 1).

> [PW] Every 'validation' company needs to follow the same process
> irrespective of experience. That way, the output of the 'team' will be
> 100% confident in their interpretation of the checkpoint passing or
> failing. If they are not, then you have an issue with that company's
> capabilities and/or understanding of the checkpoints.

This is why I want to have a variety of confidence datatypes. In principle
you would have one per test process, but in practice there are going to be
lots of overlaps - for example if 100 different tests, run according to
Nils' process, give probability results accurate to 2 significant figures,
then it is probably OK to use the same datatype for all of them.

On the other hand if I use a different process for a similar test, and its
results are different, I should use a different datatype. That way it is
possible to compare the results more accurately if I know more about the
differences in how the confidence is generated. The sort of examples that
spring to mind are to do with the accuracy of meters, or of labelling on
resistors, not WCAG conformance. For WCAG I think these comparisons, and
for that matter many confidence level sets, are going to be based on
smaller sets - high/medium/low, an integer from 1 to 7, etc.
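
As a rough sketch of the idea above, and only under my own assumptions: the
names ConfidenceType, Confidence and more_confident are made up for
illustration and are not part of EARL or any agreed schema. Each confidence
value carries the datatype it was generated under, and two values are only
compared directly when they share that datatype.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class ConfidenceType:
    """Hypothetical confidence datatype, nominally one per test process."""
    name: str
    # Ordered labels from lowest to highest; None means a probability in 0..1.
    levels: Optional[Tuple[str, ...]] = None


@dataclass(frozen=True)
class Confidence:
    """A confidence value tagged with the datatype that produced it."""
    datatype: ConfidenceType
    value: Union[float, str]

    def rank(self) -> float:
        """Position of the value on its own datatype's scale."""
        if self.datatype.levels is None:
            return float(self.value)
        return float(self.datatype.levels.index(self.value))


def more_confident(a: Confidence, b: Confidence) -> bool:
    """True if a ranks above b; refuses to compare across datatypes."""
    if a.datatype != b.datatype:
        raise ValueError(
            f"cannot compare {a.datatype.name} with {b.datatype.name} directly"
        )
    return a.rank() > b.rank()


# Assumed example datatypes: one shared by many tests run under the same
# process, and one small ordered set of the kind suggested for WCAG.
PROBABILITY_2SF = ConfidenceType("probability-2sf")
HIGH_MEDIUM_LOW = ConfidenceType("high-medium-low", ("low", "medium", "high"))

# The quoted message says a missing confidence defaults to 1.
DEFAULT_CONFIDENCE = Confidence(PROBABILITY_2SF, 1.0)

print(more_confident(Confidence(HIGH_MEDIUM_LOW, "high"),
                     Confidence(HIGH_MEDIUM_LOW, "medium")))  # True
```

Comparing a PROBABILITY_2SF value with a HIGH_MEDIUM_LOW value raises an
error here. A real tool might instead map between datatypes if it knows
enough about how each one is generated.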

I suspect I still haven't made this very clear. Any hints?

cheers

Chaals

-- 
Charles McCathieNevile                      Fundacion Sidar
charles@sidar.org   +61 409 134 136    http://www.sidar.org

Received on Tuesday, 19 April 2005 18:16:17 UTC