Re: result type="foo", confidence, ... from Charles McCathieNevile on 2005-04-14 (public-wai-ert@w3.org from April 2005)

From: Charles McCathieNevile <charles@sidar.org>
Date: Thu, 14 Apr 2005 11:23:12 +1000
To: "Nils Ulltveit-Moe" <nils@u-moe.no>, "Paul Walsh" <paulwalsh@segalamtest.com>
Cc: shadi@w3.org, public-wai-ert@w3.org
Message-ID: <op.so67gyajw5l938@researchsft>

On Thu, 14 Apr 2005 06:14:11 +1000, Nils Ulltveit-Moe <nils@u-moe.no>  
wrote:

>
> Hi Paul,
>
> On Wed, 13.04.2005 at 20.17 +0100, Paul Walsh wrote:
>> I'm not a statistician but isn't a 50% confidence level the same as
>> saying 'I don't know'?! I know if someone said that to me, I would
>> assume they didn't know if a test case had passed or failed. This
>> would provide me with little confidence in their results.
>
> That is true, and is why I think the confidence interval may have its
> use. It is useful to know that the auditor was not sure about his
> decision. It is also useful to know if he was sure.

Well, the reason I have not been fond of the confidence thing is that if it  
is a simple number, we don't have much idea how to compare it with a  
simple number generated by someone else.

>> Furthermore, providing a varying degree of certainty is even more open
>> to interpretation - 20% certainty to one auditor could be 30% to
>> another.

This depends. Some things, tested on a statistical basis, can be expressed  
in an interoperable way.

For example: Bayesian analysis of mail to determine whether it is spam,  
according to a known set of rules, readability analyses of various  
flavours, ...
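
As a rough sketch of what "interoperable" could mean here: if the rule set  
is published, anyone who runs it on the same input gets the same number.  
The rule table (RULES), the function name and the probabilities below are  
all hypothetical, made up for illustration; the combination step is the  
usual naive Bayes formula, not anything defined by EARL.

```python
from math import prod

# Hypothetical shared rule set: token -> P(spam | token present).
# These numbers are invented for illustration only.
RULES = {"viagra": 0.99, "meeting": 0.10, "free": 0.80}

def spam_confidence(message, rules=RULES):
    """Combine per-token probabilities with the naive Bayes formula."""
    tokens = set(message.lower().split())
    probs = [p for token, p in rules.items() if token in tokens]
    if not probs:
        return 0.5  # no rule matched: no evidence either way
    spam = prod(probs)                  # evidence for spam
    ham = prod(1.0 - p for p in probs)  # evidence against spam
    return spam / (spam + ham)
```

Two tools using the same RULES will report the same value for the same  
message, which is what makes the number comparable. A confidence that an  
auditor estimates by feel has no such shared basis.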

> Varying interpretation between auditors on their confidence in different
> tests will be an error factor for a small set of tests. However,
> differences in the interpretation of the confidence value should even
> out over a larger number of tests. I am viewing EARL from a large test
> set perspective.

I think that variance between auditors will often be high - many people  
often aren't consistent even in their own work at guessing how confident  
they are about something.

[...]
> The confidence interval is of course not needed for tests that can be
> decided with 100% certainty.
>
> It is only needed where the test cannot be determined exactly and some
> kind of judgement is to be performed, either by manual assessment of
> some kind or by expert systems trained by humans.

And it is only useful if it can be recorded in a way that maintains  
interoperability. This is actually quite complex information. Since an  
elegant solution is as simple as possible and no more, I don't want to see  
us simplify to the point where we break the ability to do the powerful  
things that this can enable :-)

> Also, it would be a loss for other accessibility assessment tool vendors
> or users using our open source modules if we were able to provide the
> confidence interval, but EARL was not able to convey it in a
> standardised way.

Right.

cheers

Chaals

-- 
Charles McCathieNevile                      Fundacion Sidar
charles@sidar.org   +61 409 134 136    http://www.sidar.org

Received on Thursday, 14 April 2005 01:24:06 UTC