RE: result type="foo", confidence, ...

Hi Paul,

On Wed, 13.04.2005 at 20:17 +0100, Paul Walsh wrote:
> I'm not a statistician but isn't a 50% confidence level the same as
> saying 'I don't know?! I know if someone said that to me, I would
> assume they didn't know if a test case had passed or failed. This
> would provide me with little confidence in their results.

That is true, and is why I think the confidence interval may have its
use. It is useful to know that the auditor was not sure about his
decision. It is also useful to know if he was sure.

> Furthermore, providing a varying degree of certainty is even more open
> to interpretation - 20% certainty to one auditor could be 30% to
> another. 

Varying interpretation between auditors of their confidence in different
tests will be an error factor for a small set of tests. However,
differences in the interpretation of the confidence value should even
out over a larger number of tests. I am viewing EARL from a large test
set perspective.
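
A rough sketch of the "even out" argument, under an assumption that is
mine and not something EARL defines: each auditor's reading of the
confidence scale adds independent, zero-mean noise with a fixed spread
to the value they report. The function names and the numbers are
hypothetical. Under that assumption the uncertainty of the averaged
confidence shrinks with the square root of the number of tests:

```python
import math
import random


def standard_error(spread: float, n_tests: int) -> float:
    """Standard error of the mean of n_tests independent reports,
    each off by zero-mean noise with standard deviation `spread`."""
    return spread / math.sqrt(n_tests)


def mean_reported_confidence(true_conf: float, spread: float,
                             n_tests: int, seed: int = 0) -> float:
    """Simulate n_tests reports scattered around true_conf and average them.

    Values are not clamped to [0, 1]; this keeps the noise symmetric.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    reported = [true_conf + rng.gauss(0.0, spread) for _ in range(n_tests)]
    return sum(reported) / n_tests


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(n, standard_error(0.1, n), mean_reported_confidence(0.7, 0.1, n))
```

If auditors are biased in the same direction, the noise is no longer
zero-mean and averaging will not remove that bias, so this only
illustrates the random part of the disagreement.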

> I think this approach is destined to attract even more ambiguity which
> is what we are trying to get away from. 

We may be able to work around a missing confidence interval in EARL to
some extent for large scale assessments, since the confidence of the
measurements may to some extent be measured indirectly using e.g.
comparative analysis of tools. However, this is more coarse-grained than
providing the confidence interval directly where it can be provided.

Our plan is to include the confidence interval, where it is possible to
do so, because we regard it as useful information. Not having access to
the confidence interval where it can be provided (e.g. for heuristic
methods) will make it more difficult to draw conclusions from the
statistical data collected, since we would not know anything about the
confidence or the variance in confidence, which may indicate how well a
heuristic (or manual) test works.
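
As an illustration of what could be done with the confidence values if
they were available, here is a minimal sketch that groups reported
confidences by test and computes their mean and variance. The input
format (pairs of test id and confidence in [0, 1]) and the function name
are assumptions made for this example; they are not part of EARL.

```python
from statistics import mean, pvariance


def summarise_confidence(results):
    """Return {test_id: (mean confidence, population variance)}.

    `results` is an iterable of (test_id, confidence) pairs, a
    hypothetical format chosen only for this sketch.
    """
    by_test = {}
    for test_id, confidence in results:
        by_test.setdefault(test_id, []).append(confidence)
    return {t: (mean(v), pvariance(v)) for t, v in by_test.items()}


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [("t1", 0.9), ("t1", 0.9), ("t2", 0.2), ("t2", 0.8)]
    for test_id, (avg, var) in summarise_confidence(sample).items():
        print(test_id, avg, var)
```

A test with a low mean or a high variance in confidence would be a
candidate for closer inspection of the heuristic or the manual
procedure behind it.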

The confidence interval is of course not needed for tests that can be
decided with 100% certainty. 

It is only needed where the test cannot be determined exactly and some
kind of judgement has to be made, either by manual assessment of
some kind or by expert systems trained by humans.

Also, it would be a loss for other accessibility assessment tool vendors
or users using our open source modules if we were able to provide the
confidence interval, but EARL was not able to convey it in a
standardised way.

Nils Ulltveit-Moe

Received on Wednesday, 13 April 2005 20:10:15 UTC