summary of thread on confidence value from Shadi Abou-Zahra on 2005-04-20 (public-wai-ert@w3.org from April 2005)

From: Shadi Abou-Zahra <shadi@w3.org>
Date: Wed, 20 Apr 2005 17:47:48 +0200
To: <public-wai-ert@w3.org>
Message-ID: <00d001c545c0$4dc4aee0$0301a8c0@K2>

Hi,

First of all, I want to express my appreciation for the level of
discussion but also ask everybody to ensure a moderate tone in their
responses to others on the list.

Anyway, there seem to be a couple of issues here, so please let me try
to summarize them:

* Confidence Levels as Precision Measurements
Some evaluations are not (yet) 100% machine-computable, but there are
heuristics that can help estimate an answer with a certain level of
confidence. For example, the "complexity" of a Web page can be estimated
from the number of links and other structural properties of the markup.
It seems that the confidence value here is a property of the test itself
(for example, the complexity algorithm has been benchmarked and is 83%
precise). The question is how best to represent this result in EARL. The
options on the table were:
 - as part of the result datatype itself
 - in an additional confidence interval
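To make the two options concrete, here is a minimal Python sketch of
each. All class and field names are hypothetical illustrations, not EARL
vocabulary, and a single number stands in for the confidence interval
for simplicity. The 0.83 is the benchmarked precision quoted above.

```python
from dataclasses import dataclass
from typing import Optional

# Option 1 (hypothetical): the confidence is part of the result datatype,
# so the result value itself carries the test's precision.
@dataclass(frozen=True)
class ScoredResult:
    outcome: str        # e.g. "pass" or "fail"
    confidence: float   # 0.0 to 1.0, e.g. benchmarked precision of the test

# Option 2 (hypothetical): the result stays a plain value, and the
# confidence is an additional, optional property stored next to it.
@dataclass(frozen=True)
class Assertion:
    test: str
    outcome: str
    confidence: Optional[float] = None  # None when no estimate exists

# The "complexity" heuristic benchmarked at 83% precision, both ways.
embedded = ScoredResult(outcome="fail", confidence=0.83)
separate = Assertion(test="page-complexity", outcome="fail", confidence=0.83)
```

Option 2 keeps plain results (e.g. from fully automatic tests) unchanged,
at the cost of an extra property that consumers must know to look for.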

* Confidence Levels as Causal Indicators
Based on inference rules (sometimes these are implicit and hardcoded
into the evaluation methodology), evaluation results can be derived. For
example, because an alt text on the page (e.g. "image 1") matches a
user-specified word list, the test result is "fail". Yet this is a
different type of "fail" than the one for images that have no alt text
at all. These types of causal results are difficult to benchmark
statistically, but there could be factors that influence the confidence
levels. For example, "failing according to user-defined word lists
cannot be assigned the confidence value 'high'". The question here is
really how to define these factors. There were no specific suggestions
for these factors, but most examples in the thread used nominal
confidence values for these types of issues.
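One possible way to express such factors is as caps on a nominal scale:
each reason a result was derived limits the highest confidence that the
result may carry. The sketch below is hypothetical (the scale, the
factor names, and the cap values are illustrative assumptions), but it
does encode the word-list rule quoted above.

```python
from enum import IntEnum

class Confidence(IntEnum):
    # Hypothetical nominal scale; integer ordering lets min() pick the cap.
    LOW = 1
    MEDIUM = 2
    HIGH = 3

# Hypothetical factors: each reason for a result maps to the highest
# confidence that reason is allowed to support.
FACTOR_CAPS = {
    "alt_missing": Confidence.HIGH,             # no alt text at all
    "alt_in_user_wordlist": Confidence.MEDIUM,  # alt text matched a user word list
}

def confidence_for(reasons, default=Confidence.HIGH):
    """Return the most restrictive cap among the factors behind a result.

    Unknown reasons raise KeyError rather than silently getting a value.
    """
    caps = [FACTOR_CAPS[r] for r in reasons]
    return min(caps, default=default)
```

For example, `confidence_for(["alt_in_user_wordlist"])` returns
`Confidence.MEDIUM`, so a word-list failure can never be reported as
"high", even when combined with factors that would otherwise allow it.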

For both directions above, I'd like to remind people of the earl:mode
property, which may be relevant here. We could define different
approaches for describing the confidence level depending on whether the
test was conducted manually, automatically, or heuristically.
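As a hypothetical sketch of such mode-dependent approaches, the check
below accepts a numeric precision for heuristic tests and a nominal
value otherwise. The Mode values simply mirror the three cases named
above and may not match the actual earl:mode vocabulary; which
representation goes with which mode is an assumption for illustration.

```python
from enum import Enum
from typing import Union

class Mode(Enum):
    # Hypothetical stand-ins for the three cases above, not earl:mode terms.
    MANUAL = "manual"
    AUTOMATIC = "automatic"
    HEURISTIC = "heuristic"

# Assumed nominal scale for manual and rule-based automatic tests.
NOMINAL = {"low", "medium", "high"}

def valid_confidence(mode: Mode, confidence: Union[float, str]) -> bool:
    """Check a confidence value against the (assumed) rule for its mode."""
    if mode is Mode.HEURISTIC:
        # Heuristic tests: benchmarked precision as a float in [0.0, 1.0].
        return isinstance(confidence, float) and 0.0 <= confidence <= 1.0
    # Manual and automatic tests: one of the nominal values.
    return confidence in NOMINAL
```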

Looking forward to more discussion on this!

Regards,
  Shadi


---                                                    --- 
Shadi Abou-Zahra,    Chair and Team Contact for the ERT WG 
World Wide Web Consortium (W3C),        http://www.w3.org/ 
Web Accessibility Initiative (WAI), http://www.w3.org/WAI/ 
Evaluation and Repair Tools WG,  http://www.w3.org/WAI/ER/ 
2004, Route des Lucioles - 06560 Sophia-Antipolis - France 
Voice: +33(0)4 92 38 50 64        Fax: +33(0)4 92 38 78 22

Received on Wednesday, 20 April 2005 15:47:45 UTC