Re: Confidences in accessibility evaluation

Hi,

Nils Ulltveit-Moe wrote:
> The question is then how the confidence properties will be used. If they
> are used as guidelines for expert evaluators on which parts of the
> results from an automatic tool need further manual examination, as is
> the case for conformance assessments, then different levels may make
> sense.

Yes, this is one of the use cases. Another would be for developers to prioritize the results after an evaluation of a Web site has been carried out (e.g. fix all the obvious violations first, then start digging into the more questionable ones).
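
To make this concrete, here is a rough sketch in Python of what such prioritization could look like (the result structure and field names are made up for illustration only; this is not EARL syntax):

  # Illustrative only -- the result structure and field names are invented.
  CONFIDENCE_RANK = {"high": 0, "medium": 1, "low": 2}

  results = [
      {"checkpoint": "1.1", "subject": "page1.html", "confidence": "high"},
      {"checkpoint": "5.3", "subject": "page2.html", "confidence": "low"},
      {"checkpoint": "1.1", "subject": "page3.html", "confidence": "medium"},
  ]

  # Fix the obvious (high-confidence) violations first, then dig into the
  # more questionable ones.
  for result in sorted(results, key=lambda r: CONFIDENCE_RANK[r["confidence"]]):
      print(result["confidence"], result["checkpoint"], result["subject"])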


> However, if the objective is some kind of automatic aggregation to
> identify some kind of accessibility indicator, then different levels
> will introduce more noise into the aggregated results, which is in
> general not desirable.

True, noise is an issue we would have to deal with. There may be aggregation of atomic test results at the guideline level. There is also a second axis of aggregation, over whole resources such as Web sites, pages, or components (e.g. does this table violate the guidelines?). Noise and precision issues aside, the question is: will the confidence values be repeatable and interoperable across different vendors and evaluators? (Currently they aren't.)
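
To make the two axes of aggregation concrete, here is a back-of-the-envelope sketch (the weights and data structures are invented for illustration and do not address how noise should actually be handled):

  # Back-of-the-envelope sketch; the weights and result structure are
  # invented for illustration, not part of any proposed format.
  from collections import defaultdict

  WEIGHT = {"high": 1.0, "medium": 0.5, "low": 0.2}

  results = [
      {"guideline": "1.1", "page": "index.html", "confidence": "high"},
      {"guideline": "1.1", "page": "news.html", "confidence": "low"},
      {"guideline": "5.3", "page": "index.html", "confidence": "medium"},
  ]

  # First axis: aggregate atomic test results up to the guideline level.
  per_guideline = defaultdict(float)
  # Second axis: aggregate over whole resources (pages, sites, components).
  per_page = defaultdict(float)

  for r in results:
      per_guideline[r["guideline"]] += WEIGHT[r["confidence"]]
      per_page[r["page"]] += WEIGHT[r["confidence"]]

  print(dict(per_guideline))
  print(dict(per_page))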


Regards,
  Shadi



>>>On Mon, 18.07.2005 at 11:07 +0200, Shadi Abou-Zahra wrote:
>>>
>>>
>>>>Hi,
>>>>
>>>>Giorgio Brajnik wrote:
>>>>
>>>>
>>>>>On 7/13/05, Nick Kew <nick@webthing.com> wrote:
>>>>
>>>>[SNIP]
>>>>
>>>>
>>>>>>Were you on the call where we discussed this?  I think we were in 
>>>>>>favour of allowing each tool its own choice over how to express 
>>>>>>confidence values.
>>>>
>>>>[SNIP]
>>>>
>>>>
>>>>>In the coming months, if I can, I will do some work on this, which
>>>>>of course will be shared with this group.
>>>>
>>>>We have talked about providing a more open mechanism that lets tools define their own methods of expressing confidence (especially for specialized domains or tools), but we also identified the need to roll out with a "built-in" default method that is unambiguous. As a first attempt, we agreed to look into the current High/Medium/Low values and see if we can define a robust model so that different tools would generate similar results.
>>>>
>>>>It seems to me that working with probabilities (or numbers in general) is a practical approach in terms of aggregation and calculation. Nils has also previously posted some interesting work on this to the group. However, the major drawback is that this approach is tied heavily to the test definitions, or needs considerable benchmarking work to deliver the base values (probabilities).
>>>>
>>>>One approach may be to help develop the WCAG 2.0 test suites and then use them for benchmarking purposes. Tool developers could run their tools on such test suites and identify probability values at the *checkpoint* level. Of course, developers could also use their own test suites, public or proprietary, if they (and their customers) prefer.
>>>>
>>>>Do you think that this would eventually work?
>>>>
>>>>Regards,
>>>> Shadi
>>>>
>>>>
>>
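
P.S. Just to sketch what I had in mind with the benchmarking on test suites quoted above (purely illustrative; the benchmark data and structure are invented), the base probability per checkpoint could simply be the fraction of a tool's reports that agree with the expected results for that checkpoint:

  # Illustrative sketch; the benchmark data and structure are invented.
  from collections import defaultdict

  # (checkpoint, does the tool's verdict match the expected test-suite verdict?)
  benchmark_runs = [
      ("1.1", True), ("1.1", True), ("1.1", False),
      ("5.3", True), ("5.3", False), ("5.3", False),
  ]

  agree = defaultdict(int)
  total = defaultdict(int)
  for checkpoint, matches_expected in benchmark_runs:
      total[checkpoint] += 1
      agree[checkpoint] += int(matches_expected)

  # Base probability (confidence) per checkpoint for this tool.
  for checkpoint in sorted(total):
      print(checkpoint, agree[checkpoint] / total[checkpoint])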

-- 
Shadi Abou-Zahra,     Web Accessibility Specialist for Europe 
Chair and Team Contact for the Evaluation and Repair Tools WG 
World Wide Web Consortium (W3C),           http://www.w3.org/ 
Web Accessibility Initiative (WAI),    http://www.w3.org/WAI/ 
WAI-TIES Project,                 http://www.w3.org/WAI/TIES/ 
Evaluation and Repair Tools WG,     http://www.w3.org/WAI/ER/ 
2004, Route des Lucioles -- 06560, Sophia-Antipolis -- France 
Voice: +33(0)4 92 38 50 64           Fax: +33(0)4 92 38 78 22 

Received on Monday, 18 July 2005 11:38:50 UTC