RE: EARL Guideline Pass/Fail Confidence from Shadi Abou-Zahra on 2004-01-28 (w3c-wai-er-ig@w3.org from January 2004)

From: Shadi Abou-Zahra <shadi@w3.org>
Date: Wed, 28 Jan 2004 13:27:38 +0100
To: "'Marja-Riitta Koivunen'" <marja@annotea.org>, "'Charles McCathieNevile'" <charles@w3.org>, "'Chris Ridpath'" <chris.ridpath@utoronto.ca>
Cc: "'WAI ER IG List'" <w3c-wai-er-ig@w3.org>
Message-ID: <005601c3e59a$1f001810$6a02010a@K2>
hi marja,

very interesting. i had similar thoughts but on trusting tools rather
than individuals. i thought, maybe the confidence level might be set to
be proportional to a benchmarking value of the tool against a test
suite. of course all that raises questions on how to construct the test
suite, how to benchmark and how to derive a confidence value from there
but that is beside the point right now.

bottom line is that there might be some sort of algorithm to determine
the confidence level but essential values for this calculation would
probably need to come from an external source outside the tool. so who
finally sets the confidence level for an assertion?
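
to make the idea concrete, here is a minimal sketch in python. it is only an illustration: the function names, the sample test suite and the 0.9 threshold are all made up, and only the "high"/"low" labels come from chris's example. note that the expected verdicts have to be supplied from outside the tool, which is exactly the open question.

```python
def benchmark_score(tool_verdicts, expected_verdicts):
    """Fraction of test-suite cases where the tool agrees with the expected verdict."""
    if not expected_verdicts:
        raise ValueError("the test suite must contain at least one case")
    matches = sum(
        1 for case, verdict in expected_verdicts.items()
        if tool_verdicts.get(case) == verdict
    )
    return matches / len(expected_verdicts)


def confidence_level(score, high_threshold=0.9):
    """Map a benchmark score to a label; the threshold is an arbitrary assumption."""
    return "high" if score >= high_threshold else "low"


# hypothetical test suite: the expected verdicts come from an external source
expected = {"case1": "pass", "case2": "fail", "case3": "pass", "case4": "fail"}
# hypothetical verdicts reported by the tool being benchmarked
tool = {"case1": "pass", "case2": "fail", "case3": "fail", "case4": "fail"}

score = benchmark_score(tool, expected)
level = confidence_level(score)
```

here the tool agrees with the suite on three of four cases, so it falls below the assumed threshold.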

regards,
  shadi


-----Original Message-----
From: w3c-wai-er-ig-request@w3.org [mailto:w3c-wai-er-ig-request@w3.org]
On Behalf Of Marja-Riitta Koivunen
Sent: Wednesday, January 28, 2004 13:00
To: Charles McCathieNevile; Chris Ridpath
Cc: WAI ER IG List
Subject: Re: EARL Guideline Pass/Fail Confidence



If high confidence means that a checkpoint was tested both with automatic
test tools and by a human, I would put that information in the description
of the test case.

So we could have "partialtest1" and "partialtest2" for some automatic
testing of checkpoint 1.1. In addition we could have a description of a
combination test "trustedtest1" that refers to both automatic partial tests
as well as the human test.
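
As a rough sketch of how such test descriptions might fit together (in Python rather than EARL, purely for illustration): the class name, the field names, the "manual" mode and the "humantest1" test are assumptions of mine and not part of any EARL vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class CaseDescription:
    """Hypothetical description of a test case for one checkpoint."""
    name: str
    checkpoint: str
    mode: str                                   # "automatic", "manual" or "combination"
    parts: list = field(default_factory=list)   # tests a combination test refers to

    def modes_covered(self):
        """Return the set of testing modes this test (or its parts) actually used."""
        if not self.parts:
            return {self.mode}
        covered = set()
        for part in self.parts:
            covered |= part.modes_covered()
        return covered


partialtest1 = CaseDescription("partialtest1", "1.1", "automatic")
partialtest2 = CaseDescription("partialtest2", "1.1", "automatic")
humantest1 = CaseDescription("humantest1", "1.1", "manual")  # assumed name
trustedtest1 = CaseDescription(
    "trustedtest1", "1.1", "combination",
    parts=[partialtest1, partialtest2, humantest1],
)
```

With this shape, whether a checkpoint was tested by both a tool and a human can be read from the test description itself instead of from a separate confidence value.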

Separately, we could express trust in these partial and combination tests,
maybe only when the test was done by a trusted group of people and not too
long ago, etc. This trust varies from individual to individual.
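
A minimal sketch of such a trust rule, again in Python and only as an illustration: the function name, the group names and the 180-day limit are hypothetical, and each reader would supply their own set of trusted testers.

```python
from datetime import date, timedelta


def is_trusted(tester, tested_on, trusted_testers, today,
               max_age=timedelta(days=180)):
    """True if the test was run by someone this reader trusts and is recent enough.

    Both the set of trusted testers and the age limit are per-reader choices.
    """
    return tester in trusted_testers and (today - tested_on) <= max_age


# Hypothetical reader who trusts only one group of testers
my_trusted = {"group-a"}
today = date(2004, 1, 28)

recent_ok = is_trusted("group-a", date(2003, 12, 1), my_trusted, today)
too_old = is_trusted("group-a", date(2002, 6, 1), my_trusted, today)
unknown_tester = is_trusted("group-b", date(2004, 1, 20), my_trusted, today)
```

Because the trusted set is an input, two readers can reach different conclusions about the same test result.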

Some related things we have been experimenting with in Annotea:
Nobu did some trust experiments with Annotea shared bookmarks, where
trusted people's bookmarks were used to change the order of the search
results (will link some slides from
http://www.w3.org/2001/Annotea/User/Papers.html after I get OK from Nobu).

Also Dom did some experiments with marking messages as spam with spam
annotations: http://www.w3.org/2003/Talks/2003/Spamdemo/Overview.html.
Similarly, EARL Assertions or specific tests could be annotated (or
bookmarked) as trustworthy or unreliable.

Marja

At 07:33 AM 1/26/2004 -0500, Charles McCathieNevile wrote:

>Hi,
>
>I don't think confidence is a particularly accurate measure. In some cases
>we are saying "cannotTell" but adding a suspicion, in some cases we are
>almost certain (some people seem always to be certain :-).
>
>I have no objection to people using a confidence scale, but I suspect that
>we should look at the use cases and whether we can say something else more
>useful.
>
>(See also my recent email about the bug in my intro re using rdf:resource
>when it should be rdf:type or something even more complex...)
>
>Cheers
>
>Chaals
>
>On Thu, 15 Jan 2004, Chris Ridpath wrote:
>
> >
> >Charles has an example of EARL that shows how to express that a page
> >passes/fails an accessibility guideline. It's listed in his Coding EARL
> >(for non experts) document at:
> >http://www.w3.org/2001/sw/Europe/talks/200311-earl/all.htm
> >
> >The EARL code looks like:
> >
> >  <earl:Assertion>
> >    <earl:subject rdf:resource="#http://www.w3.org/" />
> >    <earl:result
> >rdf:resource="http://www.w3.org/WAI/ER/EARL/nmg-strawman#Pass"/>
> >    <earl:testcase
> >rdf:resource="http://example.org/1999/xhtml#transitional"/>
> >    <earl:assertedBy rdf:resource="http://validator.w3.org" />
> >    <earl:mode
> >rdf:resource="http://www.w3.org/WAI/ER/EARL/nmg-strawman#automatic"/>
> >    <earl:message>This page is valid XHTML</earl:message>
> >  </earl:Assertion>
> >
> >Would this be a better assertion if there was an added 'confidence'
> >statement? Example:
> >
> ><earl:confidence
> >rdf:resource="http://www.w3.org/WAI/ER/EARL/nmg-strawman#high" />
> >or
> ><earl:confidence
> >rdf:resource="http://www.w3.org/WAI/ER/EARL/nmg-strawman#low" />
> >
> >An automated checker tool can only detect some problems, not all. It's up
> >to a person to determine if the page passes all accessibility checks. For
> >example, only a person can determine if an image does/doesn't require a
> >long description.
> >
> >If the EARL expressed that the guideline was passed with a 'high'
> >confidence then it would mean that all accessibility checks had passed -
> >machine and human. If the confidence was 'low' then it would mean that only
> >checks that are machine testable had passed - one or more checks that
> >require human intervention had not passed.
> >
> >Using the confidence statement an automated checking tool could tell the
> >user that "likely the page will pass but you still need a human to make
> >some accessibility checks".
> >
> >Chris
> >
> >
> >
>
>Charles McCathieNevile  http://www.w3.org/People/Charles  tel: +61 409 134 136
>SWAD-E http://www.w3.org/2001/sw/Europe         fax(france): +33 4 92 38 78 22
>  Post:   21 Mitchell street, FOOTSCRAY Vic 3011, Australia    or
>  W3C, 2004 Route des Lucioles, 06902 Sophia Antipolis Cedex, France
Received on Wednesday, 28 January 2004 07:27:47 UTC