RE: EARL Guideline Pass/Fail Confidence

At 01:27 PM 1/28/2004 +0100, Shadi Abou-Zahra wrote:

>hi marja,
>
>very interesting. i had similar thoughts but on trusting tools rather
>than individuals. i thought, maybe the confidence level might be set to
>be proportional to a benchmarking value of the tool against a test
>suite. of course all that raises questions on how to construct the test
>suite, how to benchmark and how to derive a confidence value from there
>but that is off the point right now.
>
>bottom line is that there might be some sort of algorithm to determine
>the confidence level but essential values for this calculation would
>probably need to come from an external source outside the tool. so who
>finally sets the confidence level for an assertion?

I don't think the question is so much who sets it, but having the essential 
information available for setting it.

My take is that there can be many communities for setting the confidence 
level and it may vary e.g. my personal confidence can be different from the 
official company confidence that maybe requires more trusted human 
evaluators. Furthermore, the government in a certain country may have their 
own way of setting the trust level e.g. maybe demanding at least 3 trusted 
evaluators to agree on the result before trusting it.

It would be also good if you could compare your own results to some more 
trusted sources and learn what things you may be missing.

Marja


>regards,
>   shadi
>
>
>-----Original Message-----
>From: w3c-wai-er-ig-request@w3.org [mailto:w3c-wai-er-ig-request@w3.org]
>On Behalf Of Marja-Riitta Koivunen
>Sent: Wednesday, January 28, 2004 13:00
>To: Charles McCathieNevile; Chris Ridpath
>Cc: WAI ER IG List
>Subject: Re: EARL Guideline Pass/Fail Confidence
>
>
>
>If the high confidence tells that a checkpoint was tested both with
>automatic test tools as well as by a human I would put that information
>to
>the description of the test case.
>
>So we could have "partialtest1" and "partialtest2" for some automatic
>testing of checkpoint 1.1. In addition we could have a description of a
>combination test "trustedtest1" that refers to both automatic partial
>tests
>as well as the human test.
>
>Separately, we could express trust for these partial and combination
>tests
>maybe only in cases when it was done by a trusted group of persons and
>not
>too long ago etc. This trust varies according to the individuals.
>
>Some related things we have been experimenting with Annotea:
>Nobu did some trust experimentations with Annotea shared bookmarks,
>where
>trusted people's bookmarks were used to change the order of the search
>results (will link some slides from
>http://www.w3.org/2001/Annotea/User/Papers.html after I get OK from
>Nobu).
>
>Also Dom did some experimentations with marking messages as spam with
>spam
>annotations http://www.w3.org/2003/Talks/2003/Spamdemo/Overview.html.
>Similarly EARL Assertions or specific tests could be annotated (or
>bookmarked) as trustworthy or unreliable.
>
>Marja
>
>At 07:33 AM 1/26/2004 -0500, Charles McCathieNevile wrote:
>
> >Hi,
> >
> >I don't think confidence is a particularly accurate measure. In some
>cases we
> >are saying "cannotTell" but adding a suspicion, in some cases we are
>almost
> >certain (some people seem always to be certain :-).
> >
> >I have no objection to people using a confidence scale, but I suspect
>that we
> >should look at the use cases and whether we  can say something else
>more
> >useful.
> >
> >(See also my recent email about the bug in my intro re using
>rdf:resource
> >when it should be rdf:type or something even more complex...)
> >
> >Cheers
> >
> >Chaals
> >
> >On Thu, 15 Jan 2004, Chris Ridpath wrote:
> >
> > >
> > >Charles has an example of EARL that shows how to express that a page
> > >passes/fails an accessibility guideline. It's listed in his Coding
>EARL (for
> > >non experts) document at:
> > >http://www.w3.org/2001/sw/Europe/talks/200311-earl/all.htm
> > >
> > >The EARL code looks like:
> > >
> > >  <earl:Assertion>
> > >    <earl:subject rdf:resource="#http://www.w3.org/" />
> > >    <earl:result
> > >rdf:resource="http://www.w3.org/WAI/ER/EARL/nmg-strawman#Pass"/>
> > >    <earl:testcase
> > >rdf:resource="http://example.org/1999/xhtml#transitional"/>
> > >    <earl:assertedBy rdf:resource="http://validator.w3.org" />
> > >    <earl:mode
> > >rdf:resource="http://www.w3.org/WAI/ER/EARL/nmg-strawman#automatic"/>
> > >    <earl:message>This page is valid XHTML</earl:message>
> > >  </earl:Assertion>
> > >
> > >Would this be a better assertion if there was an added 'confidence'
> > >statement? Example:
> > >
> > ><earl:confidence
> > >rdf:resource="http://www.w3.org/WAI/ER/EARL/nmg-strawman#high" />
> > >or
> > ><earl:confidence
>rdf:resource=http://www.w3.org/WAI/ER/EARL/nmg-strawman#low
> > >/>
> > >
> > >An automated checker tool can only detect some problems, not all.
>It's up to
> > >a person to determine if the page passes all accessibility checks.
>For
> > >example, only a person can determine if an image does/doesn't require
>a long
> > >description.
> > >
> > >If the EARL expressed that the guideline was passed with a 'high'
>confidence
> > >then it would mean that all accessibility checks had passed - machine
>and
> > >human. If the confidence was 'low' then it would mean that only
>checks that
> > >are machine testable had passed - one or more checks that require
>human
> > >intervention had not passed.
> > >
> > >Using the confidence statement an automated checking tool could tell
>the
> > >user that "likely the page will pass but you still need a human to
>make some
> > >accessibility checks".
> > >
> > >Chris
> > >
> > >
> > >
> >
> >Charles McCathieNevile  http://www.w3.org/People/Charles  tel: +61 409
>134 136
> >SWAD-E http://www.w3.org/2001/sw/Europe         fax(france): +33 4 92
>38 78 22
> >  Post:   21 Mitchell street, FOOTSCRAY Vic 3011, Australia    or
> >  W3C, 2004 Route des Lucioles, 06902 Sophia Antipolis Cedex, France

Received on Wednesday, 28 January 2004 08:41:56 UTC