Re: EARL Guideline Pass/Fail Confidence from Charles McCathieNevile on 2004-01-28 (w3c-wai-er-ig@w3.org from January 2004)

From: Charles McCathieNevile <charles@w3.org>
Date: Wed, 28 Jan 2004 09:48:20 -0500 (EST)
To: Marja-Riitta Koivunen <marja@annotea.org>
Cc: Chris Ridpath <chris.ridpath@utoronto.ca>, WAI ER IG List <w3c-wai-er-ig@w3.org>
Message-ID: <Pine.LNX.4.55.0401280938140.12367@homer.w3.org>

On Wed, 28 Jan 2004, Marja-Riitta Koivunen wrote:

>
>If the high confidence tells that a checkpoint was tested both with
>automatic test tools as well as by a human I would put that information to
>the description of the test case.

>So we could have "partialtest1" and "partialtest2" for some automatic
>testing of checkpoint 1.1. In addition we could have a description of a
>combination test "trustedtest1" that refers to both automatic partial tests
>as well as the human test.

Right. In most cases you either know or you don't, for things like
accessibility.

There are cases where you might want to express confidence. For example, in
reviewing papers I have often used a system that asked me to select one of a
range of subclasses of Pass or Fail ("must have, strong accept, weak accept,
don't care" or "weak reject, strong reject") and to express a confidence
rating on my evaluation.  Similarly in testing things in the physical world
there is often a confidence (or tolerance) measure applied.

But I don't think these cases are the same as getting a tool to use
confidence because the developer thinks it is "pretty likely" that a certain
result indicates a problem, since that takes us down a path of interpretation
that I would prefer we leave to the reader of the results.

>Separately, we could express trust for these partial and combination tests
>maybe only in cases when it was done by a trusted group of persons and not
>too long ago etc. This trust varies according to the individuals.
>
>Some related things we have been experimenting with Annotea:
>Nobu did some trust experimentations with Annotea shared bookmarks, where
>trusted people's bookmarks were used to change the order of the search
>results (will link some slides from
>http://www.w3.org/2001/Annotea/User/Papers.html after I get OK from Nobu).

This work of Nobu's is very interesting. It is the reason why we ask for an
identifier for the assertor, and applying it to EARL has many obvious
uses. I hope permission to publish his stuff comes
quickly. (Plus, the slides are nice and clear.)

>Also Dom did some experimentations with marking messages as spam with spam
>annotations http://www.w3.org/2003/Talks/2003/Spamdemo/Overview.html.
>Similarly EARL Assertions or specific tests could be annotated (or
>bookmarked) as trustworthy or unreliable.

I think the bookmark model (we are talking about the Annotea RDF bookmark
schema work; http://www.w3.org/2003/07/Annotea/BookmarkSchema-20030707
explains more) is better than the Annotea model, since different people are
going to have different trust profiles. There is still value in being
able to share these around (mediated through the same trust mechanism?),
a bit like GPG/PGP.
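
To make the "different trust profiles" idea concrete, here is a rough sketch
of what I mean. It is only an illustration: the field names, the assertor
identifiers and the numeric trust scores are all made up for the example and
are not EARL or Annotea vocabulary. Each reader keeps their own mapping from
assertor to trust, and the same set of assertions is simply ordered
differently for each reader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    # Hypothetical fields, loosely modelled on what an EARL assertion carries.
    assertor: str  # identifier for the person or tool making the claim
    subject: str   # the resource that was tested
    test: str      # the checkpoint or test case
    result: str    # "pass" or "fail"


def rank_by_trust(assertions, trust_profile, default=0.0):
    """Order assertions by how much this reader trusts each assertor.

    trust_profile maps assertor identifiers to a score; assertors the
    reader has no opinion about get the default score.
    """
    return sorted(
        assertions,
        key=lambda a: trust_profile.get(a.assertor, default),
        reverse=True,
    )


assertions = [
    Assertion("tool:autochecker", "http://example.org/", "checkpoint-1.1", "pass"),
    Assertion("mailto:reviewer@example.org", "http://example.org/", "checkpoint-1.1", "fail"),
]

# Two readers with different (invented) trust profiles.
alice = {"mailto:reviewer@example.org": 0.9, "tool:autochecker": 0.4}
bob = {"tool:autochecker": 0.8}

alice_view = rank_by_trust(assertions, alice)
bob_view = rank_by_trust(assertions, bob)
```

The assertions themselves are untouched; only the ordering changes per
reader, which is why the identifier for the assertor matters.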

cheers

Chaals

Received on Wednesday, 28 January 2004 09:48:20 UTC