RE: result type="foo", confidence, ... from Paul Walsh on 2005-04-13 (public-wai-ert@w3.org from April 2005)

From: Paul Walsh <paulwalsh@segalamtest.com>
Date: Wed, 13 Apr 2005 20:17:51 +0100
To: "'Nils Ulltveit-Moe'" <nils@u-moe.no>, <shadi@w3.org>, <public-wai-ert@w3.org>
Cc: "'Charles McCathieNevile'" <charles@sidar.org>, <public-wai-ert@w3.org>
Message-ID: <01b901c5405d$7c838f20$0200a8c0@PaulLaptop>
Hi,
 
I'm not a statistician, but isn't a 50% confidence level the same as
saying 'I don't know'?! If someone said that to me, I would assume
they didn't know whether a test case had passed or failed. That would
give me little confidence in their results.
 
Furthermore, providing a varying degree of certainty is even more open
to interpretation: 20% certainty to one auditor could be 30% to
another. I think this approach is bound to introduce even more
ambiguity, which is exactly what we are trying to get away from.
 
Kind regards,
Paul
 
-----Original Message-----
From: public-wai-ert-request@w3.org
[mailto:public-wai-ert-request@w3.org] On Behalf Of Nils Ulltveit-Moe
Sent: 13 April 2005 20:09
To: shadi@w3.org; public-wai-ert@w3.org
Cc: 'Charles McCathieNevile'; public-wai-ert@w3.org
Subject: Re: result type="foo", confidence, ...
 
 
Hi Shadi,
 
I agree that a confidence interval is probably not interesting reading
for most end users. However, it may be interesting for statisticians,
who want to know how far the result of a test can be trusted. One can
always choose between three values (Pass, Fail or notApplicable), but
how confident you are that a Pass is indeed a Pass may vary.
 
Most humans are not able to specify exactly how confident they are that
an accessibility claim is indeed a real accessibility problem, though
they may be able to give a rough indication. For weak accessibility
claims, one person might say that a test is Pass with 50% confidence,
and another might say it is Fail with 50% confidence. Both may be right
from their own point of view, and there is no correct answer. Adding the
confidence in this case tells statisticians that this value should be
given less weight than one that the assertor strongly believes is an
accessibility issue. Results from heuristic algorithms could be weighted
in a similar way.
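A minimal sketch of that weighting idea, assuming confidence is stored
as a number between 0.0 and 1.0 and that a consumer simply weights each
Pass or Fail by its confidence. The `Result` class and the
`weighted_fail_ratio` function are hypothetical illustrations, not part
of EARL.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Result:
    """Hypothetical result record: an outcome plus the assertor's confidence."""
    outcome: str       # "pass", "fail" or "notApplicable"
    confidence: float  # assumed to be a probability in [0.0, 1.0]

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def weighted_fail_ratio(results: Iterable[Result]) -> Optional[float]:
    """Share of the total confidence that backs 'fail'.

    Each pass/fail result counts with a weight equal to its confidence, so a
    weak judgement moves the ratio less than a strong one. notApplicable
    results are ignored. Returns None when there is no usable evidence.
    """
    relevant = [r for r in results if r.outcome in ("pass", "fail")]
    total = sum(r.confidence for r in relevant)
    if total == 0:
        return None
    return sum(r.confidence for r in relevant if r.outcome == "fail") / total


# Two assessors split on a weak claim: the evidence is balanced.
split = [Result("pass", 0.5), Result("fail", 0.5)]
print(weighted_fail_ratio(split))  # 0.5

# A third, confident "fail" shifts the balance to (0.5 + 0.9) / 1.9.
print(weighted_fail_ratio(split + [Result("fail", 0.9)]))
```

Note that the balanced case comes out at exactly 0.5, which is the same
"no information" point Paul raises above; the weighting only becomes
informative once some assertors are more confident than others.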
 
What I am trying to say is that it is not always possible to abstract
oneself away from uncertainties that are inherent in the problem or
test case being investigated, and in such cases it is better to have a
model that includes the uncertainty than to pretend that the
uncertainty is not there.
 
From what I have discussed here, I am actually becoming more convinced
that a confidence interval or similar is useful, and not only for
automatic tests. If confidence intervals were also used for manual
tests, one would get feedback on how real people perceive different
accessibility problems, which in turn W3C could use to improve the
WCAG checklists.
 
I am not so afraid of EARL becoming a more complex protocol. After all,
it is intended to be machine-readable and not directly consumed by
humans. If different degrees of complexity in the protocol are to be
allowed, that should be expressed by marking parameters as mandatory or
optional. RDF is forgiving in that sense: EARL-producing tools must
implement the mandatory part of EARL and may implement optional
features where applicable, and EARL-consuming tools can easily ignore
the parameters they do not understand, because any RDF-aware tool can
traverse the RDF graph and pick out the parts it understands.
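A deliberately simplified sketch of such a consumer, which keeps the
properties it understands and skips the rest. It walks the RDF/XML with
Python's standard `xml.etree.ElementTree` rather than a real RDF parser,
so it only handles this one serialization shape, and it spells out full
URIs instead of `&earl;` entities (those would need a DTD). The `earl`
and `ext` namespace URIs and the `ext:confidence` property are
hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EARL_NS = "http://example.org/earl#"  # hypothetical EARL namespace
EXT_NS = "http://example.org/ext#"    # hypothetical extension vocabulary

# A result carrying one property this consumer does not know about.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:earl="{EARL_NS}"
         xmlns:ext="{EXT_NS}">
  <earl:Result>
    <earl:validity rdf:resource="{EARL_NS}fail"/>
    <earl:message>malformed element in line 23</earl:message>
    <ext:confidence>0.8</ext:confidence>
  </earl:Result>
</rdf:RDF>"""

# Properties this consumer understands, as ElementTree "{namespace}name" tags.
KNOWN = {f"{{{EARL_NS}}}validity", f"{{{EARL_NS}}}message"}


def read_result(xml_text):
    """Return (understood properties, tags of ignored properties)."""
    root = ET.fromstring(xml_text)
    result = root.find(f"{{{EARL_NS}}}Result")
    understood, ignored = {}, []
    for prop in result:
        if prop.tag in KNOWN:
            local_name = prop.tag.split("}", 1)[1]
            # The value is either a resource reference or a literal.
            understood[local_name] = prop.get(f"{{{RDF_NS}}}resource") or prop.text
        else:
            ignored.append(prop.tag)  # unknown property: skip, don't fail
    return understood, ignored


understood, ignored = read_result(DOC)
print(understood)
print(ignored)
```

An older consumer reading this document still gets the validity and the
message; the unknown confidence property is simply reported as ignored.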
 
The protocol should be expressive enough, but not bloated. That may
also help it find use in areas other than accessibility testing.
 
My conclusion is that the confidence interval should perhaps not be
mandatory, but I think it should be optional in EARL. And it should be
modelled as a probability, i.e. a real number between 0 and 1.
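A minimal sketch of that proposal, assuming confidence is serialized as
an optional `xsd:decimal` literal on `earl:confidence`. The current
draft uses resources such as `&earl;high` instead, so this element shape
is hypothetical, and `result_xml` is an illustrative helper that writes
the same `&earl;` entity style as the draft examples.

```python
from typing import Optional

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def result_xml(validity: str, confidence: Optional[float] = None) -> str:
    """Serialize a result; confidence is optional but, if given, must be
    a probability, i.e. a real number in the closed interval [0, 1]."""
    lines = [
        '<earl:result rdf:parseType="Resource">',
        f'   <earl:validity rdf:resource="&earl;{validity}"/>',
    ]
    if confidence is not None:
        if not 0.0 <= confidence <= 1.0:  # also rejects NaN
            raise ValueError(f"confidence must be in [0, 1], got {confidence}")
        # Fixed-point text, so tiny values never appear in exponent form,
        # which the xsd:decimal lexical space does not allow.
        text = f"{confidence:.6f}".rstrip("0").rstrip(".")
        lines.append(
            f'   <earl:confidence rdf:datatype="{XSD_DECIMAL}">'
            f"{text}</earl:confidence>"
        )
    lines.append("</earl:result>")
    return "\n".join(lines)


print(result_xml("fail", 0.75))  # producer that supports confidence
print(result_xml("pass"))        # producer that omits the optional part
```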
 
Best regards,
Nils Ulltveit-Moe
 
On Wed, 13.04.2005 at 17:36 +0200, Shadi Abou-Zahra wrote:
> Hi,
> 
> Frankly, I see havoc and confusion upon thy users. :)
> 
> We are talking about cascades of test cases, assertors, and possibly
> subjects too. Complex but useful. However, several results? Which one
> should a tool that processes the results pick?
> 
> It seems to me that it may be a better approach to rework the model
> for deriving/communicating the confidence level and keep one
> unambiguous result per assertion.
> 
> Cheers,
>   Shadi
> 
> 
> -----Original Message-----
> From: public-wai-ert-request@w3.org On Behalf Of Charles McCathieNevile
> Sent: Wednesday, April 13, 2005 16:34
> To: public-wai-ert@w3.org
> Subject: result type="foo", confidence, ...
> 
> 
> 
> Hi folks,
> 
> In the current EARL spec, there are results which look like the
> following:
> 
> <earl:result rdf:parseType="Resource">
>    <earl:validity rdf:resource="&earl;fail"/>
>    <earl:confidence rdf:resource="&earl;high"/>
>    <earl:message>malformed element in line 23</earl:message>
> </earl:result>
> 
> This makes it possible to put two results on the same Assertion, for
> example to assert that they have different probabilities, or that the
> assertor has a different level of confidence in them.
> 
> <earl:result rdf:parseType="Resource">
>    <earl:validity rdf:resource="&earl;notApplicable"/>
>    <earl:confidence rdf:resource="&earl;low"/>
>    <earl:message>malformed element in line 23</earl:message>
> </earl:result>
> 
> I am not sure if we want to maintain this possibility, but it provides
> a feasible explanation of what I was copying when I wrote up my
> examples for "EARL by example" [1], and it is how Hera currently
> produces EARL.
> 
> Any thoughts?
> 
> cheers
> 
> Chaals
> 
> [1] http://www.w3.org/2001/sw/Europe/talks/200311-earl/all
> 
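Shadi's question of which result a processing tool should pick can be
made concrete with a small, hypothetical sketch: results are flattened
into (assertion id, validity, confidence) tuples, any assertion with
more than one result is reported as ambiguous, and one possible tie-break
(an assumption, not an agreed rule) keeps the result with the highest
confidence, using only the `high`/`low` values from the draft examples.

```python
from collections import defaultdict

# Hypothetical flattened results. Assertion "a1" mirrors the two results
# in the draft example above: fail/high and notApplicable/low.
RESULTS = [
    ("a1", "fail", "high"),
    ("a1", "notApplicable", "low"),
    ("a2", "pass", "high"),
]

# Assumed ordering of the symbolic confidence values.
RANK = {"low": 0, "high": 1}


def ambiguous_assertions(results):
    """Map each assertion id that has more than one result to its results."""
    by_assertion = defaultdict(list)
    for assertion_id, validity, confidence in results:
        by_assertion[assertion_id].append((validity, confidence))
    return {a: rs for a, rs in by_assertion.items() if len(rs) > 1}


def pick_most_confident(candidates):
    """One possible tie-break: keep the result with the highest confidence."""
    return max(candidates, key=lambda r: RANK[r[1]])


for assertion_id, candidates in ambiguous_assertions(RESULTS).items():
    print(assertion_id, candidates, "->", pick_most_confident(candidates))
```

The tie-break silently discards the notApplicable result, which is the
kind of hidden decision that keeping one unambiguous result per
assertion would avoid.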
-- 
Nils Ulltveit-Moe <nils@u-moe.no>
Received on Wednesday, 13 April 2005 19:17:56 UTC