Re: More on result type="foo", confidence, ... from Nils Ulltveit-Moe on 2005-04-14 (public-wai-ert@w3.org from April 2005)

From: Nils Ulltveit-Moe <nils@u-moe.no>
Date: Thu, 14 Apr 2005 11:41:12 +0200
To: Charles McCathieNevile <charles@sidar.org>
Cc: shadi@w3.org, public-wai-ert@w3.org
Message-Id: <1113471672.6924.38.camel@moe-ulltveit-moe.com>
Hi all,

I am not sure I want the confidence parameter to be named nils ;-)

I will try to correct myself by making the terms more precise.

What we are talking about is not the confidence interval, but a
confidence value: 

A confidence interval shows the confidence that a value falls between
two points. 

A confidence value shows how confident we are that this value is
correct.

For fuzzy logic, confidence values indicate degree of membership rather
than a probability (i.e. to what degree the logic considers that the
value belongs to this "bag").

For example, a heuristic assessment would be able to say that the answer
is Pass with 80% confidence, Fail with 10% confidence and N/A with 10%
confidence. Most tools would then answer Pass to this test. It might be
useful to know that the confidence was 80%.
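
As a rough sketch of how a tool might collapse such a distribution into
a single answer while still keeping the confidence around (Python; the
helper name most_confident is made up for illustration, and nothing
here is defined by EARL):

```python
def most_confident(confidences):
    """Return the (outcome, confidence) pair with the highest confidence.

    `confidences` maps an outcome label to a confidence value in [0, 1].
    """
    if not confidences:
        raise ValueError("no outcomes to choose from")
    outcome = max(confidences, key=confidences.get)
    return outcome, confidences[outcome]


# The heuristic assessment from the example above.
assessment = {"Pass": 0.8, "Fail": 0.1, "N/A": 0.1}
outcome, confidence = most_confident(assessment)
print(outcome, confidence)  # Pass 0.8
```

Returning the pair rather than only the label is the point: a consumer
can still see that the Pass was an 80% Pass.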

The problem may occur when the tool is not able to decide. What EARL
code do we return if the algorithm returns Pass with 33% confidence,
Fail with 33% confidence and N/A with 33% confidence?

Knowing the confidence value of 33%, we would not give much weight to
this measurement. 

The question in this case is whether the tool should be able to report
a fourth value (e.g. "#DontKnow"), or whether this should be indicated
by the tool returning "#ManualInspectionNeeded", or implicitly by the
tool returning nothing for this test case. One of these would typically
have to be done if the confidence value is not used.
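
One way to picture the fourth-value option is a cut-off below which the
tool declines to answer. This is only a sketch: the function name
decide and the 0.5 threshold are invented for illustration, and
"#DontKnow" is the tentative value suggested above, not an existing
EARL term.

```python
def decide(confidences, threshold=0.5):
    """Return the most confident outcome, or "#DontKnow" if it is too weak.

    `confidences` maps outcome labels to confidence values in [0, 1].
    `threshold` is an arbitrary, illustrative cut-off.
    """
    outcome = max(confidences, key=confidences.get)
    if confidences[outcome] < threshold:
        return "#DontKnow"
    return outcome


print(decide({"Pass": 0.8, "Fail": 0.1, "N/A": 0.1}))     # Pass
print(decide({"Pass": 0.33, "Fail": 0.33, "N/A": 0.33}))  # #DontKnow
```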

There is some similar work on confidence levels or confidence
intervals for conveying channel information in mobile applications:

http://www.research.att.com/areas/wireless/Perf_Enhance_Tech/WCI/JKim_WCI_ICC2001.pdf

Modelled on this, one could maybe have something like:

<earl:accurancy unit='percent' confidence='0.9'/>

Best regards,
Nils

On Thu, 14.04.2005 at 10.54 +1000, Charles McCathieNevile wrote:
> On Thu, 14 Apr 2005 05:09:16 +1000, Nils Ulltveit-Moe <nils@u-moe.no>  
> wrote:
> 
> 
> > Most humans are not able to specify exactly how confident they are that
> > an accessibility claim is indeed a real problem for accessibility. They
> > may be able to give a rough indication, though.
> [...]
> > What I am trying to say, is that it is not always possible to abstract
> > oneself away from uncertainties that are inherent in the problem or
> > testcase that is being investigated, and in such cases it is better to
> > have a model that includes the uncertainty than pretending that the
> > uncertainty is not there.
> 
> I agree that it is hard to specify exactly, and this is why I have not  
> been keen on it. I am slowly coming around to the idea, if it can be  
> modelled in such a way that interoperability isn't more or less impossible.
> 
> > From what I have discussed here, I am actually getting more convinced
> > that a confidence interval or similar is useful, and not only for
> > automatic tests. If confidence interval was used also for manual tests
> > then one would be able to get feedback on how real people perceived
> > different accessibility problems to be, which in turn could be used by
> > W3C to improve the WCAG checklists.
> [...]
> > My conclusion is that maybe the confidence interval should not be
> > mandatory, but I think it should be optional in EARL. And it should be
> > modelled as a probability; i.e. a number between 0 and 1.
> 
> I think if we are going to have it, then it should be an n-ary object that  
> actually includes some specification of how it was measured. For example,  
> spamassassin point scores can be used interoperably between mail clients  
> (perhaps this is a good use case).
> 
> This would leave us with, in the simple case, modelling the result as a  
> blank node with a type (rather than using it as a direct property of the  
> Assertion), and in the complex case adding a compound description of the  
> confidence we have in the result.
> 
> (We could do both, but that means that we have to require systems to  
> support two different RDF graphs for a result - more complexity than we  
> need, I think, since we should be able to constrain the graph easily  
> enough).
> 
> So you stay with, in the simple case
> 
> <earl:Assertion>
>    <earl:result r:type="...fail"/>
> 
> (instead of <earl:result r:resource="...fail"/> )
> 
> and if you want to do confidence you have something like the example in  
> the spec (although I suggest we change the model for that to make it more  
> or less required to say what scheme you use - even if it is just to  
> identify that this is "Chaals' gut feeling scheme" :-)
> 
> cheers
> 
> Chaals
-- 
Nils Ulltveit-Moe <nils@u-moe.no>
Received on Thursday, 14 April 2005 09:37:15 UTC