- From: Nils Ulltveit-Moe <nils@u-moe.no>
- Date: Tue, 19 Apr 2005 23:46:57 +0200
- To: Karl Dubost <karl@w3.org>
- Cc: public-wai-ert@w3.org
Hi Karl,

Confidence values or intervals are a difficult issue when dealing with standardisation. There are several reasons for this. For example: what does it mean that tool 1 reports a pass with confidence level 0.8 for a particular test, while tool 2 reports a pass with confidence level 0.2? Does it mean that tool 1 is better than tool 2? Not necessarily. If both tools are automatic tools based on learning technology, it may simply mean that tool 2 has not been trained to recognise this particular case and needs further training.

There may be some practical problems with defining the datatype needed (a constrained floating point value) given the status quo of RDF standardisation. However, I think these problems can be overcome.

In our case, the confidence value is very useful in the decision process for learning systems when they estimate test results for new values, since it can be used for targeted retraining of the system. We plan to use EARL between our own assessment tool plugins and the RDF repository; these tools would be able to return the confidence value, which could then be stored and analysed in the RDF repository.

There have been similar discussions in other standardisation groups, e.g. the IDMEF group in the IETF, which defines a standardised XML format for intrusion detection alerts. This thread is more or less a blueprint of the discussion we are having now, including the discussion about the scale of the confidence value:

http://www.izerv.net/idwg-public/archive/0408.html

Here is the conclusion reached by the IDMEF group:

http://www.izerv.net/idwg-public/archive/0439.html

"Confidence represents the analyzer's best estimate of the validity of its analysis. This element is optional and should be used only when the analyzer can produce meaningful information. Systems that can output only a rough heuristic should use "low", "medium", or "high" as the rating value. In this case, the element content should be omitted.
Systems capable of producing reasonable probability estimates should use "numeric" as the rating value and include a numeric confidence value in the element content. This numeric value should reflect a posterior probability (the probability that an attack has occurred, given the data seen by the detection system and the model used by the system). It is a floating point number between 0.0 and 1.0, inclusive. The number of digits should be limited to those representable by a single precision floating point value."

The IDMEF group ended up defining the confidence attribute as an optional value, which I believe is the most sensible thing to do for EARL as well: assessment technologies based on learning techniques will need the confidence value, while other tools may choose not to use it. The confidence parameter should not be left out, however, since it will be useful in our case, and probably also for a growing number of assessment tools.

Regards,
Nils

On Tue, 19 Apr 2005 at 13:47 -0400, Karl Dubost wrote:
> Hi,
>
> On 18 Apr 2005, at 10:33, Giorgio Brajnik wrote:
> > I would suggest to consider confidence factors as probabilities
> > associated to assertions (like "this test has failed").
>
> I hope I have understood the main points of the discussion. I'm still
> not sure yet. We have argued in QA against percentages for
> *conformance*. We don't want someone to say of a technology, "We
> implemented 75% of the technology", because it doesn't make sense in a
> conformance model and it doesn't mean anything about interoperability.
>
> I understand the notion of a level of confidence for a measure, but I
> want to be sure we do not mix anything up.
>
> In scientific studies, a measure is always associated with an error
> (calculated or estimated with different techniques). Collecting the
> same measure x times helps to define and refine a level of confidence
> for the results.
> T being the temperature, on Sundays in April 2005 at noon, at an
> imaginary location:
>
> T(2005-04-03) = 20.0°C +/- 0.3°C
> T(2005-04-10) = 22.3°C +/- 0.2°C
> T(2005-04-17) = 23.4°C +/- 0.4°C
>
> Each result is unique. The level of confidence is calculated once a
> large collection of Sunday temperatures has been acquired.
>
> What is the temperature at noon on Sunday?
>
> In an equatorial climate, the level of confidence will be good. In a
> temperate climate, it will be very poor.
>
> I see EARL giving the possibility to report the first series of
> measures. The report for the level of confidence is then another level
> of test and calculation, somewhat disconnected from the first series. So
> I'm not sure the level of confidence is really part of EARL or more an
> artefact of measurement.
>
> How will we express it in EARL?
>
> Am I off-track here?

--
Nils Ulltveit-Moe <nils@u-moe.no>
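The IDMEF convention quoted in the email above (an optional confidence that is either a rough rating of "low", "medium", or "high" with no element content, or a "numeric" rating carrying a posterior probability in [0.0, 1.0] limited to single-precision digits) can be sketched as follows. This is a minimal illustration, not part of EARL or the IDMEF schema; the `Confidence` type and `make_confidence` helper are hypothetical names chosen for this sketch.

```python
import struct
from dataclasses import dataclass
from typing import Optional

ROUGH_RATINGS = {"low", "medium", "high"}

@dataclass
class Confidence:
    rating: str                    # "low", "medium", "high", or "numeric"
    value: Optional[float] = None  # present only when rating == "numeric"

def make_confidence(rating: str, value: Optional[float] = None) -> Confidence:
    """Validate a confidence report following the quoted IDMEF convention."""
    if rating == "numeric":
        if value is None or not 0.0 <= value <= 1.0:
            raise ValueError("numeric confidence must lie in [0.0, 1.0]")
        # Limit precision to what a single-precision float can represent,
        # per the quoted text, by round-tripping through a 32-bit float.
        value = struct.unpack("f", struct.pack("f", value))[0]
    elif rating in ROUGH_RATINGS:
        if value is not None:
            raise ValueError("element content must be omitted for rough ratings")
    else:
        raise ValueError(f"unknown rating: {rating!r}")
    return Confidence(rating, value)
```

Making the whole element optional, as the IDMEF group concluded, then simply means an assertion may carry no `Confidence` at all; tools that cannot produce meaningful estimates omit it rather than reporting a misleading number.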
Received on Tuesday, 19 April 2005 21:42:39 UTC