- From: Nils Ulltveit-Moe <nils@u-moe.no>
- Date: Tue, 19 Apr 2005 23:46:57 +0200
- To: Karl Dubost <karl@w3.org>
- Cc: public-wai-ert@w3.org
Hi Karl,

Confidence values or intervals are a difficult issue when dealing with standardisation. There are several reasons for this. For example: what does it mean that tool 1 reports a pass with confidence level 0.8 for a particular test, while tool 2 reports a pass with confidence level 0.2? Does it mean that tool 1 is better than tool 2? Not necessarily. If both tools are automatic tools based on learning technology, it may simply mean that tool 2 has not been trained to recognise this particular case and needs further training.

There may be some practical problems with defining the datatype needed (a constrained floating point value) given the status quo of RDF standardisation. However, I think these problems can be overcome.

In our case, the confidence value is very useful in the decision process for learning systems when they estimate test results for new values, since it can be used for targeted retraining of the system. We plan to use EARL between our own assessment tool plugins and the RDF repository; these tools would be able to return the confidence value, which could then be stored and analysed in the RDF repository.

There have been similar discussions in other standardisation groups, e.g. the IDMEF group in the IETF, which defines a standardised XML format for intrusion detection alerts. This thread is more or less a blueprint of the discussion we are having now, including the discussion about the scale of the confidence value:

http://www.izerv.net/idwg-public/archive/0408.html

Here is the conclusion reached by the IDMEF group:

http://www.izerv.net/idwg-public/archive/0439.html

"Confidence represents the analyzer's best estimate of the validity of its analysis. This element is optional and should be used only when the analyzer can produce meaningful information. Systems that can output only a rough heuristic should use "low", "medium", or "high" as the rating value. In this case, the element content should be omitted.
Systems capable of producing reasonable probability estimates should use "numeric" as the rating value and include a numeric confidence value in the element content. This numeric value should reflect a posterior probability (the probability that an attack has occurred, given the data seen by the detection system and the model used by the system). It is a floating point number between 0.0 and 1.0, inclusive. The number of digits should be limited to those representable by a single precision floating point value."

The IDMEF group ended up defining the confidence attribute as an optional value, which I believe is the most sensible thing to do for EARL as well: assessment technologies based on learning techniques will need the confidence value, while other tools may choose not to use it. The confidence parameter should not be left out, however, since it will be useful in our case, and probably also for a growing number of assessment tools.

Regards,
Nils

On Tue, 19 Apr 2005 at 13:47 -0400, Karl Dubost wrote:
> Hi,
>
> On 18 Apr 2005, at 10:33, Giorgio Brajnik wrote:
> > I would suggest to consider confidence factors as probabilities
> > associated to assertions (like "this test has failed").
>
> I hope I have understood the main points of the discussion. I'm still
> not sure yet. We have argued in QA against percentages for
> *conformance*. We don't want someone to say of a technology, "We
> implemented 75% of the technology", because it doesn't make sense in a
> conformance model and it doesn't mean anything about interoperability.
>
> I understand the notion of a level of confidence for a measure, but I
> want to be sure we do not mix anything up.
>
> In scientific studies, a measure is always associated with an error
> (calculated or estimated with different techniques). Collecting the
> same measure x times helps to define and refine a level of confidence
> for the results.
> T being the temperature, on Sundays in April 2005 at noon, at an
> imaginary location:
>
> T(2005-04-03) = 20.0°C +/- 0.3°C
> T(2005-04-10) = 22.3°C +/- 0.2°C
> T(2005-04-17) = 23.4°C +/- 0.4°C
>
> Each result is unique. The level of confidence is calculated once a
> large collection of Sunday temperatures has been acquired.
>
> What is the temperature at noon on Sunday?
>
> In an equatorial climate, the level of confidence will be good. In a
> temperate climate, it will be very poor.
>
> I see EARL giving the possibility to report the first series of
> measures. The report for the level of confidence is then another level
> of test and calculation, somewhat disconnected from the first series. So
> I'm not sure the level of confidence is really part of EARL or more an
> artefact of measurement.
>
> How will we express it in EARL?
>
> Am I off-track here?

--
Nils Ulltveit-Moe <nils@u-moe.no>
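The IDMEF convention quoted in the email above (an optional confidence that is either a rough rating of "low", "medium", or "high" with no element content, or a "numeric" rating carrying a posterior probability in [0.0, 1.0] limited to single-precision digits) can be sketched as follows. This is a minimal illustration, not part of EARL or the IDMEF schema; the `Confidence` type and `make_confidence` helper are hypothetical names chosen for this sketch.

```python
import struct
from dataclasses import dataclass
from typing import Optional

ROUGH_RATINGS = {"low", "medium", "high"}

@dataclass
class Confidence:
    rating: str                    # "low", "medium", "high", or "numeric"
    value: Optional[float] = None  # present only when rating == "numeric"

def make_confidence(rating: str, value: Optional[float] = None) -> Confidence:
    """Validate a confidence report following the quoted IDMEF convention."""
    if rating == "numeric":
        if value is None or not 0.0 <= value <= 1.0:
            raise ValueError("numeric confidence must lie in [0.0, 1.0]")
        # Limit precision to what a single-precision float can represent,
        # per the quoted text, by round-tripping through a 32-bit float.
        value = struct.unpack("f", struct.pack("f", value))[0]
    elif rating in ROUGH_RATINGS:
        if value is not None:
            raise ValueError("element content must be omitted for rough ratings")
    else:
        raise ValueError(f"unknown rating: {rating!r}")
    return Confidence(rating, value)
```

Making the whole element optional, as the IDMEF group concluded, then simply means an assertion may carry no `Confidence` at all; tools that cannot produce meaningful estimates omit it rather than reporting a misleading number.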
Received on Tuesday, 19 April 2005 21:42:39 UTC