- From: Charles McCathieNevile <charles@w3.org>
- Date: Mon, 1 Jul 2002 21:51:28 -0400 (EDT)
- To: Ian Hickson <ian@hixie.ch>
- cc: w3c-wai-er-ig <w3c-wai-er-ig@w3.org>
On Mon, 1 Jul 2002, Ian Hickson wrote:

Charles McCathieNevile wrote:
> Hmm. This was similar to something the UAAG group wanted.
>
> I would propose that we have qualitative rather than quantitative ratings.

Why? I personally would much rather both "confidence" and "severity" were
changed to use percentages (or some other scale, e.g. 0.0 .. 1.0). Having an
enumerated set of values artificially limits applications. For example, I need
four severities (100%, 90%, 50%, 0%) and two confidence levels (100%, 0%),
whereas the current spec only allows for one severity (100%) and three
confidence levels (~100%, ~66%, ~33%).

Using a numeric scale doesn't reduce interoperability, as applications can
simply "pigeon hole" values into their internal enumerated types. (The problem
of round tripping through systems that use enumerated sets is already present,
since it is most likely that applications will not have exactly matching sets
of severities and confidences.) For example, I intend to map any Pass values
with severity 80%-99% into the "pass with unrelated errors (Yb)" category when
summarising results.

(Note: Internally, I store severities and confidences as integers from 0 to
255. That is the natural computer equivalent of percentages. I would be quite
happy if EARL used the integer scale 0..255. I would also be fine with EARL
using a floating point scale between two arbitrary values, such as 0.0 and
1.0.)
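A rough sketch of that sort of pigeon-holing, in Python - the function name and
any bucket other than the 80%-99% Yb rule quoted above are made up for
illustration, not taken from EARL or from any tool in this thread:

    # Hypothetical summariser: fold a numeric severity (a percentage, 0-100)
    # reported with an EARL-style outcome into a coarser internal category.
    def summarise(outcome: str, severity: float) -> str:
        if outcome == "Pass" and 80 <= severity <= 99:
            return "pass with unrelated errors (Yb)"
        return outcome  # everything else keeps its reported outcome

    print(summarise("Pass", 90))    # -> pass with unrelated errors (Yb)
    print(summarise("Pass", 100))   # -> Pass
    print(summarise("Fail", 50))    # -> Fail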
CMN
Exactly my point. You will have an internal mapping of numeric results that is
specific to your needs, and the next person will too, but theirs will be
slightly different. If you map both of those to the same EARL scheme by numeric
value, you lose the ability to compare between them without doing complicated
searching based on who said what and then doing the mapping. If you have them
as separate namespaces attached to the comments, then you only need to do the
namespace mapping itself, and you don't get essentially meaningless EARL data
like "this is 66% conformant to a checkpoint that may or may not have a
definition of what that means" (your own test setup should have a definition of
numeric values if it is going to generate them...)

> For some use cases your Yb - pass with unrelated errors would count as a
> pass, and for some cases it would score as a fail. So we would need to know
> what they are.

Assuming "they" refers to the "unrelated errors" then yes, you would; that's
what the "comments" field is for, presumably.

CMN
Sorry, my bad writing. What I meant was that we need to know what the use case
is, and what the errors are, to categorise whether a result means "only failed
because of dependencies" or "passed the essential bit". But maybe this isn't
necessary if we just say "use a more atomic test". I don't know if that works
in the real world or not.

> The question also arises as to how many kinds of result we should include in
> earl and at what point we should leave people to subclass them for their own
> more detailed uses.

I think you only need:

  Pass
  Fail
  Not Applicable
  Not Tested

All the other values I can think of are simply variants of those four result
types with various values for "Severity" and "Confidence". You don't need
"Can't Tell", as that should just be "Pass" or "Fail" with "Confidence: 0%".
(Whether you pick "pass" or "fail" depends on which is the "default" -- e.g.
in a test where supporting the feature correctly and not even trying to support
the feature are indistinguishable, you would use "Pass", while in a test where
trying to support the feature but doing so _incorrectly_ is indistinguishable
from not supporting the feature at all you would use "Fail".)

CMN
Not sure that I agree here. Having a pass and a fail be equivalent under
circumstances that are outside the RDF model we are using seems likely to lead
to problems in processing. If there is a cannotTell then I think it should be a
single defined property. More to the point, what is the use case for it in the
first place?

Note that "Not Tested" is present only for completeness, as I expect most
applications would simply not include the result in that case.

CMN
Agree. At some point, when we have some tools, we should look to see if there
is a use case that justifies it - otherwise we might as well remove it as dead
weight.

"Not Applicable" is important for tests that neither pass nor fail, such as a
test for making sure all images have alternate text, when applied to a document
with no images, or a test to make sure that 'red' is rendered differently from
'green', on a monochromatic device.

(CMN I agree with your conclusion, but testing whether 'green' and 'red' are
rendered differently on a monochromatic device is in fact important...)

--
Charles McCathieNevile    http://www.w3.org/People/Charles    phone: +61 409 134 136
W3C Web Accessibility Initiative    http://www.w3.org/WAI    fax: +33 4 92 38 78 22
Location: 21 Mitchell street FOOTSCRAY Vic 3011, Australia
(or W3C INRIA, Route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex, France)
Received on Monday, 1 July 2002 21:51:30 UTC