- From: Charles McCathieNevile <charles@w3.org>
- Date: Fri, 12 Sep 2003 12:43:44 -0400 (EDT)
- To: Patrick Curran <Patrick.Curran@Sun.COM>
- Cc: david_marston@us.ibm.com, www-qa@w3.org, w3c-wai-er-ig@w3.org
On Fri, 12 Sep 2003, Patrick Curran wrote:

>david_marston@us.ibm.com wrote:
>
>>From the point of view of QAWG guidelines, I see a problem in the
>result table where you report that a product has passed 100% of the
>test cases for each group. The string "No failures found" would be
>more appropriate. The issues about percentages include:
>1. Implication that the current suite is 100% of all the tests that
>   should be there. Test suites that are being expanded frequently
>   won't have a stable notion of 100%.
>

Patrick:
>Isn't there a more fundamental problem? Test suites should be versioned. It
>ought to be OK to state "I passed x% of the test cases for version y.z of
>the test suite." The number of test cases in any particular version of the
>test suite should be, of course, fixed.

Well, there may be a case for making a statement like that, but in general
David's following points (and your agreement) suggest to me that claiming
percentages is not often a good thing.

EARL explicitly rejected the idea in discussions in the past, although we
did discuss the idea of people subclassing earl:fail - a failure because
some prerequisite requirement is failed isn't always the same thing as a
total catastrophic failure, and this would be useful in certain domains.
(Just as the OWL group seem to find duration of a test useful information in
their domain).

An RDF processor that can do every spec in the framework, and all the OWL
stuff, except that it uses dc: and foaf: as built-in non-varying namespaces
and doesn't check the xmlns declaration for them might pass almost all of
the tests, but it does have a major flaw. Or is it a trivial quirk?

The other notion implicit in the EARL design is that Test Suites are
collections of tests, and that one test is "passing all of a Test Suite" -
in other words they are quite likely to be nestable. Hence the heuristic
mode for a test.
Put another way, if test suite B is really test suite A plus tests b.1,
b.2, b.3 and minus a.6, then passing A, b.1, b.2, b.3 probably means you
pass B. Passing A except for a.6, and passing b.1, b.2, b.3 also means you
pass B. (This is a simplification of the real-world situation with WCAG [1]
and the US government's Section 508 requirements for accessible web
content.)

One of the requirements of EARL was to support gathering different test
results from different sources and using them together (as Shadi said) to
get some interesting information. In real world use for accessibility there
are a number of implemented tools and there is active development. These
overlaps get very complex, but having the raw information about which things
were deemed to have been passed (and by whom, to provide for trust
management) makes this possible - even if some results describe a suite of
things that were passed, such as "double-A conformance to WCAG".

Having a percentage of things passed (or more to the point failed) breaks
this pretty comprehensively for many testing processes - although it is
encouraging to generate the number and see it going up...

cheers

chaals

[1] http://www.w3.org/TR/WCAG10

David:
> 2. Implication that high numbers under 100% are pretty good. Each
>    class of product may have its own notion of how seriously interop
>    has been hurt by a score of, say, 96%.
>3. Implication that all tests count equally. Product X's 96% might
>   be much worse than Product Y's 96%, depending on the cases that
>   comprise the failing 4% on each.
>.................David Marston

Patrick:
>Right. Attempting to interpret any claim other than "I passed all the tests"
>is a dangerous business. It can have value for the implementors, and help
>them to figure out where they need to improve, but should not be used to
>compare one implementation with another...
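
To make the nested-suite point concrete, here is a minimal sketch in Python of the A/B example from this message. It is not EARL and not any real tool: the function names are hypothetical, and the size of suite A (six tests, a.1 to a.6) is an assumption, since the message only names a.6 and b.1 to b.3. It shows that raw per-test results are enough to decide whether B is passed, while two results with the same percentage on A can differ on B.

```python
# Hypothetical test identifiers. Only a.6 and b.1-b.3 appear in the message;
# the assumption that suite A has exactly six tests is made for illustration.
SUITE_A = frozenset({"a.1", "a.2", "a.3", "a.4", "a.5", "a.6"})
SUITE_B = (SUITE_A - {"a.6"}) | {"b.1", "b.2", "b.3"}


def passes_suite(passed, suite):
    """Return True when every test in the suite is among the passed tests."""
    return suite <= set(passed)


def pass_rate(passed, suite):
    """Percentage of the suite's tests that appear among the passed tests."""
    return 100.0 * len(suite & set(passed)) / len(suite)


# Raw results, as they might be gathered from different sources.
passed_all_of_a = SUITE_A | {"b.1", "b.2", "b.3"}
result_x = (SUITE_A - {"a.6"}) | {"b.1", "b.2", "b.3"}  # fails only a.6
result_y = (SUITE_A - {"a.1"}) | {"b.1", "b.2", "b.3"}  # fails only a.1

print(passes_suite(passed_all_of_a, SUITE_B))  # True
print(passes_suite(result_x, SUITE_B))         # True: a.6 is not part of B
print(passes_suite(result_y, SUITE_B))         # False: a.1 is part of B

# Both results score the same percentage on A, yet only one of them passes B.
print(round(pass_rate(result_x, SUITE_A), 1))  # 83.3
print(round(pass_rate(result_y, SUITE_A), 1))  # 83.3
```

If only the two percentages had been reported, there would be no way to tell that one result passes B and the other does not, which is the sense in which a percentage "breaks" combining results across suites.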
Received on Friday, 12 September 2003 12:43:44 UTC