- From: Nick Kew <nick@webthing.com>
- Date: Tue, 12 Jul 2005 17:08:08 +0100
- To: "public-wai-ert@w3.org" <public-wai-ert@w3.org>
- Subject: Confidence Values in Accessibility Evaluation

Some weeks ago I took an action to write a note about confidence values, noting that it would be a while before I had time to write anything. As I recollect it, the gist of the discussion was how confidences (High/Medium/Low) relate to outcomes (Pass/Fail/CannotTell), and why a low-confidence Fail needs to be distinct from a Pass at any confidence level.

The crucial point here is that accessibility analysis happens at different levels. Different levels of analysis call for different vocabularies, with of course a common core.

First, the confidences themselves. They are largely arbitrary, but are designed to capture differences in the likelihood that an individual test indicates a violation of the guidelines. Note that this is entirely orthogonal to the importance (A/AA/AAA) of any given violation. To take a few examples, going from high to low confidence (these examples are sketched in code below):

* An IMG with no ALT is a violation. There is no doubt, and the tool can say so with certainty.
* An IMG with ALT=SPACER is almost certainly a violation, but might be correct in exceptional cases.
* A BLOCKQUOTE may or may not be a violation. Since BLOCKQUOTE is widely abused for indentation, any particular use of the element carries quite a high risk of being an abuse, unless the tool can infer otherwise. If it has a cite=... attribute, the tool can infer that the usage is correct. If it doesn't, the tool should ask the evaluator to verify it.
* A STYLE=... attribute on any element might possibly be used to convey vital information that becomes inaccessible without it, but this is rare in practice. If the tool cannot tell whether a style is safe, it should flag a note just to alert the evaluator.

Now, when evaluating a page, a tool may apply thousands of such tests, and the tool developer's most difficult task is to find a middle way between omitting important detail and overwhelming the user with mostly-irrelevant detail. (Valet has been criticised simultaneously for offering too much and too little detail, and deals with the problem primarily by offering the user a choice of presentations to meet differing needs and expectations.)

Given thousands of individual tests, most of which a page passes, it is certainly not helpful to present the user with every irrelevant result. Instead of recording every test passed, the tool should refer the user to general documentation, which will point out, for example, that the tool tests all images for ALT attributes; if no warnings were generated, the user can infer that all images have them.

That means that *every* test result within a detailed page report is a Fail or CannotTell. If the tests themselves are implemented as binary pass/fail, then it is always a Fail of the tool's test that we report. The tool may designate some tests as certain and others as uncertain, but that is a crude distinction, barely more helpful than old-fashioned Bobby's manual checks - which consistently get ignored - or indeed the ultimate reduction of including the entire guidelines as manual checks in every report!

This is where confidence levels are helpful. Though arbitrary and imperfect, they provide a much finer and more useful distinction than simply a page full of CannotTell results. Every confidence reported represents the tool's confidence that the test failed. They may also be used in compiling whole-page results from the individual warnings.
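To make that concrete, here is a minimal sketch in Python of per-test confidence assignment along the lines of the examples above. It is purely illustrative - not how Valet or any other tool is actually implemented - and all names and structures are my own assumptions:

    # Illustrative sketch only. Each check reports a confidence that the
    # test FAILED; a check that passes returns None and is never reported
    # individually, exactly as argued above.
    HIGH, MEDIUM, LOW = "High", "Medium", "Low"

    def check_img(attrs):
        alt = attrs.get("alt")
        if alt is None:
            return (HIGH, "IMG has no ALT attribute")   # certain violation
        if alt.strip().upper() == "SPACER":
            return (MEDIUM, "IMG has ALT=SPACER")       # almost certainly a violation
        return None

    def check_blockquote(attrs):
        if "cite" not in attrs:
            # Widely abused for indentation; without cite, the evaluator
            # must verify that this is a real quotation.
            return (MEDIUM, "BLOCKQUOTE without cite attribute")
        return None

    def check_style(attrs):
        if "style" in attrs:
            # Rarely a real problem: a low-confidence note to alert the evaluator.
            return (LOW, "STYLE attribute present")
        return None

    def scan(elements):
        """Run all checks over (name, attrs) pairs; collect only failures.
        Passes are implied by the absence of warnings and are covered by
        the tool's general documentation."""
        per_element = {"img": check_img, "blockquote": check_blockquote}
        warnings = []
        for name, attrs in elements:
            for check in (per_element.get(name), check_style):
                result = check(attrs) if check else None
                if result:
                    warnings.append((name,) + result)
        return warnings

    # scan([("img", {}), ("blockquote", {"cite": "http://example.com"})])
    # -> [("img", "High", "IMG has no ALT attribute")]

A detailed page report would then contain only these warnings: every entry is a failure at some confidence, and the thousands of tests the page passed are never listed.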
A page containing deprecated markup is an automatic Fail at WCAG-AA or higher, whereas a valid/Strict page containing lower-confidence warnings gets flagged as CannotTell - or some variant on that. A page that the evaluator has verified, ticking every automatic warning as "condition satisfied" or "not applicable" - or indeed a page that generates no warnings whatsoever - is flagged as a Pass. This whole-page level, and upwards to sites and applications, calls for the Pass/Fail/CannotTell vocabulary that is irrelevant within a detailed page analysis.
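Continuing the sketch above, the whole-page rollup just described might look like this - again, the structure and threshold are my own assumptions, not any tool's actual behaviour:

    def page_outcome(warnings, cleared=frozenset()):
        """Roll the (element, confidence, message) warnings from scan()
        up to a whole-page Pass/Fail/CannotTell. cleared holds the
        warnings the evaluator has ticked off as "condition satisfied"
        or "not applicable"."""
        open_warnings = [w for w in warnings if w not in cleared]
        if not open_warnings:
            # No warnings at all, or every warning verified by the evaluator.
            return "Pass"
        if any(conf == "High" for _, conf, _ in open_warnings):
            # e.g. deprecated markup: an automatic Fail at WCAG-AA or higher.
            return "Fail"
        # Only lower-confidence warnings remain: a human must decide.
        return "CannotTell"

So a valid page carrying only Medium or Low warnings comes out as CannotTell, and the same page comes out as a Pass once the evaluator has cleared every warning.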
-- Nick Kew

Received on Tuesday, 12 July 2005 16:21:52 UTC