- From: Al Gilman <asgilman@iamdigex.net>
- Date: Thu, 07 Feb 2002 19:11:50 -0500
- To: Nick Kew <nick@webthing.com>, <w3c-wai-er-ig@w3.org>
[disclaimer -- this was written Tuesday. I held back from posting it because I had not addressed Nick's top concern, which was to assess the reasonableness of the confidences he had assigned. I still haven't done that. Neither have I digested the whole chat log. But in the chat log Wendy was questioning the need for 'confidence' beyond pass/fail, and I strongly support Nick's attempt to distinguish, roughly speaking in statistical confidence terms, the relationship between the condition that the computer can evaluate and the checkpoint. So for better or worse, here is a long brain dump in stream-of-unconsciousness form that Nick's post inspired. - Al]

The basic principle is that the "overall evaluation" question needs to be answered by WCAG, not ER. A composite rollup combining priority and confidence is an application of the priorities, and comes within the scope of the WCAG in terms of "interpretations" of the guidelines.

This does not mean that the WCAG are ready to give a quick answer. On the other hand, the WCAG would be considerably aided in their search for consensus on what in WCAG 2 should fill the role of priorities in WCAG 1. Finding a way that this tool can be applied to the WCAG 2 draft criteria, letting WCAG look at sample reports and react to that level of prototyping, would allow WCAG as a body to understand the choices before them much better than is possible without the tool support.

More inline below.

At 08:49 PM 2002-02-05, Nick Kew wrote:

>Page Valet now offers fairly comprehensive page evaluation against
>the WCAG and US Section 508 accessibility guidelines.
>
>I'm now working through the issues of
>
>(1) distinguishing errors from warnings, and
>(2) assigning an overall evaluation to a document
>
>To do so, I've established a set of confidence levels, and assigned
>one to each test. This is in principle orthogonal to the WCAG
>priorities, and should measure how likely Page Valet thinks it is
>that a guideline has in fact been breached:
>
>e.g. - a Frame without a title is clearly a breach, so we can flag it
>       with high confidence.
>     - <strong>This text is emphasised</strong> might possibly be a
>       header, so we query whether it should be. But the chances are
>       it's being correctly used, so this is a low-confidence warning.
>
>I've now used five levels:
>  - Certain: we know this violates a guideline; no human check required.
>  - High: A construct that is likely to be wrong, but we're not certain.
>  - Medium: We can't tell; human checking required.
>  - Low: Something that's probably OK, but should be flagged for checking.
>  - "-": Messages that definitely don't mean there's a problem.

This last needs better definition. Why is there any event thrown at all? Commonly there is a category of loggable events which are not in and of themselves signs of anything wrong for certain, but which aid in the traceback when something is wrong: events exceptional enough to be worth noting in the notes, in case they become an issue. Major milestones in the success path are in this category, such as "form submitted."

>In producing an overall document score, we simply evaluate the
>highest confidence warning anywhere in the document:
>
>  - Certain => Fail
>  - High    => Probable Fail - check messages
>  - Medium  => Uncertain - check messages carefully!
>  - Low     => Probable Pass - check messages
>  - '-'     => Pass - no problems found

It would be interesting to set quantitative targets for what the statistics would be in an ideal world for these grades.
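For concreteness, here is a minimal sketch in Python of the "highest-confidence warning wins" rollup quoted above. The level names and verdict strings come from Nick's mail; the data model (a report reduced to a list of per-message confidence labels) is an assumption for illustration, not Page Valet's actual internals.

```python
# Sketch of the rollup Nick describes: the overall document score is the
# verdict for the highest-confidence warning found anywhere in the report.
# Level names and verdicts are from his mail; everything else is assumed.

LEVELS = ["-", "Low", "Medium", "High", "Certain"]  # least to most severe

VERDICTS = {
    "Certain": "Fail",
    "High":    "Probable Fail - check messages",
    "Medium":  "Uncertain - check messages carefully!",
    "Low":     "Probable Pass - check messages",
    "-":       "Pass - no problems found",
}

def overall_verdict(message_levels):
    """Score a document by the highest-confidence warning it contains."""
    if not message_levels:
        return VERDICTS["-"]
    worst = max(message_levels, key=LEVELS.index)
    return VERDICTS[worst]

print(overall_verdict(["-", "Low", "High"]))
# -> Probable Fail - check messages
```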
In monitoring semiconductor production at TI, they used to plot the following quantile points in lot parameters: 5%, 25%, 50%, 75%, 95%. They found this to be highly revealing, and about the right amount of information to present so as not to lose important events in a cluttered display. The tradition in the social sciences is more like 1%, 5%, 50%, 95%, 99%. YMMV. But talking about "confidence levels" invites comparison with confidence assessment practices in statistics, so we sorta need to either get quantitative or get another word.

>(unconditional pass is very hard indeed, but /WAI/ER/ scores it
>at WCAG single-A :-)

I beg your pardon? There is no purely machinable way to arrive at a WCAG 1.0 single-A assertion.

>Now, the Big Issue is assigning priorities. While the basic principle
>is to describe confidences, that is inevitably often subjective,
>and I'd really like some feedback on whether people agree with my
>assignments.

Get Jim Ley to build you a spider and gather some data. Actually, get Jim Ley to build WCAG a spider, and let WCAG do the 'truth' assessments that you compare with the 'prognostic' assessments that you can generate in an automated first pass. Then base the confidence on field demographics. Nobody can argue with a statement of the form "95% of the hit-weighted web content that flunks this test also flunks in-depth manual assessment by the experts of the WCAG WG. So please give it your careful attention."

>I should add that I have made some conscious decisions to
>stray from the True Path of Confidence, in deference to real-world
>considerations. For example, presentational HTML will generate
>a message "Use CSS for layout and presentation" at WCAG-AA or higher
>(http://www.w3.org/TR/WCAG10/#tech-style-sheets), but the "border"
>attribute is low-confidence (IMO it's not really harmful and it
>does have legit. uses as a browser workaround) while other
>presentational things will generate higher-confidence warnings.

As Kynn has pointed out, a candidate warning nominated by the detection of presentational attributes in the HTML may be entirely pruned away by checking for the presence of the CSS that "does it right." That's a higher-level rollup that you can do off the logic of the checkpoint itself.

The kind of thing that tool algorithms can do without treading on WCAG turf is prioritizing the display to the user of items that are equivalent in WCAG priority terms. And there is a lot to be done here: heuristics to guess which of the IMG elements lacking ALT is likely the most egregious offender, so as to convince the person receiving the report that this is a problem and they need to consider it. These heuristics are a subject for demographic research, and the results would be useful for WCAG in terms of prioritizing their efforts.

Show one example of what is wrong, presented in extenso using the detailed data (the actual image, e.g.) with a mockup of an authoring-tool prompt for ALT (embedded in the surrounding text so that the flow through the ALT text is graphically obvious), and then, at a hyperlink's remove, a list of similar violations. Then move on to a qualitatively distinct category of flagged items. Design the report as an effective web page. Grouping these groups can perhaps be influenced by WCAG priority levels, but within groups you get to play games. The idea of an exhaustive report is exhausting.
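To make the "base the confidence on field demographics" proposal concrete, here is a hypothetical sketch: given spider-gathered samples in which each machine-flagged page carries a hit weight and an expert verdict, the calibrated confidence of a test is the hit-weighted fraction of its flags that the experts confirm. The data model and all names are illustrative assumptions, not anything Jim Ley or WCAG has built.

```python
# Hypothetical sketch of calibrating a test's confidence from field data:
# of the pages a given automated test flags, what hit-weighted fraction
# do the WCAG experts confirm as real failures?

def calibrated_confidence(samples):
    """samples: list of (hit_weight, expert_confirmed) pairs for pages
    that a given automated test flagged."""
    total = sum(weight for weight, _ in samples)
    if total == 0:
        return None  # no field data gathered yet
    confirmed = sum(weight for weight, bad in samples if bad)
    return confirmed / total

# e.g. three flagged pages, weighted by traffic; experts confirm two.
samples = [(900, True), (80, True), (20, False)]
print(f"{calibrated_confidence(samples):.0%} of hit-weighted flags confirmed")
# -> "98% of hit-weighted flags confirmed" -- the kind of statement
#    nobody can argue with.
```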
At least to my fevered brain at the moment, what we want to do is to _lead the user through a full repair cycle for one defect_ before moving on to others where they must make judgement calls. There is a cost-performance ratio to be considered in deciding which errors to present first. The ones that are gimmes -- where the fix is easy -- may be what you want to do first: all the user has to do is say "yes, change it to what you have suggested." Then gradually move up the cost scale to other items where they have to work more to make a repair, and down the benefit scale to items where the impacts are smaller and/or the evidence less clear.

Al

>Please folks, play with it, and let me know if you think my
>confidence levels make sense!
>
>--
>Nick Kew
>
>Site Valet - the mark of Quality on the Web.
><URL:http://valet.webthing.com/>
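A last illustrative sketch of the repair-ordering idea above: present the gimmes (low repair cost, high benefit) first, then move up the cost scale and down the benefit scale. The cost and benefit numbers are assumed inputs, presumably from heuristics like those discussed earlier; none of this is from the original mail.

```python
# Illustrative sketch of ordering repairs: gimmes first (cheap fixes),
# then increasing cost and decreasing benefit. The cost/benefit
# estimates themselves are assumed to come from upstream heuristics.

def repair_order(defects):
    """defects: list of dicts with 'id', 'cost' (effort to fix) and
    'benefit' (impact weighted by strength of evidence)."""
    return sorted(defects, key=lambda d: (d["cost"], -d["benefit"]))

defects = [
    {"id": "img-missing-alt",   "cost": 1, "benefit": 9},  # accept suggested ALT
    {"id": "table-linearize",   "cost": 8, "benefit": 7},  # real rework
    {"id": "strong-as-header?", "cost": 3, "benefit": 2},  # judgement call
]
for d in repair_order(defects):
    print(d["id"])
# img-missing-alt, strong-as-header?, table-linearize
```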
Received on Thursday, 7 February 2002 19:12:59 UTC