- From: Detlev Fischer <fischer@dias.de>
- Date: Mon, 14 Nov 2011 16:05:28 +0100
- To: public-wai-evaltf@w3.org
Hi all,

The teleconference on 4 November has changed my perception of what our evaluation methodology is likely to accomplish.

There seems to be consensus that the test procedure itself (stepping through the WCAG success criteria on the chosen level and checking web content against them) is not going to be covered in any detail. References to the quickref and the techniques provided under it will be used for that. The reason: the WCAG 2.0 techniques already contain tests with pass/fail conditions in sufficient detail; this work should not be replicated or aggregated, in order to avoid versioning / consistency issues and to save maintenance effort.

The methodology will instead focus on other aspects such as page sampling and setting the scope of conformance claims. It will also (probably?) propose a concept of tolerance metrics for deciding whether web content under test should pass or fail a success criterion. This would address questions like whether it should be possible for content to pass an SC even in the case of minor violations. Just as an example: content may still pass SC 1.3.1 Info and Relationships if a short list in some text content on a page is not properly marked up as a list. The test would tolerate such minor violations. (Before you lunge at this: I am not suggesting it should; this is just to sketch a possible outcome of tolerance metrics.)

Subsequent discussions on the list indicated that the methodology might also deal with the issue of managing / rating multiple failures, i.e. web content that simultaneously fails several criteria.

For me, one consequence of the consensus sketched above is that I no longer think it necessary to separate a document covering the evaluation procedure from a document covering the context (rationale, references, glossary, qualification of testers, etc.). Why? In all likelihood, actual testers won't have our methodology on their lap. They will be using separate hands-on tools to guide their evaluation on the level of individual success criteria, including the option to enter relevant comments about problems / violations. Such a hands-on tool could be a web application like BITV-Test (maybe suitably modified to address all three levels of WCAG), or it could be an Excel-based document like the access-for-all spreadsheet (Checkliste für barrierefreies Webdesign 2.0, http://url.ie/di0d ).

The methodology, as I now see it, will act as a framework or spec for hands-on tools based on it. What will be absent, though, is any tangible advice regarding the actual rating of content under test. Hands-on tools may provide further help here, no longer guided by our methodology, or only in broad terms.

Now, let's look at the feasibility of tolerance metrics without the hands-on bit. Presumably, a case-by-case assessment of the severity of SC violations should enter any tolerance metric before results are aggregated on the level of the conformance claim. It now seems that our method will eschew delving into success criteria and WCAG techniques and failures altogether. By the same token, it cannot give any concrete guidance for assessing actual web content against success criteria. What is left of tolerance metrics, then? What can a generic section on tolerance metrics achieve? If it states, for example, that it is generally acceptable for content to have a share X of non-critical violations and still pass the SC, it leaves it to the tester to determine whether or not the content in question qualifies as non-critical.
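To make that thought experiment concrete, here is a minimal sketch (in Python, purely illustrative; the 5% threshold, the severity flag and all the names are my own assumptions, not anything EVAL TF has discussed) of what such a per-SC tolerance rule could look like:

from dataclasses import dataclass

@dataclass
class Instance:
    sc: str           # success criterion, e.g. "1.3.1"
    violated: bool    # did this instance fail the SC?
    critical: bool    # tester's judgement of criticality (the hard part)

def sc_passes(instances, tolerance=0.05):
    """Hypothetical rule: the SC passes if there is no critical violation
    and the share of violations stays within the tolerance ('X' above)."""
    if not instances:
        return True   # nothing checked, nothing to fail
    violations = [i for i in instances if i.violated]
    if any(v.critical for v in violations):
        return False  # a single critical violation fails the SC outright
    return len(violations) / len(instances) <= tolerance

# One non-critical slip (say, an unmarked-up list) among 30 checked
# instances of SC 1.3.1 would still pass under a 5% tolerance.
sample = [Instance("1.3.1", violated=(n == 7), critical=False) for n in range(30)]
print(sc_passes(sample))  # True

Even in this toy form, everything that actually matters remains a judgement call left to the tester: what counts as an instance, what is "critical", and what value X should take.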
Also, in many cases, a judgement of a quantitative share of violations against successful implementations is simply not possible. How can this issue be solved? If the outcome of recognising the difficulty of judging the criticality or degree of violations is that EVAL TF simply decides that *any* violation should fail a success criterion, we would end up with hardly any site passing.

In my view, a methodology that avoids the procedure 'on the ground' of trawling through SCs and their related failures and techniques can certainly define some points that have not been sufficiently elaborated up to now (e.g. setting the scope of a claim, and sampling pages). I am resigned to thinking that this can be useful. Such a methodology will not, however, solve the fundamental problem of assessing the usually less-than-perfect web content: deciding on the criticality of violations and determining whether, on the whole, a page should pass or fail even with minor violations. Since that level of analysis is simply not covered, it is up to testers to determine all of that on their own. And all of that will aggregate upwards. Goodbye, then, reliability. And say hello to your stern sister, replicability.

On 14.11.2011 09:19, Shadi Abou-Zahra wrote:
> Eval TF,
>
> Please find the minutes for the teleconference on 10 November 2011:
> - <http://www.w3.org/2011/11/10-eval-minutes.html>
>
> Next meeting: Thursday 17 November 2011
>
> Regards,
>   Shadi
>

-- 
---------------------------------------------------------------
Detlev Fischer PhD
DIAS GmbH - Daten, Informationssysteme und Analysen im Sozialen
Geschäftsführung: Thomas Lilienthal, Michael Zapp
Telefon: +49-40-43 18 75-25
Mobile: +49-157 7-170 73 84
Fax: +49-40-43 18 75-19
E-Mail: fischer@dias.de
Anschrift: Schulterblatt 36, D-20357 Hamburg
Amtsgericht Hamburg HRB 58 167
Geschäftsführer: Thomas Lilienthal, Michael Zapp
---------------------------------------------------------------