- From: <fischer@dias.de>
- Date: Mon, 29 Aug 2011 22:19:32 +0200
- To: public-wai-evaltf@w3.org
Quoting "Boland Jr, Frederick E." <frederick.boland@nist.gov>: > A possible resource for use at some stage of our work - use of test > assertions (for example, as a technique for expressing any > requirements we develop as we were discussing at the last > teleconference) - (although it's primarily designed for > specification development, may have some applicability/usefulness to > us?) > > A resource link is: > http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tag#technical > > Thanks and best wishes > Tim Boland NIST Hi Tim, thanks for pointing to the the OASIS Test Assertions Guidelines Version 1.0 ( http://url.ie/cz6z ). I have tried to related the approach to the pragmatic context of accessibility evaluation of one of these sites out there. I think this is a good document when considering the usefulness and also the limitations of a formal test procedure linking the specification (in our case WCAG 2.0) to the test case (in our case, complex and often somewhat unruly and/or difficult-to-pin-down web pages) via some test assertion involving the target (page, or instance on the page), the prerequisite (prior conditions / dependencies that may apply to the test), the predicate describing the feature tested, and finally the outcome TRUE or FALSE depending on whether the Target is said to fulfil the Normative Statement addressed by the Test Assertion. Now, what will that mean when we test a page for a particular SC or a part thereof? In many cases, we have multiple instances on the page to be tested against the predicate. Take images, once more. Here, the tester has to assert image by image whether the target fulfils SC 1.1.1 or not ? for example, he or she will consider whether the alt text ?letterbox? is TRUE or FALSE for an icon linking to the contact form. This is the first problem. In our experience, no amount of training will lead to the exact same judgement here, especially if it has to be as coarse as TRUE or FALSE. Some will argue that ?letterbox? is not only correct in representing the object depicted, but also a well-known metaphor for contact, ?write to us?, and therefore fine. Others will insist on the correct identification of the function behind the icon. (An aside: Even with a prescription as clear as requiring an alt text for each image (bar decorative ones), we will have some people arguing that the SC is actually met because they have put a title on the image or the link around it, and modern screen readers will read that title in the absence of the alt text ? so where is the problem?) Scaling up to a result for all, say, 23 images on a page, the TRUE/FALSE dichotomy might carry over to a judgement whether the SC has been met or not met on the level of the page. At which point we have to acknowledge that some images are crucial for the use of a site while others will be of marginal importance. The test methodology would have to reflect that to be realistic: realistic with respect to the true impact of the success or failure of a particular instance to conform. A 1x1 px image of a site tracker without alt text at the end of a page may be a nuisance (after all, the URL may be read out by screen readers) but hardly a reason to fail the entire page if it is fine otherwise. If, on the other hand, all of a dozen teaser images have suitable alt text but one out of five items of the main menu composed of images of text has not, this is much more serious. 
The point is that merely counting the instances of <img> on a page that fulfil the criterion against those that do not would not result in a meaningful rating for the page.

A second point relates to the desire for replicability of tests and the fear that subjective judgements would taint the results. The subjectivity can be shifted out of sight qua method, but it will not be absent from many of the judgements that we will need to make when testing for WCAG conformance. The subjective judgement may lie in deciding whether an instance is critical or not (assuming that the failure of one critical instance would fail the entire page). Or it lies in the instance-by-instance judgement of whether a less-than-perfect alt text (yes, they often are) is still TRUE or deserves a FALSE.

Take another example: headings. This is the vast territory of SC 1.3.1 Info and Relationships, which contains many more things that will need to be tested separately. But let's focus just on headings, assuming there is a separate checkpoint for that. Looking at the techniques and failures for guidance, G141 just tests whether sections have headings at all; H42 tests whether things which look like a heading are actually marked up with <hn> elements; and G115 checks for the semantic function, which implies that hierarchies should be reflected in the heading structure.

What is clear is that the judgement of headings (as part of SC 1.3.1) goes beyond the instance and only makes sense on the level of the page, considering its context: the nesting should be OK (but may have anything between very minor and grave flaws), and if you find endless swathes of text without headings, that will be a TRUE if it is Proust's In Search of Lost Time but likely a FALSE if it is an instruction manual or a legal text. Again, my bet is that from tester to tester you will have variance in the results, especially if the SC should just be TRUE or FALSE, and I firmly believe that no amount of instruction and not even the mightiest suite of example test cases showing the correct judgement can prevent that. Why? Because with every site you test, things are new and different, and nearly every time there are things where you wonder (and indeed have to discuss) whether they are acceptable or should lead to a less-than-TRUE judgement for a particular page and SC.

If we go back to square one: what is the methodology going to give us? Is its aim just to mark a site as conforming to WCAG Level AA or not, based on documented test results and some cut-off point below which the result would then simply be "not conforming"? I contend that whatever shape the methodology takes, we have to accept that there is an unavoidable element of aggregated subjective judgement (I have pointed at a few cases) at the basis of the final verdict, even (and especially) when the testing aims to be rigorous. The more we try to nail down or enumerate all the myriad cases of what is TRUE and what is not, the more cumbersome the methodology gets, and we would still regularly encounter sites which do not map neatly onto any of the cases and therefore, once again, require a reasoned human judgement.

The alternative is a methodology that offers a differentiated appreciation of the degree of conformance of the site tested, often with a result of grey instead of black or white for an SC on a given page. On the top level, such a test can be aggregated into some ranking (points out of 100, percent compliance, etc.).
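To illustrate, here is an equally hypothetical continuation of the earlier sketch: a mechanical check of heading nesting (the part that can be formalised) next to a weighted percent score (one possible "grey" aggregation). The weighting scheme and the score formula are my own assumptions, not anything this group has agreed on.

```python
def heading_nesting_findings(levels: list[int]) -> list[str]:
    """Flag skipped heading levels, e.g. an h2 followed directly by an h4.

    This covers only the mechanical part of the headings check; whether long
    unheaded passages are acceptable (Proust vs. a manual) stays with the tester.
    """
    return [f"heading level jumps from h{prev} to h{cur}"
            for prev, cur in zip(levels, levels[1:]) if cur > prev + 1]

def page_score(outcomes: list[tuple[bool, float]]) -> float:
    """Aggregate (passed?, criticality weight) pairs for the applicable
    instances into a percent score, instead of failing the whole page
    on the first FALSE."""
    total = sum(weight for _, weight in outcomes)
    if total == 0:
        return 100.0  # nothing applicable: vacuously conforming
    return 100.0 * sum(weight for ok, weight in outcomes if ok) / total

print(heading_nesting_findings([1, 2, 4, 2, 3]))
# ['heading level jumps from h2 to h4']

# The failing tracker pixel barely dents the score; a failing menu item would.
print(round(page_score([(False, 0.05), (True, 1.0)]), 1))  # 95.2
```

Whether 95.2 percent should then count as "conforming" is, of course, precisely the cut-off question raised above.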
On the detail level, it can expose all the problems testers have found in a form that designers find useful when reworking and improving the site. I do not need to tell you what kind of approach I favour... I guess the idea of a "degree of compliance" is anathema to many people, especially engineers. I just think that the precision some may hope to capture in a very formal methodology risks creating an artefact on top of the rather complex field of web design, with its many ways of meeting, not quite meeting, or failing WCAG success criteria.
Received on Monday, 29 August 2011 20:19:56 UTC