Re: possible use of test assertions in defining/expressing requirements?

Quoting "Boland Jr, Frederick E." <frederick.boland@nist.gov>:

> A possible resource for use at some stage of our work - use of test  
> assertions (for example, as a technique for expressing any
> requirements we develop as we were discussing at the last  
> teleconference) - (although it's primarily designed for  
> specification development, may have some applicability/usefulness to  
> us?)
>
> A resource link is:
> http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tag#technical
>
> Thanks and best wishes
> Tim Boland NIST

Hi Tim, thanks for pointing to the OASIS Test Assertions
Guidelines Version 1.0 ( http://url.ie/cz6z ). I have tried to relate
the approach to the pragmatic context of accessibility evaluation of
one of these sites out there.

I think this is a good document for considering both the usefulness
and the limitations of a formal test procedure. It links the
specification (in our case WCAG 2.0) to the test case (in our case,
complex and often somewhat unruly and/or difficult-to-pin-down web
pages) via a test assertion involving the target (the page, or an
instance on the page), the prerequisite (prior conditions /
dependencies that may apply to the test), the predicate describing the
feature tested, and finally the outcome: TRUE or FALSE, depending on
whether the target is said to fulfil the normative statement addressed
by the test assertion.
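
To make that structure concrete, here is a minimal Python sketch of
such a test assertion as I read the OASIS model; the names and the
shape of the code are my own illustration, not anything defined by the
guidelines:

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class TestAssertion:
    normative_statement: str                 # e.g. "WCAG 2.0 SC 1.1.1"
    prerequisite: Callable[[object], bool]   # prior conditions / dependencies
    predicate: Callable[[object], bool]      # describes the feature tested

    def evaluate(self, target: object) -> Optional[bool]:
        # Outcome is TRUE or FALSE only if the prerequisite holds;
        # otherwise the assertion does not apply to this target at all.
        if not self.prerequisite(target):
            return None
        return self.predicate(target)

So far, so tidy. The trouble starts when the predicate has to be
evaluated by a human.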

Now, what will that mean when we test a page for a particular SC or a  
part thereof?

In many cases, we have multiple instances on the page to be tested  
against the predicate. Take images, once more. Here, the tester has to  
assert image by image whether the target fulfils SC 1.1.1 or not: for
example, he or she will consider whether the alt text "letterbox" is
TRUE or FALSE for an icon linking to the contact form. This is the
first problem. In our experience, no amount of training will lead to
the exact same judgement here, especially if it has to be as coarse as
TRUE or FALSE. Some will argue that "letterbox" is not only correct in
representing the object depicted, but also a well-known metaphor for
contact, "write to us", and therefore fine. Others will insist on the
correct identification of the function behind the icon. (An aside:
even with a prescription as clear as requiring an alt text for each
image (bar decorative ones), we will have some people arguing that the
SC is actually met because they have put a title on the image or the
link around it, and modern screen readers will read that title in the
absence of the alt text, so where is the problem?)
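
In code, the instance-by-instance procedure might look like the
hypothetical sketch below; the point is that the whole difficulty
hides inside the predicate, which no stub like this can actually
settle:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Image:
    src: str
    alt: Optional[str]   # None if the alt attribute is missing
    purpose: str         # what the image actually does on the page

def is_acceptable_alt(img: Image) -> bool:
    # The easy, mechanical case: a missing alt attribute fails.
    if img.alt is None:
        return False
    # The hard case: does the text convey the image's function?
    # Two trained testers can and do disagree here.
    return img.alt.strip() != ""

icon = Image("contact-icon.png", "letterbox", "link to contact form")
print(is_acceptable_alt(icon))   # True by this stub; a human may well say FALSE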

Scaling up to a result for all, say, 23 images on a page, the
TRUE/FALSE dichotomy might carry over to a judgement of whether the SC
has been met or not met on the level of the page. At this point we
have to acknowledge that some images are crucial for the use of a site
while others are of marginal importance. The test methodology would
have to reflect that to be realistic: realistic with respect to the
true impact of the success or failure of a particular instance to
conform. A 1x1 px image of a site tracker without alt text at the end
of a page may be a nuisance (after all, the URL may be read out by
screen readers) but hardly a reason to fail the entire page if it is
fine otherwise. If, on the other hand, all of a dozen teaser images
have suitable alt text but one out of five items of a main menu
composed of images of text does not, this is much more serious.

The point is that merely tallying the instances of <img> on a page
that fulfil the criterion against those that do not would not yield a
meaningful rating for the page.
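
A small sketch of what I mean, with weights that are of course my own
invented assumption rather than part of any methodology:

instances = (
    [(True, 1.0)] * 12     # teaser images with suitable alt text
    + [(False, 0.05)]      # 1x1 px tracker pixel without alt text
    + [(False, 5.0)]       # main-menu item, image of text, missing alt
)

naive_ratio = sum(ok for ok, _ in instances) / len(instances)
weighted = sum(w for ok, w in instances if ok) / sum(w for _, w in instances)

print(f"naive pass ratio: {naive_ratio:.2f}")   # 0.86 - looks nearly fine
print(f"weighted score:   {weighted:.2f}")      # 0.70 - the menu failure bites

Where the weights come from is, of course, another judgement call.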

A second point relates to the desire for replicability of tests and
the fear that subjective judgements would taint the results.
Subjectivity can be shifted out of sight qua method, but it will not
be absent from many of the judgements we will need to make when
testing for WCAG conformance. The subjective judgement may lie in
deciding whether an instance is critical or not (assuming that the
failure of one critical instance would fail the entire page). Or it
lies in the instance-by-instance judgement of whether a
less-than-perfect alt text (yes, they often are) is still TRUE or
deserves a FALSE.

Take another example: Headings. This is the vast territory of SC 1.3.1  
Info and relationships, which contains many more things that will need  
to be tested separately. But let's focus just on headings, assuming  
there is a separate checkpoint for that. Looking at the techniques and
failures for guidance, G142 just tests whether sections have headings
at all; H42 tests whether things that look like a heading are
actually marked up with <hn> elements; and G115 checks for the
semantic function, which implies that hierarchies should be reflected
in the heading structure.
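
The H42-style part of this is at least partly mechanical. Here is a
rough sketch of the kind of check a tool or tester could run on the
heading outline; the rule that levels should not be skipped is a
common reading, not a verbatim WCAG requirement:

def heading_outline_flaws(levels):
    # levels: the sequence of <hn> levels as they appear in the page,
    # e.g. [1, 2, 4, 2, 3] for <h1><h2><h4><h2><h3>.
    flaws = []
    if levels and levels[0] != 1:
        flaws.append(f"page starts at h{levels[0]}, not h1")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            flaws.append(f"h{prev} jumps to h{cur} (skipped level)")
    return flaws

print(heading_outline_flaws([1, 2, 4, 2, 3]))
# ['h2 jumps to h4 (skipped level)']

Such a check can flag candidates for review, but it cannot say whether
a flaw it finds actually matters.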

What is clear is that the judgement of headings (as part of SC 1.3.1)
goes beyond the instance and only makes sense on the level of the
page, considering the context: the nesting should be OK (but may show
anything between very minor and grave flaws), and if you find endless
swathes of text without headings, that will be a TRUE if it is
Proust's In Search of Lost Time but likely a FALSE if it is an
instruction manual or legal text. Again, my bet is that from tester to
tester you will have variance in results, especially if the SC should
just be TRUE or FALSE, and I firmly believe that no amount of
instruction and not the mightiest suite of example test cases showing
the correct judgement can prevent that. Why? Because with every site
you test, things are new and different, and nearly every time there
are things where you wonder (and indeed have to discuss) whether they
are acceptable or should lead to a less-than-TRUE judgement for a
particular page and SC.

If we go back to square one: what is the methodology going to give us?  
Is its aim just to mark a site as conforming to WCAG Level AA or not,  
based on documented test results and some cut-off point below which  
the result would then simply be "not conforming"?

I contend that whatever shape the methodology takes, we have to accept
that there is an unavoidable element of aggregated subjective
judgement (I have pointed at a few cases) at the basis of the final
verdict, even (and especially) when the testing aims to be rigorous.
The more we try to nail down or list all the myriad cases of what is
TRUE and what is not, the more cumbersome the methodology gets, and we
would still regularly encounter sites that do not map neatly onto any
of the cases and therefore require another reasoned human judgement.

The alternative is to have a methodology that offers a differentiated  
appreciation of the degree of conformance of the site tested, often  
with a result of grey instead of black or white for an SC and a given
page. On the top level, such a test can be aggregated into some ranking
(points out of 100, percent compliance, etc.), and on the detail level,
it can expose all the problems testers have found in a manner that  
designers find suitable when reworking and improving the site. I do  
not need to tell you what kind of approach I favour...
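
Purely as an illustration of that kind of aggregation (the 0.0-1.0
scale and the rollup rule are invented for the example, not a
proposal):

# Graded per-page, per-SC ratings as judged by testers
page_ratings = {
    ("home",    "1.1.1"): 0.9,
    ("home",    "1.3.1"): 0.5,
    ("contact", "1.1.1"): 0.7,
    ("contact", "1.3.1"): 1.0,
}

score = 100 * sum(page_ratings.values()) / len(page_ratings)
print(f"site conformance score: {score:.0f}/100")   # 78/100

The detail level would keep the individual ratings, so designers can
see exactly which page/SC combinations need reworking.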

I guess the idea of a "degree of compliance" is anathema to many
people, especially engineers. I just think the precision some may hope  
to capture in a very formal methodology risks creating an artefact on  
top of the rather complex field of web design, with its many ways of  
meeting, not quite meeting, or failing WCAG success criteria.

Received on Monday, 29 August 2011 20:19:56 UTC