RE: possible use of test assertions in defining/expressing requirements?

Hi Detlev and all EvalTF

Detlev, I really appreciate the work you put into explaining your point, and I couldn't agree more. I think any methodology will have this issue at its core: when does a page pass or fail? Is it based on a single severe instance of not meeting the criteria, on lots of small instances, or on something else? It can get really subjective on the part of the evaluator. We also have to look at the alternative techniques the developers have used to address the SC, and I think we should also look at the developers' intent to meet the criteria. I like the idea of a percentage of compliance for each principle or SC. If you can tell website owners that they have met 90% of a certain principle, they will be able to understand this and will want to know where the other 10% went. IMHO


Regards

Vivienne L. Conway
________________________________________
From: public-wai-evaltf-request@w3.org [public-wai-evaltf-request@w3.org] On Behalf Of fischer@dias.de [fischer@dias.de]
Sent: Tuesday, 30 August 2011 4:19 AM
To: public-wai-evaltf@w3.org
Subject: Re: possible use of test assertions in defining/expressing requirements?

Quoting "Boland Jr, Frederick E." <frederick.boland@nist.gov>:

> A possible resource for use at some stage of our work - use of test
> assertions (for example, as a technique for expressing any
> requirements we develop as we were discussing at the last
> teleconference) - (although it's primarily designed for
> specification development, may have some applicability/usefulness to
> us?)
>
> A resource link is:
> http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tag#technical
>
> Thanks and best wishes
> Tim Boland NIST

Hi Tim, thanks for pointing to the OASIS Test Assertions
Guidelines Version 1.0 ( http://url.ie/cz6z ). I have tried to relate
the approach to the pragmatic context of accessibility evaluation of
one of these sites out there.

I think this is a good document when considering the usefulness and
also the limitations of a formal test procedure linking the
specification (in our case WCAG 2.0) to the test case (in our case,
complex and often somewhat unruly and/or difficult-to-pin-down web
pages) via some test assertion involving the target (page, or instance
on the page), the prerequisite (prior conditions / dependencies that
may apply to the test), the predicate describing the feature tested,
and finally the outcome TRUE or FALSE depending on whether the Target
is said to fulfil the Normative Statement addressed by the Test
Assertion.
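
As a rough illustration of the parts named above, here is a minimal
sketch in Python. It is not taken from the OASIS guidelines or from
WCAG; the class names, field names and the has_alt_text check are all
hypothetical, and the predicate only covers the mechanical part of
SC 1.1.1 (is there any alt text at all?).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Image:
    """Hypothetical stand-in for one <img> instance on a page (the target)."""
    src: str
    alt: Optional[str]        # None means the alt attribute is missing
    decorative: bool = False  # assumption: decided beforehand by the tester


@dataclass
class TestAssertion:
    """Illustrative test assertion: normative source, prerequisite, predicate."""
    normative_source: str
    prerequisite: Callable[[Image], bool]
    predicate: Callable[[Image], bool]

    def evaluate(self, target: Image) -> Optional[bool]:
        # If the prerequisite does not hold, the assertion does not apply.
        if not self.prerequisite(target):
            return None
        # Otherwise the outcome is the coarse TRUE/FALSE of the predicate.
        return self.predicate(target)


# Hypothetical assertion: non-decorative images carry non-empty alt text.
has_alt_text = TestAssertion(
    normative_source="WCAG 2.0 SC 1.1.1",
    prerequisite=lambda img: not img.decorative,
    predicate=lambda img: bool(img.alt and img.alt.strip()),
)

contact_icon = Image(src="contact.png", alt="letterbox")
tracker = Image(src="t.gif", alt=None)

print(has_alt_text.evaluate(contact_icon))
print(has_alt_text.evaluate(tracker))
```

Note that such a predicate can only say that "letterbox" is present as
alt text; whether it is an adequate alternative for a contact icon is
still left to the tester.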

Now, what will that mean when we test a page for a particular SC or a
part thereof?

In many cases, we have multiple instances on the page to be tested
against the predicate. Take images, once more. Here, the tester has to
assert image by image whether the target fulfils SC 1.1.1 or not - for
example, he or she will consider whether the alt text "letterbox" is
TRUE or FALSE for an icon linking to the contact form. This is the
first problem. In our experience, no amount of training will lead to
the exact same judgement here, especially if it has to be as coarse as
TRUE or FALSE. Some will argue that "letterbox" is not only correct in
representing the object depicted, but also a well-known metaphor for
contact, "write to us", and therefore fine. Others will insist on the
correct identification of the function behind the icon. (An aside:
Even with a prescription as clear as requiring an alt text for each
image (bar decorative ones), we will have some people arguing that the
SC is actually met because they have put a title on the image or the
link around it, and modern screen readers will read that title in the
absence of the alt text - so where is the problem?)

Scaling up to a result for all, say, 23 images on a page, the
TRUE/FALSE dichotomy might carry over to a judgement whether the SC
has been met or not met on the level of the page. At which point we
have to acknowledge that some images are crucial for the use of a site
while others will be of marginal importance. The test methodology
would have to reflect that to be realistic: realistic with respect to
the true impact of the success or failure of a particular instance to
conform. A 1x1 px image of a site tracker without alt text at the end
of a page may be a nuisance (after all, the URL may be read out by
screen readers) but hardly a reason to fail the entire page if it is
fine otherwise. If, on the other hand, all of a dozen teaser images
have suitable alt text but one out of five items of the main menu
composed of images of text has not, this is much more serious.

The point is that a mere calculation of the number of instances of
<img> on a page that fulfil the criterion against those that do not
would not result in a meaningful rating for the page.
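
To make the point concrete, here is a small sketch in Python. The two
pages and the weights are invented for illustration only; WCAG defines
no such weights, and choosing them is itself a judgement call. Both
pages have 18 images and exactly one failing instance, so a plain
count cannot tell them apart, while a weighting by importance can.

```python
# Each result is (passed, weight). Weights are hypothetical importance
# values: 1 for a tracking pixel, 2 for a teaser image, 10 for a menu item.
TRACKER, TEASER, MENU = 1, 2, 10


def pass_ratio(results):
    """Share of instances that pass, ignoring their importance."""
    return sum(1 for passed, _ in results if passed) / len(results)


def weighted_score(results):
    """Share of total weight carried by the passing instances."""
    total = sum(weight for _, weight in results)
    return sum(weight for passed, weight in results if passed) / total


# Page A: everything is fine except the tracking pixel.
page_a = [(True, TEASER)] * 12 + [(True, MENU)] * 5 + [(False, TRACKER)]

# Page B: one of the five image-of-text menu items lacks alt text.
page_b = ([(True, TEASER)] * 12 + [(True, MENU)] * 4
          + [(False, MENU)] + [(True, TRACKER)])

print(pass_ratio(page_a) == pass_ratio(page_b))          # same plain ratio
print(weighted_score(page_a) > weighted_score(page_b))   # B rates lower
```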

A second point relates to the desire for the replicability of tests
and the fear that subjective judgements would taint the results. The
subjectivity can be shifted out of sight qua method but it will not be
absent in many of the judgements that we will need to make when
testing for WCAG conformance. The subjective judgement may be in
deciding if an instance is critical or not (assuming that the failure
of one critical instance would fail the entire page). Or it is in the
instance-by-instance judgement deciding whether a less-than-perfect
alt text (yes, they often are) is still TRUE or deserves a FALSE.

Take another example: Headings. This is the vast territory of SC 1.3.1
Info and relationships, which contains many more things that will need
to be tested separately. But let's focus just on headings, assuming
there is a separate checkpoint for that. Looking at the techniques and
failures for guidance, G141 just tests whether sections have headings
at all; H42 tests whether things which look like a heading are
actually marked up with <hn> elements; and G115 checks for the
semantic function which implies hierarchies should be reflected in the
heading structure.

What is clear is that the judgement of headings (as part of SC 1.3.1)
goes beyond the instance and only makes sense on the level of the page and
considering the context: nesting should be OK (but may have anything
between very minor and grave flaws) and if you find endless swathes of
text that will be a TRUE if it is Proust's In Search of Lost Time but
likely a FALSE if it is an instruction manual or legal text. Again, my
bet is that from tester to tester you will have variance in results
especially if the SC should just be TRUE or FALSE, and I firmly
believe that no amount of instruction and not the mightiest suite of
example test cases showing the correct judgement can prevent that.
Why? Because with every site you test things are new and different and
nearly every time there are things where you wonder (and indeed have
to discuss) whether they are acceptable or should lead to a less than
TRUE judgement for a particular page and SC.

If we go back to square one: what is the methodology going to give us?
Is its aim just to mark a site as conforming to WCAG Level AA or not,
based on documented test results and some cut-off point below which
the result would then simply be "not conforming"?

I contend that whatever shape the methodology takes, we have to accept
that there is an unavoidable element of aggregated subjective
judgement (I have pointed at a few cases) at the basis of the final
verdict, even (and especially) when the testing aims to be rigorous.
The more we try to nail down or list all the myriad cases of
what is TRUE and what is not, the more cumbersome the methodology
gets, and we would still regularly encounter sites which do not map
neatly onto any of the cases and therefore require yet another
reasoned human judgement.

The alternative is to have a methodology that offers a differentiated
appreciation of the degree of conformance of the site tested, often
with a result of grey instead of black or white for an SC and a given
page. On the top level, such a test can be aggregated in some ranking
(points out of 100, percent compliance etc.) and on the detail level,
it can expose all the problems testers have found in a manner that
designers find suitable when reworking and improving the site. I do
not need to tell you what kind of approach I favour...

I guess the idea of a "degree of compliance" is anathema to many
people, especially engineers. I just think the precision some may hope
to capture in a very formal methodology risks creating an artefact on
top of the rather complex field of web design, with its many ways of
meeting, not quite meeting, or failing WCAG success criteria.

Received on Thursday, 1 September 2011 06:33:20 UTC