Re: possible use of test assertions in defining/expressing requirements? from Denis Boudreau on 2011-09-08 (public-wai-evaltf@w3.org from September 2011)

From: Denis Boudreau <dboudreau@accessibiliteweb.com>
Date: Thu, 08 Sep 2011 09:48:23 -0400
To: Eval TF <public-wai-evaltf@w3.org>
Message-id: <233385F6-60EF-4367-A610-CD405755F8EE@accessibiliteweb.com>
Hello EvalTF folks,

Trying to catch up with the threads. I wanted to first jump in on the idea of "percentage of compliance" that was brought up a little while back.

We've used this idea of "percentage of compliance" since 2003. At first, we didn't want to go there because compliance is a very binary concept: you either comply or you don't. But it turned out that our clients needed to know/understand where they stood with regard to some sort of "accessibility goal".

We quickly realized we needed to come up with a way for them to understand how much they had accomplished and how much still needed to be done in order to reach "full compliance". Hence, a score from 1 to 100 seemed like the right thing to do, something everyone would appreciate and understand.

Like most of you, I'm sure, we came up with a series of tests that were mapped to the different success criteria, grouped under different guidelines. As years went by and WCAG 2.0 became more and more likely to make it into a full-blown recommendation, we ended up grouping all these tests under each SC, which in turn were grouped under Guidelines, and then grouped under Principles. We came up with a form of weighting that allowed us to determine (all too subjectively) which tests or SC were "more important" than others, and scaled the results to a total of 100 points.

We then decided that a website could be deemed "accessible enough" if each and every page audited got at least 90%. We made sure functional testing with screen readers was an integral part of this process and that this test was worth 10 points out of the 100. That way, it was guaranteed that in order to meet our qualification level, a website would prove to be a positive experience using various screen readers. This is still what we're doing to this day, with each relevant standard we audit on.
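
As a rough illustration of the scoring just described, here is a minimal Python sketch. The test names, the individual weights, and the function names are hypothetical; only the 100-point total, the 10 points reserved for screen reader testing, and the 90% per-page threshold come from the description above.

```python
# Hypothetical weights: only the 100-point total, the 10 points for
# screen reader testing, and the 90-point threshold come from the text.
WEIGHTS = {
    "text_alternatives": 30,
    "headings_structure": 25,
    "keyboard_access": 35,
    "screen_reader_functional": 10,
}
PASS_THRESHOLD = 90


def page_score(results):
    """Sum the weights of the tests that a page passed (True)."""
    return sum(w for name, w in WEIGHTS.items() if results.get(name, False))


def site_is_accessible_enough(pages):
    """Every audited page must reach the threshold (inclusive)."""
    return all(page_score(r) >= PASS_THRESHOLD for r in pages)
```

In this sketch, a page that loses only the 10 screen reader points lands exactly on 90, so whether the threshold is inclusive matters; the sketch treats it as inclusive, following "at least 90%".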

The problem with this method is that we're supporting the idea that compliance can be scaled while it just cannot. In reality, a really accessible website that would score 99% on our evaluation would be highly accessible, no doubt about that and most users would not experience any problem using it. However, if it's missing that 1%, it is not compliant with WCAG 2.0 as a whole even though it is compliant with probably all SC but one. 
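
To make the mismatch concrete, here is a small Python sketch contrasting a percentage with binary conformance. It uses an unweighted share of criteria met rather than our weighted score, and the criterion identifiers are placeholders, not real SC numbers.

```python
def conforms(sc_results):
    """Binary conformance: every applicable success criterion must pass."""
    return all(sc_results.values())


def percent_met(sc_results):
    """Share of success criteria met, as an unweighted percentage."""
    return 100 * sum(sc_results.values()) / len(sc_results)


# Hypothetical page: 100 placeholder criteria, exactly one failing.
results = {f"SC-{i}": True for i in range(1, 101)}
results["SC-42"] = False

print(percent_met(results))  # 99.0
print(conforms(results))     # False
```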

We never found a way to address this problem. Whatever methodology we end up building or using, I think it would be great to make sure this group doesn't make the same "mistake" we did, as it sends the wrong message out there.

Best,

-- 
Denis Boudreau, président
Coopérative AccessibilitéWeb 

1751 rue Richardson, bureau 6111 
Montréal (Qc), Canada H3K 1G6 
Téléphone : +1 877.315.5550 
Courriel : dboudreau@accessibiliteweb.com
Web : www.accessibiliteweb.com




On 2011-09-01, at 2:28 AM, Vivienne CONWAY wrote:

> Hi Detlev and all EvalTF
> 
> Detlev, I really appreciated the work you went through to explain your point.  I couldn't agree more.  I think that any methodology will have this issue at its core.  When does a page pass or fail - based on a single severe instance of not meeting the criteria, lots of small instances, etc.  It can get really subjective on the part of the evaluator.  We also have to look at the alternative techniques the developers have used to address the SC and I think also look at the intent of the developers to meet the criteria.  I like the idea of a percentage of compliance for each principle or SC.  If you can tell a website owner that they have met 90% of a certain principle, he/she will be able to understand this and want to know where the other 10% have gone. IMHO
> 
> 
> Regards
> 
> Vivienne L. Conway
> ________________________________________
> From: public-wai-evaltf-request@w3.org [public-wai-evaltf-request@w3.org] On Behalf Of fischer@dias.de [fischer@dias.de]
> Sent: Tuesday, 30 August 2011 4:19 AM
> To: public-wai-evaltf@w3.org
> Subject: Re: possible use of test assertions in defining/expressing   requirements?
> 
> Quoting "Boland Jr, Frederick E." <frederick.boland@nist.gov>:
> 
>> A possible resource for use at some stage of our work - use of test
>> assertions (for example, as a technique for expressing any
>> requirements we develop as we were discussing at the last
>> teleconference) - (although it's primarily designed for
>> specification development, may have some applicability/usefulness to
>> us?)
>> 
>> A resource link is:
>> http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tag#technical
>> 
>> Thanks and best wishes
>> Tim Boland NIST
> 
> Hi Tim, thanks for pointing to the OASIS Test Assertions
> Guidelines Version 1.0 ( http://url.ie/cz6z ). I have tried to relate
> the approach to the pragmatic context of accessibility evaluation of
> one of these sites out there.
> 
> I think this is a good document when considering the usefulness and
> also the limitations of a formal test procedure linking the
> specification (in our case WCAG 2.0) to the test case (in our case,
> complex and often somewhat unruly and/or difficult-to-pin-down web
> pages) via some test assertion involving the target (page, or instance
> on the page), the prerequisite (prior conditions / dependencies that
> may apply to the test), the predicate describing the feature tested,
> and finally the outcome TRUE or FALSE depending on whether the Target
> is said to fulfil the Normative Statement addressed by the Test
> Assertion.
> 
> Now, what will that mean when we test a page for a particular SC or a
> part thereof?
> 
> In many cases, we have multiple instances on the page to be tested
> against the predicate. Take images, once more. Here, the tester has to
> assert image by image whether the target fulfils SC 1.1.1 or not - for
> example, he or she will consider whether the alt text "letterbox" is
> TRUE or FALSE for an icon linking to the contact form. This is the
> first problem. In our experience, no amount of training will lead to
> the exact same judgement here, especially if it has to be as coarse as
> TRUE or FALSE. Some will argue that "letterbox" is not only correct in
> representing the object depicted, but also a well-known metaphor for
> contact, "write to us", and therefore fine. Others will insist on the
> correct identification of the function behind the icon. (An aside:
> Even with a prescription as clear as requiring an alt text for each
> image (bar decorative ones), we will have some people arguing that the
> SC is actually met because they have put a title on the image or the
> link around it, and modern screen readers will read that title in the
> absence of the alt text - so where is the problem?)
> 
> Scaling up to a result for all, say, 23 images on a page, the
> TRUE/FALSE dichotomy might carry over to a judgement whether the SC
> has been met or not met on the level of the page. At which point we
> have to acknowledge that some images are crucial for the use of a site
> while others will be of marginal importance. The test methodology
> would have to reflect that to be realistic: realistic with respect to
> the true impact of the success or failure of a particular instance to
> conform. A 1x1 px  image of a site tracker without alt text at the end
> of a page may be a nuisance (after all, the URL may be read out by
> screen readers) but hardly a reason to fail the entire page if it is
> fine otherwise. If, on the other hand, all of a dozen teaser images
> have suitable alt text but one out of five items of the main menu
> composed of images of text has not, this is much more serious.
> 
> The point is that a mere calculation of the number of instances of
> <img> on a page that fulfil the criterion against those that do not
> would not result in a meaningful rating for the page.
> 
> A second point relates to the desire for the replicability of tests
> and the fear that subjective judgements would taint the results. The
> subjectivity can be shifted out of sight qua method but it will not be
> absent in many of the judgements that we will need to make when
> testing for WCAG conformance. The subjective judgement may be in
> deciding if an instance is critical or not  (assuming that the failure
> of one critical instance would fail the entire page). Or it is in the
> instance-by-instance judgement deciding whether a less-than-perfect
> alt text (yes, they often are) is still TRUE or deserves a FALSE.
> 
> Take another example: Headings. This is the vast territory of SC 1.3.1
> Info and relationships, which contains many more things that will need
> to be tested separately. But let's focus just on headings, assuming
> there is a separate checkpoint for that. Looking at the techniques and
> failures for guidance, G142 just tests whether sections have headings
> at all;  H42 tests whether things which look like a heading are
> actually marked up with <hn> elements; and G115 checks for the
> semantic function which implies hierarchies should be reflected in the
> heading structure.
> 
> What is clear is that the judgement of headings (as part of SC 1.3.1)
> goes beyond the instance and only makes sense on the level of the page and
> considering the context: nesting should be OK (but may have anything
> between very minor and grave flaws) and if you find endless swathes of
> text that will be a TRUE if it is Proust's In Search of Lost Time but
> likely a FALSE if it is an instruction manual or legal text. Again, my
> bet is that from tester to tester you will have variance in results
> especially if the SC should just be TRUE or FALSE, and I firmly
> believe that no amount of instruction and not the mightiest suite of
> example test cases showing the correct judgement can prevent that.
> Why? Because with every site you test things are new and different and
> nearly every time there are things where you wonder (and indeed have
> to discuss) whether they are acceptable or should lead to a less than
> TRUE judgement for a particular page and SC.
> 
> If we go back to square one: what is the methodology going to give us?
> Is its aim just to mark a site as conforming to WCAG Level AA or not,
> based on documented test results and some cut-off point below which
> the result would then simply be "not conforming"?
> 
> I contend that whatever shape the methodology takes, we have to accept
> that there is an unavoidable element of aggregated subjective
> judgement (I have pointed at a few cases) at the basis of the final
> verdict, even (and especially) when the testing aims to be rigorous.
> The more we would try to nail down or list all the myriad cases of
> what is TRUE and what is not, the more cumbersome the methodology
> gets, and we would still regularly encounter sites which do not map
> neatly onto any of the cases and therefore, once again, require
> another reasoned human judgement.
> 
> The alternative is to have a methodology that offers a differentiated
> appreciation of the degree of conformance of the site tested, often
> with a result of grey instead of black or white for a SC and a given
> page. On the top level, such a test can be aggregated in some ranking
> (points out of 100, percent compliance etc.)  and on the detail level,
> it can expose all the problems testers have found in a manner that
> designers find suitable when reworking and improving the site. I do
> not need to tell you what kind of approach I favour...
> 
> I guess the idea of a "degree of compliance" is anathema to many
> people, especially engineers. I just think the precision some may hope
> to capture in a very formal methodology risks creating an artefact on
> top of the rather complex field of web design, with its many ways of
> meeting, not quite meeting, or failing WCAG success criteria.
> 
>
Received on Thursday, 8 September 2011 13:48:48 UTC