RE: some comments/questions on techniques instructions document for submitters from Léonie Watson on 2011-08-21 (public-wai-evaltf@w3.org from August 2011)

From: Léonie Watson <lwatson@nomensa.com>
Date: Sun, 21 Aug 2011 19:25:33 +0100
To: Denis Boudreau <dboudreau@accessibiliteweb.com>, Eval TF <public-wai-evaltf@w3.org>
CC: WCAG WG <w3c-wai-gl@w3.org>
Message-ID: <D4219A0ECCAE794C9ED7DC6F5A4C0CD537B3731AC3@jupiter.intranet.nomensa.com>
Denis Boudreau wrote:
"While I agree with Detlev on some level, I do not believe we can be thorough and confident to cover all the related techniques and failures associated to a specific success criterion while auditing if we do not go down to that atomic level. "

	Unless an assessment goes down to that atomic level, I believe it's vulnerable to inconsistency. High level evaluations have a wide margin for interpretation. It must be possible to apply the methodology consistently (and easily).

Denis Boudreau wrote:
"It may look like a lot of tests at first, but it turns out that it's not so bad because we never audit every page on a website, but rather pick a set of representative pages based on various templates. So in the worst cases, we rarely end up with more than 12 pages to audit."

	We do something very similar. We manually evaluate the representative sample of pages using the atomic tests, then run higher level automated tests across the whole of the website (or at least a thousand pages). We then follow up with a more heuristic evaluation using different access technologies.

Denis Boudreau wrote:
"For us at least, web accessibility auditing is always at least a two-phase process: a first assessment of what's out there and another one, after the recommendations have been put in place, to see how well the developers did. And so we also came to realize that the best way to ensure people would fix all pages and not only the ones that were audited was simply to retain, on the second round of evaluation, about 60% of the pages that were first audited and then go pick a few new ones just to see if they measure up with the ones that were fixed."

	Again, this is remarkably similar to our process. We don't usually take the approach of selecting a few new pages when it comes to the retest, but it's a brilliant idea!

Denis Boudreau wrote:
"All in all, we usually plan about two hours per page audited, screen reader testing included. We feel an audit cannot be considered complete without combining both those checklists and user testing. So a 10 page evaluation would require anywhere between say, 15 to 20 hours of work per round. I'm very curious/interested to compare these numbers with what you folks currently do."

	We tend to work in days rather than hours, but I think our estimates come out about the same in the end. The biggest challenge for us is writing up the results into a meaningful report. Finding the balance between informative and information overload is often quite troublesome.

Léonie.

 

-----Original Message-----
From: public-wai-evaltf-request@w3.org [mailto:public-wai-evaltf-request@w3.org] On Behalf Of Denis Boudreau
Sent: 20 August 2011 15:02
To: Eval TF
Cc: WCAG WG
Subject: Re: some comments/questions on techniques instructions document for submitters

Good morning everyone,

I guess this is a good opportunity to dive right into the EVAL TF work and share a bit of our experience with methodology.

While I agree with Detlev on some level, I do not believe we can be thorough and confident to cover all the related techniques and failures associated to a specific success criterion while auditing if we do not go down to that atomic level. Grouping different elements together to limit the number of tests will make it easy on the auditor, no doubt about that, but in my humble opinion, would naturally lead to forgetting things along the way.

The example of SC 1.1.1 is great because of the quantity of elements to look for, and so would be 1.3.1. When there are so many things to look out for, it's easy to either forget one or feel overwhelmed by the quantity. But on the other hand, this is just the reality of accessibility testing.

When we do WCAG 2.0 assessment work at the office, we go over a series of 170 atomic tests for all 61 SC, divided like so:

* 105 tests for WCAG 2.0 A
* 27 tests for WCAG 2.0 AA (for a total of 132 tests for lvl A and AA)
* 38 tests for WCAG 2.0 AAA (for a total of 170 tests for all three levels of conformance)
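
As a quick cross-check of the tallies above, here is a minimal Python sketch. The dictionary and function names are illustrative assumptions; only the three counts come from this message.

```python
# Number of atomic tests per WCAG 2.0 conformance level, as quoted above.
TESTS_PER_LEVEL = {"A": 105, "AA": 27, "AAA": 38}

# Conformance levels are cumulative: AA includes A, AAA includes A and AA.
LEVEL_ORDER = ["A", "AA", "AAA"]


def cumulative_tests(target_level):
    """Return how many tests apply when auditing up to target_level."""
    if target_level not in LEVEL_ORDER:
        raise ValueError(f"unknown conformance level: {target_level}")
    included = LEVEL_ORDER[: LEVEL_ORDER.index(target_level) + 1]
    return sum(TESTS_PER_LEVEL[level] for level in included)


for level in LEVEL_ORDER:
    print(level, cumulative_tests(level))
```

The cumulative figures it prints should match the totals given in parentheses in the list.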

This means that we've broken down each and every criterion into a list of things to look out for. Those checklists come from either the techniques and failures, or from experience encountering accessibility barriers using various assistive technologies. For example, for SC 1.1.1 alone, we end up with 24 individual tests. Some of them are performed using various browser extensions in IE or Firefox, but a significant number have to be verified manually (SC 1.4.3 for images naturally comes to mind, as would SC 1.4.8 or SC 2.1.2 for instance).

It may look like a lot of tests at first, but it turns out that it's not so bad because we never audit every page on a website, but rather pick a set of representative pages based on various templates. So in the worst cases, we rarely end up with more than 12 pages to audit. This selection is usually made up of:

* the homepage
* various section level homepages
* various inside pages that present a lot of diverse content (headings, lists, paragraphs and so on)
* at least one page containing a reasonably sized form (if any)
* at least one page containing a reasonably sized data table (if any)
* the site map
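
The selection described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Page` type, the template labels, and the round-robin strategy are hypothetical and not a description of any actual tool; the cap of 12 pages is the figure mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    template: str  # hypothetical label assigned while surveying the site


# Hypothetical template labels, in the same order as the list above.
TEMPLATE_PRIORITY = ["home", "section-home", "content", "form", "data-table", "sitemap"]


def select_sample(pages, max_pages=12):
    """Pick pages round-robin across templates until max_pages is reached."""
    by_template = {name: [] for name in TEMPLATE_PRIORITY}
    for page in pages:
        if page.template in by_template:
            by_template[page.template].append(page)

    sample = []
    round_index = 0
    while len(sample) < max_pages:
        added = False
        for name in TEMPLATE_PRIORITY:
            candidates = by_template[name]
            if round_index < len(candidates) and len(sample) < max_pages:
                sample.append(candidates[round_index])
                added = True
        if not added:
            break  # every template is exhausted
        round_index += 1
    return sample
```

The first pass takes one page per template (skipping templates the site does not have, such as forms or data tables), and later passes add further section and content pages until the cap is hit.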

With time, we came to realize that doing more was unnecessary, because what people need is not a site-wide diagnosis of their website accessibility, but rather some recommendations as to how to improve what's already there. By insisting on a limited set of representative pages and making sure the developers apply the proper corrections across all pages, we can get to pretty satisfying results without having to resort to full-blown auditing, which would require an insanely huge amount of time on our part, not to mention sky-rocketing costs.

For us at least, web accessibility auditing is always at least a two-phase process: a first assessment of what's out there and another one, after the recommendations have been put in place, to see how well the developers did. And so we also came to realize that the best way to ensure people would fix all pages and not only the ones that were audited was simply to retain, on the second round of evaluation, about 60% of the pages that were first audited and then go pick a few new ones just to see if they measure up with the ones that were fixed.
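
A rough sketch of that second-round selection, assuming a hypothetical `retest_sample` helper: the 60% ratio is the figure given above, while the number of new pages (3) and the random choice are assumptions made purely for illustration.

```python
import random


def retest_sample(first_round, other_pages, keep_ratio=0.6, new_count=3, seed=None):
    """Return (kept, fresh): pages retained from round one, plus unaudited ones."""
    rng = random.Random(seed)

    # Retain roughly keep_ratio of the pages audited in the first round.
    keep_n = round(len(first_round) * keep_ratio)
    kept = rng.sample(first_round, keep_n)

    # Add a few pages that were never audited, to check that fixes were
    # applied across the site and not only on the audited pages.
    pool = [page for page in other_pages if page not in first_round]
    fresh = rng.sample(pool, min(new_count, len(pool)))
    return kept, fresh


first = [f"/audited-{i}" for i in range(10)]
others = [f"/unaudited-{i}" for i in range(5)]
kept, fresh = retest_sample(first, others, seed=1)
print(len(kept), "retained;", len(fresh), "new")
```

In practice the new pages would more likely be picked by hand than at random; the random draw just keeps the sketch short.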

All in all, we usually plan about two hours per page audited, screen reader testing included. We feel an audit cannot be considered complete without combining both those checklists and user testing. So a 10 page evaluation would require anywhere between say, 15 to 20 hours of work per round. I'm very curious/interested to compare these numbers with what you folks currently do.
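
To make the arithmetic explicit, here is a small sketch. The 1.5 hours-per-page lower bound is an assumption inferred from the 15 to 20 hour range for 10 pages; the two-hour upper bound is the planning figure above, and the function name is hypothetical.

```python
def audit_hours(pages, hours_per_page=(1.5, 2.0), rounds=1):
    """Return a (low, high) estimate of total audit hours."""
    low, high = hours_per_page
    return pages * low * rounds, pages * high * rounds


# One round on a 10 page sample, then both rounds of the two-phase process.
print(audit_hours(10))
print(audit_hours(10, rounds=2))
```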

Anyway, that's it for a first message to this list; it's already long enough.

Best,

--
Denis Boudreau, President
Coopérative AccessibilitéWeb
1751 rue Richardson, bureau 6111
Montréal (Qc), Canada H3K 1G6
Telephone: +1 877.315.5550

----------------------------------------------------
|	** a11yMTL 2011 - only 6 days left! **	|
|	* Full details at www.a11ymtl.org *	|
----------------------------------------------------




On 2011-08-20, at 5:01 AM, Shadi Abou-Zahra wrote:

> Dear Tim, Detlev,
> 
> On 19.8.2011 19:50, Boland Jr, Frederick E. wrote:
>> Thanks for your insightful comments.  I think they are worthy of serious consideration.
>> My thoughts as you suggest were just as an input or starting point to 
>> further discussion on this topic.  Perhaps as part of the work of the 
>> EVAL TF we can come up with principles or characteristics of how an evaluation should be performed.
> 
> Yes, I agree that this is a useful discussion to have in Eval TF, and bring back consolidated suggestions to WCAG WG.
> 
> 
>> Thanks and best wishes
>> Tim Boland NIST
>> 
>> PS - is it OK to post this discussion to the EVAL TF mailing list (it 
>> might be useful  information for the members of the TF)?
> 
> Yes it is. I have CC'ed Eval TF.
> 
> Best,
>  Shadi
> 
> 
>> -----Original Message-----
>> From: w3c-wai-gl-request@w3.org [mailto:w3c-wai-gl-request@w3.org] On 
>> Behalf Of Detlev Fischer
>> Sent: Friday, August 19, 2011 12:14 PM
>> To: w3c-wai-gl@w3.org
>> Subject: Re: some comments/questions on techniques instructions 
>> document for submitters
>> 
>> Hi Tim Boland,
>> 
>> EVAL TF has just started so I went back to the level of atomic tests 
>> to see what their role might be in a practical accessibility 
>> evaluation approach.
>> 
>>   Atomic tests limited to a specific technique are certainly useful 
>> as a heuristic for implementers of such a technique to check whether 
>> they have implemented it correctly, and the points in the techniques 
>> instructions as well as your points on writing a 'good test' are 
>> therefore certainly valid on this level.
>> 
>> However, any evaluation procedure checking conformance of content to 
>> particular SC criteria needs to consider quite a number of techniques 
>> in conjunction. The 'complication' you mention can be avoided on the 
>> level of technique, but no longer on the level of SC.
>> 
>> Stating conformance to a particular SC  might involve a large number 
>> of techniques and failures, some applied alternatively, others in 
>> conjunction. For example, when checking compliance of all page content 
>> to SC 1.1.1 (Non-Text Content), any of the following 15 techniques 
>> and failures might be relevant: G95, G94, G100, G92, G74, G73, G196, 
>> H37, H67, H45, F67, F3, F20, F39, F65. And this does not even include 
>> the techniques which provide accessible text replacements for background images.
>> 
>> My belief is that in *practical terms*, concatenating a large number 
>> of partly interrelated atomic tests to arrive at a SC conformance 
>> judgement is just not a practical approach for human evaluation. If 
>> we want a *usable*, i.e., manageable procedure for a human tester to 
>> check whether the images on a page have proper alternative text, what 
>> *actually* happens is more something like a pattern matching of known 
>> (recognised)
>> failures:
>> 
>> * Display all images together with alt text (and, where available, 
>> href)
>> * Scan for instances of known failures - this also needs
>>    checking the image context for cases like G74 and G196
>> * Render page with custom colours (images now disappear) and check
>>    whether text replacements for background images are displayed
>> 
>> Moreover, if the *severity* of failure needs to be reflected in the 
>> conformance claim or associated tolerance metrics, then the failure 
>> to provide alt text for a main navigation item or graphical submit 
>> button must not be treated the same way as the failure to provide alt 
>> on some supporter's logo in the footer of the page.
>> 
>> My point is that while I am all for precision, the requirements for a 
>> rather complex integrated human assessment of a multitude of 
>> techniques and failures practically rule out an atomic approach where 
>> each applicable test of each applicable technique is carried out 
>> sequentially along the steps provided and then processed according to 
>> the logical concatenation of techniques given in the "How to meet" 
>> document. It simply would be far too cumbersome.
>> 
>> I realise that you have not maintained that evaluation should be done 
>> that way - I just took your thoughts as a starting point. We have 
>> only just started with the EVAL task force work - I am curious what 
>> solutions we will arrive at to ensure rigor and mappability while 
>> still coming up with a manageable, doable approach.
>> 
>> Regards,
>> Detlev
>> 
>> On 05.08.2011 16:28, Boland Jr, Frederick E. wrote:
>>> For
>>> 
>>> http://www.w3.org/WAI/GL/wiki/Technique_Instructions
>>> 
>>> General Comments:
>>> 
>>> Under "Tests" should there be guidance on limiting the number of 
>>> steps in a testing procedure (not making tests too involved)?
>>> 
>>> (this gets to "what makes a good test"?)
>>> 
>>> In .. http://www.w3.org/QA/WG/2005/01/test-faq#good
>>> 
>>> "A good test is:
>>> 
>>>   * Mappable to the specification (you must know what portion of the
>>>     specification it tests)
>>>   * Atomic (tests a single feature rather than multiple features)
>>>   * Self-documenting (explains what it is testing and what output it
>>>     expects)
>>>   * Focused on the technology under test rather than on ancillary
>>>     technologies
>>>   * Correct "
>>> 
>>> Does the information under "Tests" clearly convey information in 
>>> these items to potential submitters?
>>> 
>>> Furthermore, do we want to have some language somewhere in the 
>>> instructions that submitted techniques should not be too "complicated"
>>> (should just demonstrate simple features or atomic actions if possible)?
>>> 
>>> Editorial Comments:
>>> 
>>> under "Techniques Writeup Checklist "UW2" should be expanded to 
>>> "Understanding WCAG2"
>>> 
>>> 3rd bullet under "applicability" has lots of typos.
>>> 
>>> Thanks and best wishes
>>> 
>>> Tim Boland NIST
>>> 
>> 
>> 
> 
> --
> Shadi Abou-Zahra - http://www.w3.org/People/shadi/ Activity Lead, 
> W3C/WAI International Program Office Evaluation and Repair Tools 
> Working Group (ERT WG) Research and Development Working Group (RDWG)
>
Received on Sunday, 21 August 2011 18:26:30 UTC