Re: some comments/questions on techniques instructions document for submitters from Gregg Vanderheiden on 2011-08-20 (w3c-wai-gl@w3.org from July to September 2011)

From: Gregg Vanderheiden <gv@trace.wisc.edu>
Date: Sat, 20 Aug 2011 09:22:10 -0500
To: Denis Boudreau <dboudreau@accessibiliteweb.com>
Cc: Eval TF <public-wai-evaltf@w3.org>, WCAG WG <w3c-wai-gl@w3.org>
Message-id: <F633BB6D-47A0-4A71-AE65-9EB2E1717E24@trace.wisc.edu>
Very good.

The sampling approach is also a good idea.   If you find nothing -- then one could sample more if one wanted to be thorough.  But that is usually enough to find any systematic errors or issues.  

When you talk about atomic tests - are these automated only, or both automated and human?  If both - approximately what percentage of each?

Also, when the web site doesn’t do any of the techniques listed in WCAG - what do you do?  

Finally, how do you detect information that is presented only visually by page layout?     And then how would you associate that with programmatically determined text?

Does the "just 12 pages" approach allow you to use human testers - is that what does the trick?


I presume these 12 pages are for a rather modest (hundreds vs. hundreds of thousands of web pages) or highly templated web site.   Some companies have dozens or scores of "home" pages that are all different in format.

thanks


Gregg
--------------------------------------------------------
Gregg Vanderheiden Ph.D.
Director Trace R&D Center
Professor Industrial & Systems Engineering
and Biomedical Engineering
University of Wisconsin-Madison

Co-Director, Raising the Floor - International
and the Global Public Inclusive Infrastructure Project
http://Raisingthefloor.org   ---   http://GPII.net


On Aug 20, 2011, at 9:01 AM, Denis Boudreau wrote:

> Good morning everyone,
> 
> I guess this is a good opportunity to dive right into the EVAL TF work and share a bit of our experience with methodology.
> 
> While I agree with Detlev on some level, I do not believe we can be thorough and confident that we cover all the techniques and failures associated with a specific success criterion while auditing if we do not go down to that atomic level. Grouping different elements together to limit the number of tests will make it easier on the auditor, no doubt about that, but in my humble opinion, it would naturally lead to forgetting things along the way.
> 
> The example of SC 1.1.1 is great because of the quantity of elements to look for, and so would be 1.3.1. When there are so many things to look out for, it's easy to either forget one or feel overwhelmed by the quantity. But on the other hand, this is just the reality of accessibility testing.
> 
> When we do WCAG 2.0 assessment work at the office, we go over a series of 170 atomic tests for all 61 SC, divided like so:
> 
> * 105 tests for WCAG 2.0 A
> * 27 tests for WCAG 2.0 AA (for a total of 132 tests for lvl A and AA)
> * 38 tests for WCAG 2.0 AAA (for a total of 170 tests for all three levels of conformance)
> 
> This means that we've broken down each and every criterion into a list of things to look out for. Those checklists come from either the techniques and failures, or from experience encountering accessibility barriers using various assistive technologies. For example, for SC 1.1.1 alone, we end up with 24 individual tests. Some of them are performed using various browser extensions in IE or Firefox, but a significant number have to be verified manually (SC 1.4.3 for images naturally comes to mind, as would SC 1.4.8 or SC 2.1.2 for instance).
> 
> It may look like a lot of tests at first, but it turns out that it's not so bad because we never audit every page on a website, but rather pick a set of representative pages based on various templates. So in the worst cases, we rarely end up with more than 12 pages to audit. This selection is usually built up from:
> 
> * the homepage
> * various section level homepages
> * various inside pages that present a lot of diverse content (headings, lists, paragraphs and so on)
> * at least one page containing a reasonably sized form (if any)
> * at least one page containing a reasonably sized data table (if any)
> * the site map
> 
> With time, we came to realize that doing more was unnecessary, because what people need is not a site-wide diagnosis of their website accessibility, but rather some recommendations as to how to improve what's already there. By insisting on a limited set of representative pages and making sure the developers apply the proper corrections across all pages, we can get to pretty satisfying results without having to resort to full-blown auditing, which would require an insanely huge amount of time on our part, not to mention sky-rocketing costs.
> 
> For us at least, web accessibility auditing is always at least a two-phase process: a first assessment of what's out there and another one, after the recommendations have been put in place, to see how well the developers did. And so we also came to realize that the best way to ensure people would fix all pages and not only the ones that were audited was simply to retain, on the second round of evaluation, about 60% of the pages that were first audited and then go pick a few new ones just to see if they measure up to the ones that were fixed.
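
A minimal sketch of the second-round selection described above, under stated assumptions: pages are picked at random here (the message does not say how they are chosen), and the function name, the `new_count` default of 3 and the `seed` parameter are hypothetical. Only the "about 60%" figure comes from the message.

```python
import random

def second_round_sample(first_round, all_pages, keep_ratio=0.6, new_count=3, seed=None):
    """Return (kept, fresh): pages re-audited from round one, plus new pages.

    keep_ratio=0.6 reflects the "about 60%" in the message; random selection
    and new_count=3 are assumptions made for this illustration only.
    """
    rng = random.Random(seed)
    keep_n = round(len(first_round) * keep_ratio)
    kept = rng.sample(first_round, keep_n)
    # New pages must not have been audited in the first round.
    candidates = [p for p in all_pages if p not in first_round]
    fresh = rng.sample(candidates, min(new_count, len(candidates)))
    return kept, fresh

site = [f"/page-{i}" for i in range(1, 41)]   # hypothetical 40-page site
round_one = site[:10]                          # the 10 pages audited first
kept, fresh = second_round_sample(round_one, site, seed=1)
print(len(kept), "pages kept,", len(fresh), "new pages")
```

With 10 first-round pages this keeps 6 and adds 3 that were never audited, so a fix applied only to the audited pages would show up in the new ones.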
> 
> All in all, we usually plan about two hours per page audited, screen reader testing included. We feel an audit cannot be considered complete without combining both those checklists and user testing. So a 10-page evaluation would require anywhere between, say, 15 and 20 hours of work per round. I'm very curious/interested to compare these numbers with what you folks currently do.
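
The effort figures above can be checked with a little arithmetic. This is only a sketch: the upper rate of 2 hours per page is from the message, while the lower rate of 1.5 hours per page is an assumption derived from "15 hours for 10 pages"; the function name is hypothetical.

```python
def audit_hours(pages, low_rate=1.5, high_rate=2.0, rounds=1):
    """Return the (low, high) range of hours for auditing `pages` pages.

    high_rate=2.0 is the "about two hours per page" from the message;
    low_rate=1.5 is an assumption inferred from the 15-hour lower bound.
    """
    return (pages * low_rate * rounds, pages * high_rate * rounds)

print(audit_hours(10))            # one round of a 10-page evaluation
print(audit_hours(10, rounds=2))  # both phases of the two-phase process
```

For the two-phase process described earlier, the same rates give 30 to 40 hours in total for a 10-page sample.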
> 
> Anyway, that's it for a first message to this list; it's already long enough.
> 
> Best,
> 
> -- 
> Denis Boudreau, President
> Coopérative AccessibilitéWeb 
> 1751 rue Richardson, bureau 6111 
> Montréal (Qc), Canada H3K 1G6 
> Telephone: +1 877.315.5550 
> 
> ----------------------------------------------------
> |	** a11yMTL 2011 - only 6 days left! **	|
> |	* All the details at www.a11ymtl.org *	|
> ----------------------------------------------------
> 
> 
> 
> 
> On 2011-08-20, at 5:01 AM, Shadi Abou-Zahra wrote:
> 
>> Dear Tim, Detlev,
>> 
>> On 19.8.2011 19:50, Boland Jr, Frederick E. wrote:
>>> Thanks for your insightful comments.  I think they are worthy of serious consideration.
>>> My thoughts as you suggest were just as an input or starting point to further discussion
>>> on this topic.  Perhaps as part of the work of the EVAL TF we can come up with principles
>>> or characteristics of how an evaluation should be performed.
>> 
>> Yes, I agree that this is a useful discussion to have in Eval TF, and bring back consolidated suggestions to WCAG WG.
>> 
>> 
>>> Thanks and best wishes
>>> Tim Boland NIST
>>> 
>>> PS - is it OK to post this discussion to the EVAL TF mailing list (it might be useful
>>> information for the members of the TF)?
>> 
>> Yes it is. I have CC'ed Eval TF.
>> 
>> Best,
>> Shadi
>> 
>> 
>>> -----Original Message-----
>>> From: w3c-wai-gl-request@w3.org [mailto:w3c-wai-gl-request@w3.org] On Behalf Of Detlev Fischer
>>> Sent: Friday, August 19, 2011 12:14 PM
>>> To: w3c-wai-gl@w3.org
>>> Subject: Re: some comments/questions on techniques instructions document for submitters
>>> 
>>> Hi Tim Boland,
>>> 
>>> EVAL TF has just started so I went back to the level of atomic tests to
>>> see what their role might be in a practical accessibility evaluation
>>> approach.
>>> 
>>>  Atomic tests limited to a specific technique are certainly useful as a
>>> heuristic for implementers of such a technique to check whether they
>>> have implemented it correctly, and the points in the techniques
>>> instructions as well as your points on writing a 'good test' are
>>> therefore certainly valid on this level.
>>> 
>>> However, any evaluation procedure checking conformance of content to
>>> a particular SC needs to consider quite a number of techniques in
>>> conjunction. The 'complication' you mention can be avoided on the level
>>> of the technique, but no longer on the level of the SC.
>>> 
>>> Stating conformance to a particular SC might involve a large number of
>>> techniques and failures, some applied alternatively, others in
>>> conjunction. For example, when checking all page content for compliance
>>> with SC 1.1.1 (Non-Text Content), any of the following 15 techniques and
>>> failures might be relevant: G95, G94, G100, G92, G74, G73, G196, H37,
>>> H67, H45, F67, F3, F20, F39, F65. And this does not even include the
>>> techniques which provide accessible text replacements for background images.
>>> 
>>> My belief is that in *practical terms*, concatenating a large number of
>>> partly interrelated atomic tests to arrive at a SC conformance judgement
>>> is just not a practical approach for human evaluation. If we want a
>>> *usable*, i.e., manageable procedure for a human tester to check whether
>>> the images on a page have proper alternative text, what *actually*
>>> happens is more like pattern matching against known (recognised)
>>> failures:
>>> 
>>> * Display all images together with alt text (and, where available, href)
>>> * Scan for instances of known failures - this also needs
>>>   checking the image context for cases like G74 and G196
>>> * Render page with custom colours (images now disappear) and check
>>>   whether text replacements for background images are displayed
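
The first step in the list above (display all images together with alt text and, where available, href) could be sketched as follows. This is an illustration only, using Python's standard `html.parser`; the class and function names are hypothetical, and judging whether an alt text is appropriate in its context still needs a human tester.

```python
from html.parser import HTMLParser

class ImageAltLister(HTMLParser):
    """Collect (src, alt, href) for each <img>; alt is None if the attribute is absent."""

    def __init__(self):
        super().__init__()
        self.images = []
        self._hrefs = []  # href values of the <a> elements currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._hrefs.append(attrs.get("href"))
        elif tag == "img":
            # Distinguish a missing alt attribute (None) from an empty one ("").
            alt = (attrs["alt"] or "") if "alt" in attrs else None
            href = self._hrefs[-1] if self._hrefs else None
            self.images.append((attrs.get("src"), alt, href))

    def handle_endtag(self, tag):
        if tag == "a" and self._hrefs:
            self._hrefs.pop()

def list_images(html):
    parser = ImageAltLister()
    parser.feed(html)
    return parser.images

sample = (
    '<a href="/home"><img src="logo.png" alt="Home"></a>'
    '<img src="spacer.gif" alt="">'
    '<img src="chart.png">'
)
for src, alt, href in list_images(sample):
    status = "MISSING alt" if alt is None else f"alt={alt!r}"
    print(src, status, f"href={href}" if href else "")
```

The output is a flat list a tester can scan for known failure patterns, such as a linked image with no alt attribute at all.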
>>> 
>>> Moreover, if the *severity* of failure needs to be reflected in the
>>> conformance claim or associated tolerance metrics, then the failure to
>>> provide alt text for a main navigation item or graphical submit button
>>> must not be treated the same way as the failure to provide alt on some
>>> supporter's logo in the footer of the page.
>>> 
>>> My point is that while I am all for precision, the requirements for a
>>> rather complex integrated human assessment of a multitude of techniques
>>> and failures practically rule out an atomic approach where each
>>> applicable test of each applicable technique is carried out sequentially
>>> along the steps provided and then processed according to the logical
>>> concatenation of techniques given in the "How to meet" document. It
>>> simply would be far too cumbersome.
>>> 
>>> I realise that you have not maintained that evaluation should be done
>>> that way - I just took your thoughts as a starting point. We have only
>>> just started with the EVAL task force work - I am curious what solutions
>>> we will arrive at to ensure rigor and mappability while still coming up
>>> with a manageable, doable approach.
>>> 
>>> Regards,
>>> Detlev
>>> 
>>> Am 05.08.2011 16:28, schrieb Boland Jr, Frederick E.:
>>>> For
>>>> 
>>>> http://www.w3.org/WAI/GL/wiki/Technique_Instructions
>>>> 
>>>> General Comments:
>>>> 
>>>> Under "Tests" should there be guidance on limiting the number of steps
>>>> in a testing procedure (not making tests too involved)?
>>>> 
>>>> (this gets to "what makes a good test"?)
>>>> 
>>>> In .. http://www.w3.org/QA/WG/2005/01/test-faq#good
>>>> 
>>>> "A good test is:
>>>> 
>>>>  * Mappable to the specification (you must know what portion of the
>>>>    specification it tests)
>>>>  * Atomic (tests a single feature rather than multiple features)
>>>>  * Self-documenting (explains what it is testing and what output it
>>>>    expects)
>>>>  * Focused on the technology under test rather than on ancillary
>>>>    technologies
>>>>  * Correct "
>>>> 
>>>> Does the information under "Tests" clearly convey information in these
>>>> items to potential submitters?
>>>> 
>>>> Furthermore, do we want to have some language somewhere in the
>>>> instructions that submitted techniques should not be too "complicated"
>>>> (should just demonstrate simple features or atomic actions if possible)?
>>>> 
>>>> Editorial Comments:
>>>> 
>>>> under "Techniques Writeup Checklist", "UW2" should be expanded to
>>>> "Understanding WCAG2"
>>>> 
>>>> 3rd bullet under "applicability" has lots of typos.
>>>> 
>>>> Thanks and best wishes
>>>> 
>>>> Tim Boland NIST
>>>> 
>>> 
>>> 
>> 
>> -- 
>> Shadi Abou-Zahra - http://www.w3.org/People/shadi/
>> Activity Lead, W3C/WAI International Program Office
>> Evaluation and Repair Tools Working Group (ERT WG)
>> Research and Development Working Group (RDWG)
>> 
> 
>
Received on Saturday, 20 August 2011 14:22:43 UTC