Re: Heuristic testing from Jeanne Spellman on 2019-04-26 (public-silver@w3.org from April 2019)

From: Jeanne Spellman <jspellman@spellmanconsulting.com>
Date: Fri, 26 Apr 2019 19:35:15 -0400
To: public-silver@w3.org
Message-ID: <41b05c54-0759-c672-a8e2-6f4a40358972@spellmanconsulting.com>

When we were discussing Silver Conformance at and after the 2018 TPAC 
F2F, I walked away thinking about Silver Conformance as a multi-faceted 
approach.  This is my memory of what we discussed in the weeks after 
TPAC and what we presented to AGWG in January at the meeting where we 
discussed Conformance.  Now that AGWG has been learning more about the 
plans for Silver, this Conformance prototype may make more sense.  The 
messy details of the conformance prototype 
<https://docs.google.com/document/d/1wTJme7ZhhtzyWBxI8oMXzl7i4QHW7aDHRYTKXKELPcY/edit#heading=h.xzxmbeyrlvz7> 
are documented in a Google doc that is open for comments.

== Summary ==

Proposed:

* Silver would still have traditional WCAG-type testing
* Silver won't require usability testing at the minimum level
* Scoring would be by site or project or product, as determined by the 
organization. This brings us more in line with the industry and 
regulatory requirements.  (for example, VPAT is by site, project, or 
product)
* Organizations that want a higher level (Silver or Gold) need to do 
more. There are roughly four categories of "more":
     ** more advanced guidelines (current AAA plus new guidelines)
     ** more sophisticated testing of individual guidelines
     ** overall usability evaluations (several different types are proposed)
     ** task completion testing
* We want to encourage User Centered Design (UCD) and Inclusive Design, 
but Conformance seems the wrong place to do that, other than offering 
more points when the organization's process is documented.

== More Explanation about Testing and Scoring in the Conformance 
Prototype Proposal ==

1. We could have a minimum level (bronze) that is roughly equivalent to 
WCAG 2.1 AA.  This would be met (by site, by project, or product, not by 
page) by guidelines that would be largely familiar to a WCAG 2.1 user 
and would be tested mostly the way WCAG success criteria are tested 
today.  Whether we require specific guidelines or a minimum score in 
each category of functional need (in the EN sense) is still undecided.  
There are advantages and disadvantages to both and we need to do some 
modeling with real content.  We could include new guidelines from COGA, 
LV and others at this level if they "fit" in this level.  I can envision 
some COGA requirements that could be tested on a sliding scale with a 
clear rubric for evaluation fitting in at this level, but we still need 
to work this out.  In short, Bronze level conformance is much like WCAG 
AA is today except that it is site-wide instead of by page.
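
To make the two undecided options concrete, here is a rough sketch of how 
each Bronze check might look.  Everything in it is an assumption for 
illustration only: the guideline names, the category labels, the point 
values, the rule that a required guideline must earn full points, and the 
minimum ratio are all hypothetical and are not part of the prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuidelineResult:
    guideline: str        # hypothetical guideline name
    functional_need: str  # hypothetical category label (in the EN sense)
    points: int           # points earned across the site, project, or product
    max_points: int       # points available for this guideline


def bronze_by_required_guidelines(results, required):
    """Option A (assumed rule): every required guideline earns full points."""
    by_name = {r.guideline: r for r in results}
    return all(
        name in by_name and by_name[name].points == by_name[name].max_points
        for name in required
    )


def bronze_by_category_minimum(results, minimum_ratio):
    """Option B (assumed rule): each category reaches a minimum share of its points."""
    earned, available = {}, {}
    for r in results:
        earned[r.functional_need] = earned.get(r.functional_need, 0) + r.points
        available[r.functional_need] = available.get(r.functional_need, 0) + r.max_points
    return all(earned[c] >= minimum_ratio * available[c] for c in available)


# Hypothetical site-wide results, not real data.
results = [
    GuidelineResult("text alternatives", "without vision", 4, 4),
    GuidelineResult("color contrast", "limited vision", 2, 4),
    GuidelineResult("target size", "limited manipulation", 3, 4),
]

print(bronze_by_required_guidelines(results, {"text alternatives"}))
print(bronze_by_category_minimum(results, 0.5))
```

Modeling with real content would amount to running both checks over the 
same results and seeing where they disagree.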

2. An organization that wanted to score in a higher level (Silver or 
Gold) would have to do more than WCAG AA testing.  They could
* implement guidelines that are more advanced than WCAG AA but still can 
be tested traditionally  (such as WCAG AAA or new guidelines from LV and 
COGA); or
* perform usability evaluations (of different types to be determined) on 
individual guidelines; or
* perform overall usability evaluations; or
* implement new guidelines about successful task completion and document 
task completion tests; or
* a mix of all four.

3. The guidelines that are conducive to more nuanced evaluation than a 
true/false success criterion would have Methods with tests that score 
higher points. I like Alastair's idea that they "trigger" the 
opportunity for more advanced evaluation.  They would award higher 
points that could help bring the organization's overall score to a 
Silver or Gold level.  Some of those tests could be in the broad 
category of usability tests. If an organization decided to do the 
usability-type testing, the organization would have to document the 
testing procedure so it could be confirmed by an outside party, and 
document the changes (if any) made as a result of the testing.  I think 
we would want to give them a template of what documentation we would 
require.
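
As a rough sketch of how points from Methods might add up to a level: the 
thresholds below are invented for illustration (the prototype does not 
define any numbers yet), and the rule that an undocumented usability-type 
test earns no points is my assumption about how the documentation 
requirement could be enforced.  The names MethodResult, countable_points 
and overall_level are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds, highest first; the proposal does not define these.
LEVEL_THRESHOLDS = [("gold", 90), ("silver", 70), ("bronze", 50)]


@dataclass(frozen=True)
class MethodResult:
    method: str                         # hypothetical Method name
    points: int                         # points the test would award
    usability_type: bool = False        # True for usability-type tests
    procedure_documented: bool = False  # testing procedure written up
    changes_documented: bool = False    # changes (or "none needed") written up


def countable_points(result):
    """Assumed rule: usability-type tests count only when fully documented."""
    if result.usability_type and not (
        result.procedure_documented and result.changes_documented
    ):
        return 0
    return result.points


def overall_level(results):
    """Return the highest level whose threshold the total reaches, else None."""
    total = sum(countable_points(r) for r in results)
    for level, threshold in LEVEL_THRESHOLDS:
        if total >= threshold:
            return level
    return None


# Hypothetical example: traditional testing plus one documented card sort.
documented = [
    MethodResult("traditional tests", 55),
    MethodResult("card sort", 20, usability_type=True,
                 procedure_documented=True, changes_documented=True),
]
print(overall_level(documented))
```

With these made-up numbers, the same card sort without documentation 
would leave the organization at the level reached by traditional testing 
alone.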

4. There would be new guidelines that are NOT required for Bronze level, 
around task completion. We haven't worked on the details of this yet, 
but I don't think it is a problem.  I think those would be documented 
by the organization according to the documentation template that would 
be required, covering the process they followed and the changes made 
as a result.  They would award points that would help an 
organization get to silver or gold level.  It would be helpful if we can 
design a task completion guideline that could be implemented at bronze 
level, but I don't know how we would do it.  We would welcome ideas.

John raises a good point that usability testing should be done in the 
Design and Prototyping phase of a project, not at the audit stage.  The 
organizations that we expect to be interested in achieving higher levels 
should be aware of that, but it helps to educate organizations about 
it.  The project manager tab would be a good place for that educational 
material.  Organizations that are only interested in an accessibility 
audit at the end of a project are not likely to be at a level of 
accessibility maturity to score higher than bronze.  I don't think it is 
a problem, just an education opportunity.

I think that accessibility consultancies should be watching the Silver 
project closely and looking for opportunities to develop new revenue 
streams from organizations who want a higher level than bronze.

I share David's concern that this conformance proposal is more 
complicated than current WCAG Conformance.  I was 
reminded this week that when you are working on a hard problem, it gets 
very complicated until you get the details worked out.  Once that 
happens, you can start to make it simpler.

I hope this gives you more context for Charles' proposal.  We haven't 
discussed his proposal with AGWG because we are still working on the 
larger framework that it goes in.  I hope this explains how we are 
addressing some of Detlev's concerns.  Please let me know if it doesn't.

jeanne



On 4/26/2019 9:20 AM, John Foliot wrote:
> Alastair writes:
>
>     > ...we run research with users rather than doing expert
>     evaluations in those scenarios. This method should be something
>     that is above the baseline, _we should be recommending it be done
>     well before a typical accessibility audit_.
>
> +1 there.
>
> Traditionally "conformance statements" are issued at the end of a 
> development cycle, not during and certainly never *prior* to development.
>
> These kinds of user-testing activities however are, if not critical, 
> extremely informative towards reaching a higher level of 
> accessibility, but in almost every case, will also be (if I am to 
> understand the conversation) always unique: situational and in a 
> specific context (reason for the site, activity/ies being evaluated) 
> and extremely difficult to _*quantify*_ as part of a reporting mechanism.
>
> I am reminded here of the perennial tug-and-pull (certainly here in 
> the US) in the academic sector over standardized testing; where one 
> side of the argument states that "standardized testing doesn't teach 
> students to think, but rather encourages teachers to teach to the 
> test, and students to just memorize facts" versus the argument "how do 
> we measure progress and success at scale, if not via standardized 
> testing?" So while I appreciate the desire to move "accessibility" 
> conformance beyond mindlessly ticking boxes, the types of heuristics I 
> keep seeing being discussed do not seem (to me) to be addressing the 
> standardized for scale issue.
>
> JF
>
> On Fri, Apr 26, 2019 at 3:56 AM Alastair Campbell 
> <acampbell@nomensa.com> wrote:
>
>     Hi Charles, Detlev,
>
>     > If the UCD process were a method, do you see any way in which that
>     could include all the functional needs?
>
>     I might be missing some context on ‘functional needs’, does that
>     mean the EN style “Can use it without vision” type of thing?
>
>     Broadly I can see 2 main approaches:
>
>     Approach 1: Evaluation as functional needs testing, I assume that
>     would be testing it ‘as a user’, either with real users or as an
>     expert/advocate. You would then be trying to use/analyse the site
>     as a user, and covering the guidelines within that process.
>
>     Approach 2: We test with each guideline (in some sort of order to
>     be decided), and certain guidelines trigger a UCD method or
>     functional test to score more highly.
>
>     The difficulty with approach 1 is that you have 10 overlapping
>     functional needs (from the EN categories), it is time consuming
>     and repetitive to test that. I think it would be better to have a
>     set of UCD methods / functional tests that are **triggered** by
>     particular guidelines based on the content being evaluated. E.g.
>     navigation guidelines trigger IA methods, apps trigger
>     interface/usability testing.
>
>     >My general idea is that the minimum conformance can be achieved
>     without requiring any specific human evaluation method that
>     included people with specific functional needs.
>
>     That’s good, my concern is how best to do that. There are whole
>     ISO standards on user-centred design methods and we shouldn’t be
>     trying to replicate those.
>
>     On the example:
>
>     > Let’s say the author has created a tab bar navigation… But then it gets tested by and/or
>     for the functional needs of fine motor control and half of the
>     tests fail because even though the target size was sufficient, it
>     was found that there were too many accidental clicks due to the
>     proximity and insufficient space between.
>
>     Where a test is on a continuum (e.g. mobility impairment, low
>     vision) how would you set a bar for “too many accidental clicks”
>     in functional testing? It would entirely rest on who the
>     participants were, or what assumptions the expert had. I think
>     this is actually a case where a guideline about target size /
>     spacing would be better. You could score more by going to a higher
>     level on the continuum, like the AA/AAA contrast guidelines.
>
>     However, if it were Detlev’s example of a Chemical supplier, I
>     think the grouping and terminology used in the navigation are the
>     most likely issues for people with cognitive issues. The best tool
>     for improving that would be a card-sort or menu test conducted
>     with the target audience & people with cognitive issues.
>
>     Detlev wrote:
>
>     > Evaluators will often not have the domain knowledge …
>
>     Indeed, this is the case for a lot of the UX research we do, which
>     is why we run research with users rather than doing expert
>     evaluations in those scenarios.
>
>     This method should be something that is above the baseline, we
>     should be recommending it be done well /before/ a typical
>     accessibility audit.
>
>     > An expert might arrive at as good a navigation structure as a group that went through a
>     card-sorting exercise (if one were to carry out user testing to
>     assess the quality of the result) - why should the fact that the
>     structure was arrived at in card sorting lead to a higher score if
>     what counts is the accessibility/usability of the site for the end
>     user?
>
>     A card sort [1] with end-users is a formative method for creating
>     a navigation structure that generally* provides much better
>     results than the organisation would produce.
>
>     Usability testing isn’t a particularly good method for assessing
>     the navigation structure, a menu test (aka tree test) [2] is
>     another type of test that is better for evaluating that structure.
>
>     What counts is the accessibility/usability of the site for end
>     users, and when organisations conduct those types of research,
>     they generally* end up with more usable navigation.
>
>     I’d like to see navigation guidelines trigger IA methods, and
>     other interface guidelines trigger usability testing / cognitive
>     walkthroughs etc.
>
>     However, I do not think it is possible (or even useful) to try and
>     set a score for the output of those methods, the important thing
>     is they are done and acted on.
>
>     ISO 27001 uses that approach, where you document that you have
>     done something and acted on it, and later you have to redo
>     certain things the next year and show improvement.
>
>     I think we are agreeing that these methods need to be part of the
>     silver / gold levels, and that there should be a (WCAG 2.x
>     equivalent) baseline that can be conducted by anyone.
>
>     Cheers,
>
>     -Alastair
>
>     [1]
>     https://www.usability.gov/how-to-and-tools/methods/card-sorting.html
>
>     [2] https://www.optimalworkshop.com/101/tree-testing#introWhat
>
>     * I’d like to say “always” as that’s my experience, but I assume
>     people can do it badly…
>
>
>
> -- 
> *John Foliot* | Principal Accessibility Strategist | W3C AC 
> Representative
> Deque Systems - Accessibility for Good
> deque.com <http://deque.com/>
>
Received on Friday, 26 April 2019 23:35:38 UTC