Re: Heuristic testing

From: David MacDonald <david100@sympatico.ca>
Date: Mon, 29 Apr 2019 12:45:13 -0400
Message-ID: <CAAdDpDYKs1ekqPRVuK9RxOt-6U-_03LmKnCSKFSpvrdknGvW4g@mail.gmail.com>
To: Jeanne Spellman <jspellman@spellmanconsulting.com>
Cc: Silver Task Force <public-silver@w3.org>

In general, I'm attracted to this. Here are some quick thoughts:

   - I think we may need an explicit line in the conformance document
   saying that law and policy makers should not require or formalize some
   of the Levels or Guidelines. This might apply to Silver and Gold
   conformance, but there may be things in there that we want them to
   include in policy and legislation, so this would be decided as we get
   closer. WCAG 2.0 and 2.1 did this with the line "Note 2: It is not
   recommended that Level AAA conformance be required as a general policy
   for entire sites because it is not possible to satisfy all Level AAA
   Success Criteria for some content."
   https://www.w3.org/TR/WCAG21/#cc1   I am bringing this up at this early
   stage because I could imagine it will generate much discussion
   throughout the lifecycle of development.
   - What would we do with WCAG 2.x Level A as a level of conformance in
   Silver? In some jurisdictions Level A is the required level, and those
   jurisdictions would want some recognition in the Silver ecosystem.
   Perhaps we could drop it, but I think that requires significant
   discussion with stakeholders, who may have strong opinions.
   - I am worried about the complexity of conformance and also the cost of
   testing, but I understand we need to drift into a complicated world
   before attempting to simplify, which is what we tried to do with WCAG.
   WCAG language was actually much more complicated in the 2006 draft than
   in the final 2008 version. I just hope the final Silver ends up simpler,
   not more difficult to understand and test than WCAG.

Cheers,
David MacDonald



CanAdapt Solutions Inc.

Tel:  613-806-9005

LinkedIn
<http://www.linkedin.com/in/davidmacdonald100>

twitter.com/davidmacd

GitHub <https://github.com/DavidMacDonald>

www.Can-Adapt.com <http://www.can-adapt.com/>



  Adapting the web to all users
            Including those with disabilities

If you are not the intended recipient, please review our privacy policy
<http://www.davidmacd.com/disclaimer.html>


On Fri, Apr 26, 2019 at 7:35 PM Jeanne Spellman <
jspellman@spellmanconsulting.com> wrote:

> When we were discussing Silver Conformance at and after the 2018 TPAC F2F,
> I walked away thinking about Silver Conformance as a multi-faceted
> approach.  This is my memory of what we discussed in the weeks after TPAC
> and what we presented to AGWG in January at the meeting where we discussed
> Conformance.  Now that AGWG has been learning more about the plans for
> Silver, this Conformance prototype may make more sense.  The messy
> details of the conformance prototype
> <https://docs.google.com/document/d/1wTJme7ZhhtzyWBxI8oMXzl7i4QHW7aDHRYTKXKELPcY/edit#heading=h.xzxmbeyrlvz7>
> are documented in a Google doc that is open for comments.
>
> == Summary ==
>
> Proposed:
>
> * Silver would still have traditional WCAG-type testing
> * Silver won't require usability testing at the minimum level
> * Scoring would be by site, project, or product, as determined by the
> organization. This brings us more in line with industry and regulatory
> requirements (for example, a VPAT is by site, project, or product).
> * Organizations that want a higher level (Silver or Gold) need to do more.
> There are roughly four categories of "more":
>     ** more advanced guidelines (current AAA plus new guidelines)
>     ** more sophisticated testing of individual guidelines
>     ** overall usability evaluations (several different types are proposed)
>     ** task completion testing
> * We want to encourage User Centered Design (UCD) and Inclusive Design,
> but Conformance seems the wrong place to do that, other than offering more
> points when the organization's process is documented.
>
> == More Explanation about Testing and Scoring in the Conformance Prototype
> Proposal ==
>
> 1. We could have a minimum level (bronze) that is roughly equivalent to
> WCAG 2.1 AA.  This would be met (by site, by project, or by product, not
> by page) by guidelines that would be largely familiar to a WCAG 2.1 user and
> would be tested mostly the way WCAG success criteria are tested today.
> Whether we require specific guidelines or a minimum score in each category
> of functional need (in the EN sense) is still undecided.  There are
> advantages and disadvantages to both and we need to do some modeling with
> real content.  We could include new guidelines from COGA, LV and others at
> this level if they "fit" in this level.  I can envision some COGA
> requirements that could be tested on a sliding scale with a clear rubric
> for evaluation fitting in at this level, but we still need to work this
> out.  In short, Bronze level conformance is much like WCAG AA is today
> except that it is site-wide instead of by page.
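>
> As a very rough sketch of those two undecided options (Python, with
> hypothetical guideline names, categories, scores, and thresholds; none
> of these are settled values):
>
>     # Option A: Bronze requires a specific set of guidelines to pass.
>     REQUIRED = {"text-alternatives", "keyboard-operable", "contrast"}
>
>     def bronze_by_required_guidelines(passed):
>         return REQUIRED <= set(passed)
>
>     # Option B: Bronze requires a minimum score in every category of
>     # functional need (in the EN sense); 70 is a placeholder threshold.
>     def bronze_by_category_minimum(scores, minimum=70):
>         return all(score >= minimum for score in scores.values())
>
>     scores = {"vision": 82, "hearing": 90, "cognition": 71, "motor": 88}
>     print(bronze_by_category_minimum(scores))  # True with these numbers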
>
> 2. An organization that wanted to score in a higher level (Silver or Gold)
> would have to do more than WCAG AA testing.  They could:
> * implement guidelines that are more advanced than WCAG AA but can still
> be tested traditionally (such as WCAG AAA or new guidelines from LV and
> COGA); or
> * perform usability evaluations (of different types to be determined) on
> individual guidelines; or
> * perform overall usability evaluations; or
> * implement new guidelines about successful task completion and document
> task completion tests; or
> * a mix of all four.
>
> 3. The guidelines that are conducive to more nuanced evaluation than
> true/false success criteria would have Methods with tests that score
> higher points. I like Alastair's idea that they "trigger" the opportunity
> for more advanced evaluation.  They would award higher points that could
> help bring the organization's overall score to a Silver or Gold level.
> Some of those tests could be in the broad category of usability tests. If
> an organization decided to do the usability-type testing, the organization
> would have to document the testing procedure so it could be confirmed by
> an outside party, and document the changes (if any) made as a result of
> the testing.  I think we would want to give them a template of what
> documentation we would require.
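>
> As a rough sketch of how triggered tests might add points (Python; the
> point values and level thresholds here are placeholders, not proposals):
>
>     # Placeholder level thresholds as a percentage of the maximum score.
>     LEVELS = [("gold", 90), ("silver", 80), ("bronze", 70)]
>
>     def guideline_points(base_passed, advanced_done, documented):
>         points = 10 if base_passed else 0
>         # An advanced (usability-type) test only counts if its procedure
>         # and resulting changes are documented for outside confirmation.
>         if advanced_done and documented:
>             points += 5
>         return points
>
>     def level(total, maximum):
>         percent = 100 * total / maximum
>         for name, threshold in LEVELS:
>             if percent >= threshold:
>                 return name
>         return "not conforming"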
>
> 4. There would be new guidelines, around task completion, that are NOT
> required for Bronze level. We haven't worked on the details of this yet,
> but I don't think it is a problem.  I think those would be documented by
> the organization according to the documentation template that would be
> required for the process they followed and the changes made as a result.
> They would award points that would help an organization get to Silver or
> Gold level.  It would be helpful if we could design a task completion
> guideline that could be implemented at Bronze level, but I don't know how
> we would do it.  We would welcome ideas.
>
> John raises a good point that usability testing should be done in the
> Design and Prototyping phase of a project, not at the audit stage.  The
> organizations that we expect to be interested in achieving higher levels
> should be aware of that, but it helps to educate organizations about it.
> The project manager tab would be a good place for that educational
> material.  Organizations that are only interested in an accessibility audit
> at the end of a project are not likely to be at a level of accessibility
> maturity to score higher than Bronze.  I don't think that is a problem,
> just an education opportunity.
>
> I think that accessibility consultancies should be watching the Silver
> project closely and looking for opportunities to develop new revenue
> streams from organizations that want a higher level than Bronze.
>
> To address David's concern that this conformance proposal is more
> complicated than current WCAG Conformance, I share his concern.  I was
> reminded this week that when you are working on a hard problem, it gets
> very complicated until you get the details worked out.  Once that happens,
> you can start to make it simpler.
>
> I hope this gives you more context for Charles' proposal.  We haven't
> discussed his proposal with AGWG because we are still working on the larger
> framework that it goes in.  I hope this explains how we are addressing some
> of Detlev's concerns.  Please let me know if it doesn't.
>
> jeanne
>
>
>
> On 4/26/2019 9:20 AM, John Foliot wrote:
>
> Alastair writes:
>
> > ...we run research with users rather than doing expert evaluations in
> those scenarios. This method should be something that is above the
> baseline, *we should be recommending it be done well before a typical
> accessibility audit*.
>
> +1 there.
>
> Traditionally "conformance statements" are issued at the end of a
> development cycle, not during and certainly never *prior* to development.
>
> These kinds of user-testing activities, however, are extremely
> informative, if not critical, for reaching a higher level of
> accessibility. But in almost every case they will also be (if I
> understand the conversation) unique: situational, tied to a specific
> context (the reason for the site, the activities being evaluated), and
> extremely difficult to *quantify* as part of a reporting mechanism.
>
> I am reminded here of the perennial tug-and-pull (certainly here in the
> US) in the academic sector over standardized testing; where one side of the
> argument states that "standardized testing doesn't teach students to think,
> but rather encourages teachers to teach to the test, and students to just
> memorize facts" versus the argument "how do we measure progress and success
> at scale, if not via standardized testing?" So while I appreciate the
> desire to move "accessibility" conformance beyond mindlessly ticking boxes,
> the types of heuristics I keep seeing discussed do not seem (to me)
> to address the standardization-at-scale issue.
>
> JF
>
> On Fri, Apr 26, 2019 at 3:56 AM Alastair Campbell <acampbell@nomensa.com>
> wrote:
>
>> Hi Charles, Detlev,
>>
>>
>>
>> > If the UCD process were a method, do you see any way in which that
>> could include all the functional needs?
>>
>>
>>
>> I might be missing some context on ‘functional needs’; does that mean
>> the EN-style “Can use it without vision” type of thing?
>>
>>
>>
>> Broadly, I can see two main approaches:
>>
>>
>>
>> Approach 1: Evaluation as functional-needs testing. I assume that would
>> be testing it ‘as a user’, either with real users or as an expert/advocate.
>> You would then be trying to use/analyse the site as a user, covering
>> the guidelines within that process.
>>
>>
>>
>> Approach 2: We test each guideline (in some sort of order to be
>> decided), and certain guidelines trigger a UCD method or functional test
>> to score more highly.
>>
>>
>>
>> The difficulty with approach 1 is that you have 10 overlapping functional
>> needs (from the EN categories), and it is time-consuming and repetitive to
>> test them all. I think it would be better to have a set of UCD methods /
>> functional tests that are **triggered** by particular guidelines based on
>> the content being evaluated. E.g. navigation guidelines trigger IA
>> methods, apps trigger interface/usability testing.
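>>
>> As a rough sketch (Python, with a purely illustrative mapping, not a
>> settled list), the trigger idea is essentially a lookup from content or
>> guideline category to methods:
>>
>>     TRIGGERS = {
>>         "navigation": ["card sort", "tree test"],  # IA methods
>>         "app-interface": ["usability test", "cognitive walkthrough"],
>>     }
>>
>>     def triggered_methods(content_features):
>>         methods = []
>>         for feature in content_features:
>>             methods.extend(TRIGGERS.get(feature, []))
>>         return methods
>>
>>     print(triggered_methods(["navigation"]))  # ['card sort', 'tree test']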
>>
>>
>>
>>
>>
>> > My general idea is that the minimum conformance can be achieved
>> without requiring any specific human evaluation method that included people
>> with specific functional needs.
>>
>>
>>
>> That’s good, my concern is how best to do that. There are whole ISO
>> standards on user-centred design methods and we shouldn’t be trying to
>> replicate those.
>>
>>
>>
>> On the example:
>>
>>
>>
>> > Let’s say the author has created a tab bar navigation… But then it gets
>> tested by and/or for the functional needs of fine motor control and half of
>> the tests fail because even though the target size was sufficient, it was
>> found that there were too many accidental clicks due to the proximity and
>> insufficient space between.
>>
>>
>>
>> Where a test is on a continuum (e.g. mobility impairment, low vision) how
>> would you set a bar for “too many accidental clicks” in functional testing?
>> It would entirely rest on who the participants were, or what assumptions
>> the expert had. I think this is actually a case where a guideline about
>> target size / spacing would be better. You could score more by going to a
>> higher level on the continuum, like the AA/AAA contrast guidelines.
>>
>>
>>
>> However, if it were Detlev’s example of a Chemical supplier, I think the
>> grouping and terminology used in the navigation are the most likely issues
>> for people with cognitive issues. The best tool for improving that would be
>> a card-sort or menu test conducted with the target audience & people with
>> cognitive issues.
>>
>>
>>
>> Detlev wrote:
>>
>> > Evaluators will often not have the domain knowledge …
>>
>>
>>
>> Indeed, this is the case for a lot of the UX research we do, which is why
>> we run research with users rather than doing expert evaluations in those
>> scenarios.
>>
>> This method should be something that is above the baseline, we should be
>> recommending it be done well *before* a typical accessibility audit.
>>
>>
>>
>> > An expert might arrive at as good a navigation structure as a group
>> that went through a card-sorting exercise (if one were to carry out user
>> testing to assess the quality of the result) - why should the fact that
>> the structure was arrived at via card sorting lead to a higher score if
>> what counts is the accessibility/usability of the site for the end user?
>>
>>
>>
>> A card sort [1] with end-users is a formative method for creating a
>> navigation structure that generally* provides much better results than
>> the organisation would produce on its own.
>>
>>
>>
>> Usability testing isn't a particularly good method for assessing the
>> navigation structure; a menu test (aka tree test) [2] is another type of
>> test that is better for evaluating that structure.
>>
>>
>>
>> What counts is the accessibility/usability of the site for end users, and
>> when organisations conduct those types of research, they generally* end up
>> with more usable navigation.
>>
>>
>>
>> I’d like to see navigation guidelines trigger IA methods, and other
>> interface guidelines trigger usability testing / cognitive walkthroughs etc.
>>
>>
>>
>> However, I do not think it is possible (or even useful) to try to set a
>> score for the output of those methods; the important thing is that they
>> are done and acted on.
>>
>>
>>
>> ISO 27001 uses that approach: you document that you have done
>> something and acted on it, and the next year you have to re-do certain
>> things and show improvement.
>>
>>
>>
>> I think we are agreeing that these methods need to be part of the Silver
>> / Gold levels, while there should be a (WCAG 2.x equivalent) baseline that
>> can be conducted by anyone.
>>
>>
>>
>> Cheers,
>>
>>
>>
>> -Alastair
>>
>>
>>
>> [1] https://www.usability.gov/how-to-and-tools/methods/card-sorting.html
>>
>> [2] https://www.optimalworkshop.com/101/tree-testing#introWhat
>>
>> * I’d like to say “always” as that’s my experience, but I assume people
>> can do it badly…
>>
>
>
> --
> *John Foliot* | Principal Accessibility Strategist | W3C AC
> Representative
> Deque Systems - Accessibility for Good
> deque.com
>
>