
Re: Heuristic testing

From: John Foliot <john.foliot@deque.com>
Date: Fri, 26 Apr 2019 08:20:34 -0500
Message-ID: <CAKdCpxxhPnnxp_NEeyw7eL5f0d_axZV6YbCLhe980DLpWHnqbQ@mail.gmail.com>
To: Alastair Campbell <acampbell@nomensa.com>
Cc: "Hall, Charles (DET-MRM)" <Charles.Hall@mrm-mccann.com>, Detlev Fischer <detlev.fischer@testkreis.de>, "public-silver@w3.org" <public-silver@w3.org>
Alastair writes:

> ...we run research with users rather than doing expert evaluations in
> those scenarios. This method should be something that is above the
> baseline, *we should be recommending it be done well before a typical
> accessibility audit*.

+1 there.

Traditionally "conformance statements" are issued at the end of a
development cycle, not during and certainly never *prior* to development.

These kinds of user-testing activities are, if not critical, extremely
informative for reaching a higher level of accessibility. But in almost
every case they will also (if I understand the conversation correctly) be
unique: situational, tied to a specific context (the reason for the site,
the activity/ies being evaluated), and extremely difficult to *quantify* as
part of a reporting mechanism.

I am reminded here of the perennial tug-of-war (certainly here in the US)
in the academic sector over standardized testing. One side of the argument
states that "standardized testing doesn't teach students to think, but
rather encourages teachers to teach to the test, and students to just
memorize facts"; the other asks, "how do we measure progress and success
at scale, if not via standardized testing?" So while I appreciate the
desire to move "accessibility" conformance beyond mindlessly ticking boxes,
the types of heuristics I keep seeing discussed do not seem (to me) to
address the standardized-for-scale issue.

JF

On Fri, Apr 26, 2019 at 3:56 AM Alastair Campbell <acampbell@nomensa.com>
wrote:

> Hi Charles, Detlev,
>
>
>
> > If the UCD process were a method, do you see any way in which that
> could include all the functional needs?
>
>
>
> I might be missing some context on ‘functional needs’, does that mean the
> EN style “Can use it without vision” type of thing?
>
>
>
> Broadly I can see two main approaches:
>
>
>
> Approach 1: Evaluation as functional-needs testing. I assume that would be
> testing it ‘as a user’, either with real users or as an expert/advocate.
> You would then try to use/analyse the site as a user, covering the
> guidelines within that process.
>
>
>
> Approach 2: We test against each guideline (in some sort of order to be
> decided), and certain guidelines trigger a UCD method or functional test to
> score more highly.
>
>
>
> The difficulty with approach 1 is that you have 10 overlapping functional
> needs (from the EN categories); testing them all is time-consuming and
> repetitive. I think it would be better to have a set of UCD methods /
> functional tests that are **triggered** by particular guidelines based on
> the content being evaluated. E.g. navigation guidelines trigger IA methods,
> apps trigger interface/usability testing.
>
>
>
>
>
> > My general idea is that the minimum conformance can be achieved without
> requiring any specific human evaluation method that includes people with
> specific functional needs.
>
>
>
> That’s good, my concern is how best to do that. There are whole ISO
> standards on user-centred design methods and we shouldn’t be trying to
> replicate those.
>
>
>
> On the example:
>
>
>
> > Let’s say the author has created a tab bar navigation… But then it gets
> tested by and/or for the functional needs of fine motor control and half of
> the tests fail because even though the target size was sufficient, it was
> found that there were too many accidental clicks due to the proximity and
> insufficient space between.
>
>
>
> Where a test is on a continuum (e.g. mobility impairment, low vision) how
> would you set a bar for “too many accidental clicks” in functional testing?
> It would entirely rest on who the participants were, or what assumptions
> the expert had. I think this is actually a case where a guideline about
> target size / spacing would be better. You could score more by going to a
> higher level on the continuum, like the AA/AAA contrast guidelines.
>
>
>
> However, if it were Detlev’s example of a Chemical supplier, I think the
> grouping and terminology used in the navigation are the most likely issues
> for people with cognitive issues. The best tool for improving that would be
> a card-sort or menu test conducted with the target audience & people with
> cognitive issues.
>
>
>
> Detlev wrote:
>
> > Evaluators will often not have the domain knowledge …
>
>
>
> Indeed, this is the case for a lot of the UX research we do, which is why
> we run research with users rather than doing expert evaluations in those
> scenarios.
>
> This method should be something that is above the baseline, we should be
> recommending it be done well *before* a typical accessibility audit.
>
>
>
> > An expert might arrive at as good a navigation structure as a group
> that went through a card-sorting exercise (if one were to carry out user
> testing to assess the quality of the result) - why should the fact that the
> structure was arrived at via card sorting lead to a higher score, if what
> counts is the accessibility/usability of the site for the end user?
>
>
>
> A card sort [1] with end-users is a formative method for creating a
> navigation structure that generally* provides much better results than the
> organisation would produce.
>
>
>
> Usability testing isn’t a particularly good method for assessing the
> navigation structure; a menu test (aka tree test) [2] is another type of
> test that is better for evaluating that structure.
>
>
>
> What counts is the accessibility/usability of the site for end users, and
> when organisations conduct those types of research, they generally* end up
> with more usable navigation.
>
>
>
> I’d like to see navigation guidelines trigger IA methods, and other
> interface guidelines trigger usability testing / cognitive walkthroughs etc.
>
>
>
> However, I do not think it is possible (or even useful) to try to set a
> score for the output of those methods; the important thing is that they
> are done and acted on.
>
>
>
> ISO 27001 uses that approach: you document that you have done
> something and acted on it, and later you have to re-do certain things
> the next year and show improvement.
>
>
>
> I think we are agreeing that these methods need to be part of the silver
> / gold levels, and there should be a (WCAG 2.x-equivalent) baseline that
> can be conducted by anyone.
>
>
>
> Cheers,
>
>
>
> -Alastair
>
>
>
> [1] https://www.usability.gov/how-to-and-tools/methods/card-sorting.html
>
> [2] https://www.optimalworkshop.com/101/tree-testing#introWhat
>
> * I’d like to say “always” as that’s my experience, but I assume people
> can do it badly…
>


-- 
*John Foliot* | Principal Accessibility Strategist | W3C AC Representative
Deque Systems - Accessibility for Good
deque.com
Received on Friday, 26 April 2019 13:21:38 UTC
