Re: Proposal for new version of Requirement 3.7 Motivation

I think it is important to be specific about what we are talking about, 
or we will keep talking in circles and disagreeing.  In WCAG 2.x, 
"conformance" is the umbrella label that covers testing, levels, 
scoring, compliance, and W3C conformance.  It is easy to assume that 
these terms are interchangeable, which may be the reason this discussion 
is bogged down.  What we need to do to accomplish the goals we have for 
Silver is to tease these concepts apart and find creative ways of better 
addressing the needs of both people with disabilities and the 
organizations and stakeholders that use the guidelines.

I think we can agree that the purpose of testing is to determine whether 
the content creator did their work correctly.  Testing includes many 
different types of tests, and Silver will still have automated and manual 
tests.  We won't have a system where people can fake a usability test 
and claim they meet the Guidelines -- that hypothetical is not valid.  
Usability testing that doesn't result in a correction or improvement 
isn't useful for our purposes.

Usability testing is not the only way that organizations will 
demonstrate that the content creator did their work correctly.  It is an 
enhancement.  It's a good enhancement -- many large organizations do 
it.  In fact, one challenge is how to give small organizations the same 
opportunity to achieve Gold level when they don't have big usability 
departments or specialists.  Usability testing is used in the United 
States in the Air Carrier Access Act (ACAA), so it is possible to have 
usability evaluations work in a US regulatory environment.  I think that 
Silver shouldn't model usability the way ACAA did, since the usability 
section of ACAA is narrow and air carriers are large organizations.  I'm 
grateful, however, that the "way is paved" for usability to be included 
with accessibility in a regulatory environment.  :)

Levels in WCAG are by success criteria. That has proven to be 
detrimental to people with cognitive disabilities (among others) because 
there is no incentive to implement AAA success criteria. We are 
proposing that Silver have overall levels for the product or project.  
The organization decides the scope of the product or project.  
Organizations often decide to evaluate, usability test, or claim 
compliance with portions of their websites.

Scoring is how we want to motivate people to do more.  We certainly will 
have some way of ensuring that people do AT LEAST the minimum across 
different user needs.  See the slide deck where Shawn and I talked about 
having categories of user needs and a minimum in each category. This was 
specifically added to address gaming the system.  For over a year, we 
have been discussing that the Bronze level is going to be roughly equivalent 
to WCAG 2.x AA.  We want to motivate people to do more than Bronze, so 
we have higher levels of Silver and Gold. That's where we propose that 
the user research, cognitive walkthroughs, and heuristic evaluations 
will fall.
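
To make that concrete, here is a rough sketch of how per-category 
minimums and overall levels could interact.  This is only an 
illustration, not a worked-out proposal -- the category names, minimums, 
thresholds, and point values below are placeholders I made up, not 
numbers anyone has agreed to.

    # Rough sketch only -- not an official Silver scoring algorithm.
    # Category names, minimums, thresholds, and point values are placeholders.

    CATEGORY_MINIMUM = 10        # hypothetical minimum points per user-need category
    LEVEL_THRESHOLDS = [         # hypothetical overall totals, lowest to highest
        ("Bronze", 50),          # roughly "WCAG 2.x AA equivalent" in this sketch
        ("Silver", 70),
        ("Gold", 90),
    ]

    def silver_level(scores_by_category):
        """Return the highest level earned, or None if any category misses its minimum."""
        # Anti-gaming guard: every user-need category must meet the minimum on
        # its own, so strength in one category cannot substitute for weakness
        # in another.
        if any(points < CATEGORY_MINIMUM for points in scores_by_category.values()):
            return None
        total = sum(scores_by_category.values())
        earned = None
        for level, threshold in LEVEL_THRESHOLDS:
            if total >= threshold:
                earned = level
        return earned

    # High total, but one neglected category -> no level at all.
    print(silver_level({"vision": 40, "mobility": 30, "cognitive": 5}))   # None
    # Balanced effort across categories -> total of 75 earns Silver.
    print(silver_level({"vision": 30, "mobility": 25, "cognitive": 20}))  # Silver

The only point of the sketch is the shape of the check: per-category 
minimums keep one user need from being traded against another, and the 
overall total is what separates Bronze from Silver from Gold.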

Compliance is ultimately up to the governments that implement Silver in 
regulation.  Remember that governments decided whether to require WCAG 
A, WCAG AA, or WCAG with their own changes.  We don't decide compliance 
or decide court cases.  We are trying to make compliance easier for 
governments, judges, and lawyers by making the guidelines easier to 
understand and more transparent.  Providing specific tests or procedures, 
and a scoring system that makes it possible to determine whether the 
minimum has been met, should do this.  The devil is in the details.  
That's why the conformance work isn't done.  It isn't the tests that are 
the holdup; it is setting up a fair and transparent scoring system -- 
especially one that can accommodate the needs of large organizations 
that would like to be able to "substantially conform".

W3C Conformance is how we measure whether we have implementations of the 
Silver features so that Silver can exit Candidate Recommendation.  While 
the details of W3C Conformance also need to be worked out, it is not our 
highest priority at the moment.

It may be possible that we are all in agreement as long as we are using 
more specific terms than "conformance".  Let's not get bogged down in 
hypotheticals and instead work together on the details of how to make 
this work.  Or at least, let's agree that we want to motivate 
organizations to do more, so we can get back to working on the specific 
details of exactly how we will do that.

jeanne

On 4/11/2019 6:21 PM, John Foliot wrote:
> Denis writes:
>
>     > ... The kind of issues that are raised by people with
>     disabilities in usability testing will usually relate to things we
>     could easily miss just because we don't have those disabilities
>     ourselves. And that level of findings, when addressed, definitely
>     pushes the quality of the product further.
>
> So... as far as usability testing is concerned _during content 
> creation time_ (i.e. pre-launch) - 100% with you.
>
> However here we're talking about conformance *reporting* in the 
> context of legal obligations: is this site "compliant" or not? Not "is 
> this site optimized for all users?", but rather "is this site in legal 
> jeopardy?" - and those are two completely different things. I'll go 
> back to what Wilco said:
>
>     /"I am skeptical about a point system as part of a *_conformance
>     model_ for accessibility*. I think a point system is a cool idea,
>     but not as part of the conformance model."/
>
> Going back to my hypothetical situation: If Detlev's user "passes" 
> something, Denis' user "struggles but completes the task", and my user 
> is "stopped dead in the water" - *_all on the same page/site_* simply 
> due to varying experience levels... who now should the judge believe? 
> Why?
>
> Facts, more than opinions, will be the deciding factor there. If 
> Detlev's "user score" suggests Gold, your "user score" suggests 
> Silver, and my "user score" suggests "Tin" how do we then arrive at a 
> real score (or partial score + other test methods)? The subjectivity 
> of end-users and what they report back is so open for (unintentional 
> or otherwise) gaming as to be a real concern to me.
>
> It has been suggested that providing user-testing would be one method 
> of 'increasing' your score, but again how do we make that testable and 
> repeatable? If in the above scenario Detlev's users and my users 
> cannot arrive at the same score on the same set of 'pages', how can we 
> ever add that to a conformance model? I fully support anything that 
> encourages more user-testing, for all of the value-adds you 
> enumerated. But to use user-testing as a means of confirming 
> "compliance" introduces a whole new level of complexity that I suspect 
> many will shake their heads at and walk away... (as sad as that 
> realization is to me).
>
> JF
>
> On Thu, Apr 11, 2019 at 4:20 PM Denis Boudreau 
> <denis.boudreau@deque.com <mailto:denis.boudreau@deque.com>> wrote:
>
>     JF wrote:
>     > Like the television character Mulder in the show X-Files, I too
>     want to believe. But having filled out more
>     > than one (US) VPAT over the years, the reality is that "Partially
>     Supports" (formally "Meets with Exceptions")
>     > tends to stay that way, and rarely gets fixed.
>
>     Very cute. Well played, sir.
>
>     JF also wrote:
>     > If Detlev's user "passes" something, Denis' user "struggles but
>     completes the task", and my user is "stopped
>     > dead in the water" - all on the same page/site simply due to
>     varying experience levels... how do we square that
>     > circular problem?
>
>     But surely, we all agree that the measurements or findings coming
>     from the usability testing the three of us hypothetically conduct
>     to inform about the inherent problems of a site contribute to
>     identifying further issues. By conducting these tests, we
>     ultimately get to address new sets of issues and the process
>     brings expected additional value. Issues found through usability
>     testing, as opposed to issues found through say, automated or
>     manual testing, tend to otherwise be missed by non-disabled
>     accessibility experts who just happen to know about WCAG. The kind
>     of issues that are raised by people with disabilities in usability
>     testing will usually relate to things we could easily miss just
>     because we don't have those disabilities ourselves. And that level
>     of findings, when addressed, definitely pushes the quality of the
>     product further.
>
>     And JF finally wrote:
>     > Many of Deque's clients have thousands, if not hundreds of
>     thousands, of web "pages", and measuring
>     > conformance at that scale is already problematic. Introducing
>     user-testing into that scenario just made
>     > accessibility conformance testing significantly more expensive,
>     and any final conformance model will
>     > need to address this scale problem. User testing for conformance
>     might work at the boutique level,
>     > but at the enterprise level it's a bit of a pipe-dream... (IMHO)
>
>     Well, that's simply not true. The number of pages a site contains
>     has very little impact on the overall cost of usability testing
>     when what you are testing are flows, happy and not-so-happy paths,
>     and precise tasks that you are testing to validate some
>     assumptions you may have about parts of the interactions of
>     interfaces you may have doubts about. This is not something that
>     only boutique shops should be able to do. This is something that
>     can just as easily be conducted by software companies, or big IT
>     corporations, if only those who work there understand why the whole
>     effort is worth their time, energy and resources.
>
>     The problem is not whether usability testing is a pipe-dream in larger,
>     more complex contexts. I mean, quality and accessibility could
>     just as easily be considered pipe-dreams if we look at it that way.
>
>
>
>     /Denis
>
>
>     *Denis Boudreau, CPWA* | Principal Accessibility SME & Training
>     Lead | 514-730-9168
>     Deque Systems - Accessibility for Good
>     Deque.com <http://www.deque.com>
>
>
>
>
>
>     On Thu, Apr 11, 2019 at 10:11 AM John Foliot
>     <john.foliot@deque.com <mailto:john.foliot@deque.com>> wrote:
>
>         Denis wrote:
>
>             > I believe that conducting testing with people with
>             disabilities, when done genuinely with the goal of user
>             experience improvements does absolutely change the quality
>             of the site under test.
>
>         Like the television character Mulder in the show X-Files, I
>         too want to believe. But having filled out more than one (US)
>         VPAT over the years, the reality is that "Partially Supports"
>         (formerly "Meets with Exceptions") tends to stay that way, and
>         rarely gets fixed.
>
>         Testing with users with disabilities isn't the same as
>         remediating all issues they find, and to that end, I have to
>         agree with Detlev: user-testing alone is insufficient in
>         "boosting" a score - it's what comes *after* the user testing
>         that is important, and so user-testing is a "process" not an
>         end-state.
>
>         Don't get me wrong - like the majority of us, I understand and
>         appreciate the value of user-testing. It gives us a clearer
>         and more informed and more nuanced picture of the (current)
>         state of a web-site, but that activity alone does nothing to
>         *improve* the accessibility, only to more clearly define the
>         current state, good or bad.
>
>         For example, I can visually see if and when I think target
>         regions are too small, and/or I can "measure" those touch
>         regions, and/or I can ask a mobility impaired user to try
>         "clicking those buttons" - all three of those activities can
>         be used to determine if touch regions are sufficiently
>         big-enough, but why would involving an end user get me more
>         "points"? As such, I also agree with Wilco - I too think a
>         point system is an interesting idea, but not as part of a
>         conformance model, which requires some measurable rigidity,
>         even if we move from a Pass/Fail to a Bronze/Silver/Gold
>         reporting mechanism.
>
>         Additionally (and I've experienced this recently in the
>         context of testing a site for a client under legal duress),
>         not all users have the same skills or experience - and
>         "issues" reported by some users may not actually be issues
>         with the site/content at all, but rather the end user is
>         inexperienced or is "anticipating" a behavior that isn't
>         *mandated* (but might be nice to have). If Detlev's user
>         "passes" something, Denis' user "struggles but completes the
>         task", and my user is "stopped dead in the water" - all on the
>         same page/site simply due to varying experience levels... how
>         do we square that circular problem?
>
>         Finally, as I've previously noted, I remain concerned about
>         "scale" in the context of user-testing. Many of Deque's
>         clients have thousands, if not hundreds of thousands, of web
>         "pages", and measuring conformance at that scale is already
>         problematic. Introducing user-testing into that scenario just
>         made accessibility conformance testing significantly more
>         expensive, and any final conformance model will need to
>         address this scale problem. User testing for conformance might
>         work at the boutique level, but at the enterprise level it's a
>         bit of a pipe-dream... (IMHO)
>
>         JF
>
>         On Wed, Apr 10, 2019 at 1:16 PM Denis Boudreau
>         <denis.boudreau@deque.com <mailto:denis.boudreau@deque.com>>
>         wrote:
>
>             Hello all,
>
>             Wilco certainly makes good points, but I guess I'm more
>             optimistic than he is about our ability to come up with a
>             process that would allow Silver to give more importance to
>             usability testing as part of a conformance model, without
>             negatively impacting certain demographics in the process.
>
>             /Denis
>
>
>             *Denis Boudreau, CPWA* | Principal Accessibility SME &
>             Training Lead | 514-730-9168
>             Deque Systems - Accessibility for Good
>             Deque.com <http://www.deque.com>
>
>
>
>
>
>             On Wed, Apr 10, 2019 at 10:30 AM Shawn Lauriat
>             <lauriat@google.com <mailto:lauriat@google.com>> wrote:
>
>                 Wilco,
>
>                     I can't see us ever agreeing that, if you do more
>                     for people with learning disabilities, you don't
>                     need to do as much for people with low vision. Any
>                     point system we use can't be at a conformance
>                     layer or guidelines layer. It has to be narrow, so
>                     we don't make the needs of one group
>                     interchangeable with another. That means point
>                     systems at the success criteria layer. WCAG
>                     already allows for this. Think of how color
>                     contrast is done. Two success criteria, one at AA,
>                     one at AAA, using the same measurement tool, with
>                     a lower threshold for AA and a higher one for AAA.
>
>
>                 Totally agree! We absolutely need conformance to cover
>                 different user needs and not allow someone to claim
>                 conformance for piling up methods for one user need
>                 and ignoring others. This requirement centers around
>                 providing a way to demonstrate and express a
>                 beyond-the-minimum level of accessibility, so building
>                 up from a base level of conformance, rather than
>                 replacing it with "awesome for blind users and broken
>                 if you have some kind of mobility impairment".
>
>                 Hope that helps!
>
>                 -Shawn
>
>                 On Wed, Apr 10, 2019 at 6:54 AM Wilco Fiers
>                 <wilco.fiers@deque.com <mailto:wilco.fiers@deque.com>>
>                 wrote:
>
>                     Hey all,
>                     I am skeptical about a point system as part of a
>                     conformance model for accessibility. I think a
>                     point system is a cool idea, but not as part of
>                     the conformance model.
>
>                     Point systems are great if you have different
>                     things you could do, that lead to roughly the same
>                     end result. For example, the airports with bike
>                     racks example is something that keeps coming up.
>                     You can do any number of things to get more people
>                     to leave their car at home. Better public
>                     transportation, encourage biking, encourage
>                     carpooling, etc. Any one of them reduces cars, and
>                     all of them do it by a lot.
>
>                     Accessibility doesn't really work like that.
>                     Keyboard accessibility and visible focus aren't
>                     interchangeable. Users need both of them. The few
>                     places in WCAG where more than one option is
>                     acceptable, we've already left the solution open
>                     (example: Bypass Blocks) or we've specified the
>                     available options (example: Audio Description or
>                     Media Alternative).
>
>                     I can't see us ever agreeing that, if you do more
>                     for people with learning disabilities, you don't
>                     need to do as much for people with low vision. Any
>                     point system we use can't be at a conformance
>                     layer or guidelines layer. It has to be narrow, so
>                     we don't make the needs of one group
>                     interchangeable with another. That means point
>                     systems at the success criteria layer. WCAG
>                     already allows for this. Think of how color
>                     contrast is done. Two success criteria, one at AA,
>                     one at AAA, using the same measurement tool, with
>                     a lower threshold for AA and a higher one for AAA.
>
>                     I can certainly see us having more "point systems"
>                     for different requirements. You could require 8
>                     points for non-text content at level A, and 12
>                     points at AA or whatever (just making up numbers).
>                     It might also be possible to create a point system
>                     that will work for lots of success criteria. But I
>                     don't see that working at the conformance level. A
>                     point system where you exchange one user need for
>                     another seems pretty problematic to me.
>
>                     W
>
>                     On Tue, Apr 9, 2019 at 1:59 PM Denis Boudreau
>                     <denis.boudreau@deque.com
>                     <mailto:denis.boudreau@deque.com>> wrote:
>
>                         I like the proposal with Chuck’s edits.
>
>                         I disagree with your position Detlev, but
>                         understand your concerns. The temptation to
>                         game the system would undoubtedly rise from
>                         some of the people out there that would want
>                         to be able to claim a quick path to success
>                         (oh yeah, we tested with people, and “they”
>                         said it was fiiiiiiine...).
>
>                         I’m just not able to agree with a statement
>                         such as:
>
>                         “[testing]... does not in itself change the
>                         quality of the site under test. An awful
>                         site stays awful even after a lot of user
>                         testing.”
>
>                         I believe that conducting testing with people
>                         with disabilities, when done genuinely with
>                         the goal of user experience improvements, does
>                         absolutely change the quality of the site
>                         under test. The findings brought up by
>                         consulting those users are expected to bring
>                         forth positive changes. An awful site is
>                         supposed to get better as a result of the
>                         changes that come from the activity of
>                         involving those users in the process. That’s
>                         just the nature of the activity. But we need a
>                         way to measure that clearly in Silver.
>
>                         I celebrate our vision of rewarding usability
>                         testing with end users with disabilities. It
>                         does expose our model to abuse - I certainly
>                         share Detlev’s concerns here - but I’m sure
>                         that as we get to defining the details of how
>                         the scoring system will pan out, we’ll find
>                         ways to reward usability testing for aspects
>                         that actually provide value, not for things
>                         that pay lip service to the idea of making the
>                         product or service accessible.
>
>                         As an example, we could consider pairing
>                         aspects of the usability testing sessions with
>                         tangible results or improvements that came
>                         directly from this testing. That way, the
>                         testing outcomes and related improvements
>                         could be linked to specific methods for
>                         instance, or techniques or whatnot, and we
>                         could measure just how many of the
>                         improvements came directly from involving end
>                         users with disabilities in the overall
>                         process. The more improvements that come directly
>                         from end users’ contributions, the higher the points.
>
>
>                         /Denis
>
>                         —
>                         Denis Boudreau
>                         Principal accessibility SME & Training lead
>                         Deque Systems, Inc.
>                         514-730-9168
>
>
>
>                         On Tue, Apr 9, 2019 at 04:30 Detlev Fischer
>                         <detlev.fischer@testkreis.de
>                         <mailto:detlev.fischer@testkreis.de>> wrote:
>
>                             As I have said before, I think the mere
>                             fact that testing with users
>                             with disabilities has taken place should
>                             not be rewarded since it does
>                             not in itself change the quality of the
>                             site under test. An awful site
>                             stays awful even after a lot of user
>                             testing. If then, as a result of
>                             such testing, the accessibility and/or
>                             usability is improved, that
>                             should impact also the conformance to
>                             measurable criteria (whether
>                             absolute or score-based) - and I am happy
>                             to see those criteria extended
>                             to realms so far difficult to measure.
>
>                             Am 08.04.2019 um 20:42 schrieb Jeanne
>                             Spellman:
>                             > Here is the proposal for revision of
>                             Requirement 3.7 Motivation as
>                             > requested by AGWG to make it measurable.
>                             >
>                             > Motivation
>                             >
>                             > The Guidelines motivate organizations to
>                             go beyond minimal
>                             > accessibility requirements by providing
>                             a scoring system that rewards
>                             > organizations that demonstrate a greater
>                             effort to improve
>                             > accessibility.  For example, Methods
>                             that go beyond the minimum (such
>                             > as: Methods for Guidelines that are not
>                             included in WCAG 2.x A or AA,
>                             > task-completion evaluations, or testing
>                             with users with disabilities)
>                             > are worth more points in the scoring system.
>                             >
>                             >
>                             >
>
>                             -- 
>                             Detlev Fischer
>                             Testkreis
>                             Werderstr. 34, 20144 Hamburg
>
>                             Mobil +49 (0)157 57 57 57 45
>
>                             http://www.testkreis.de
>                             Beratung, Tests und Schulungen für
>                             barrierefreie Websites
>
>
>                         -- 
>                         /Denis
>
>                         --
>                         Denis Boudreau
>                         Principal SME & trainer
>                         Web accessibility, inclusive design and UX
>                         Deque Systems inc.
>                         514-730-9168
>
>                         Keep in touch: @dboudreau
>
>
>
>                     -- 
>                     *Wilco Fiers*
>                     Axe product owner - Co-facilitator WCAG-ACT -
>                     Chair ACT-R / Auto-WCAG
>
>
>
>         -- 
>         *​John Foliot* | Principal Accessibility Strategist | W3C AC
>         Representative
>         Deque Systems - Accessibility for Good
>         deque.com <http://deque.com/>
>
>
>
> -- 
> *​John Foliot* | Principal Accessibility Strategist | W3C AC 
> Representative
> Deque Systems - Accessibility for Good
> deque.com <http://deque.com/>
>

Received on Thursday, 11 April 2019 23:47:28 UTC