Re: Proposal for new version of Requirement 3.7 Motivation from Denis Boudreau on 2019-04-12 (public-silver@w3.org from April 2019)

From: Denis Boudreau <denis.boudreau@deque.com>
Date: Thu, 11 Apr 2019 21:41:08 -0400
To: Jeanne Spellman <jspellman@spellmanconsulting.com>
Cc: public-silver@w3.org
Message-ID: <CAC=s1Ag6AQ=aXoWXc8KK5hpG8-XMu9XheSUNLjUCdHWmtnvUPA@mail.gmail.com>
Yup, +1.

/Denis



On Thu, Apr 11, 2019 at 19:48 Jeanne Spellman <
jspellman@spellmanconsulting.com> wrote:

> I think it is important to separate what we are talking about
> specifically, or we can talk around in circles disagreeing.  In WCAG 2.x,
> "conformance" is the umbrella label that covers testing, levels, scoring,
> compliance, and W3C conformance.  It is easy to assume that the terms are
> interchangeable, and may be the reason this discussion is bogged down.
> What we need to do to accomplish the goals we have for Silver is to tease
> these concepts apart and find creative ways of better addressing the needs
> of both people with disabilities and the organizations and stakeholders
> that use the guidelines.
>
> I think we can agree that the purpose of testing is to determine if the
> content creator did their work correctly.  Testing includes many different
> types of tests. Silver will still have automated and manual tests.  We
> won't have a system where people can fake a usability test and claim they
> meet the Guidelines.  That is a hypothetical that is not valid.  Usability
> testing that doesn't result in a correction or improvement isn't useful for
> our purposes.
>
> Usability testing is not the only way that organizations will demonstrate
> that the content creator did their work correctly.  It is an enhancement.
> It's a good enhancement -- many large organizations do it.  In fact, one
> challenge is how to give small organizations the same opportunity to
> achieve Gold level when they don't have big usability departments or
> specialists.  Usability testing is used in the United States in the Air
> Carrier Access Act (ACAA), so it is possible to have usability evaluations
> work in a US regulatory environment.  I think that Silver shouldn't model
> usability the way ACAA did, since the usability section of ACAA is narrow
> and air carriers are large organizations.  I'm grateful, however, that the
> "way is paved" for usability to be included with accessibility in a
> regulatory environment.  :)
>
> Levels in WCAG are by success criteria. That has proven to be detrimental
> to people with cognitive disabilities (among others) because there is no
> incentive to implement AAA success criteria.  We are proposing that Silver
> have overall levels for the product or project.  The organization decides
> the scope of the product or project.  Organizations often decide to
> evaluate, usability test, or claim compliance with portions of their
> websites.
>
> Scoring is how we want to motivate people to do more.  We certainly will
> have some way of ensuring that people do AT LEAST the minimum across
> different user needs.  See the slide deck where Shawn and I talked about
> having categories of user needs and a minimum in each category. This was
> specifically added to address gaming the system.  For over a year, we have
> been discussing that bronze level is going to be roughly equivalent to WCAG
> 2.x AA.  We want to motivate people to do more than Bronze, so we have
> higher levels of Silver and Gold. That's where we propose that the user
> research, cognitive walkthroughs, and heuristic evaluations will fall.
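
To make the "minimum in each category" idea above concrete, here is a minimal sketch in Python. The category names, the per-category minimum, and the point totals for each level are all made-up assumptions for illustration; nothing here is a decided part of Silver.

```python
# Hypothetical numbers: Silver has not defined any of these values.
CATEGORY_MINIMUM = 10  # assumed minimum score required in every user-need category
LEVEL_THRESHOLDS = [("Gold", 90), ("Silver", 75), ("Bronze", 0)]  # assumed total points


def overall_level(scores: dict[str, int]) -> str:
    """Return an overall level for a product, given points per user-need category.

    A product that misses the minimum in any single category gets no level,
    no matter how many points it piles up elsewhere.
    """
    if not scores or any(points < CATEGORY_MINIMUM for points in scores.values()):
        return "No level"
    total = sum(scores.values())
    for level, threshold in LEVEL_THRESHOLDS:
        if total >= threshold:
            return level
    return "No level"


balanced = {"vision": 30, "hearing": 25, "mobility": 25, "cognitive": 20}
lopsided = {"vision": 95, "hearing": 10, "mobility": 10, "cognitive": 5}
print(overall_level(balanced))  # high total, every category above the minimum
print(overall_level(lopsided))  # higher total, but one category below the minimum
```

In this sketch the lopsided product has the higher total yet earns no level, because the minimum check runs before any totals are compared. That is one way a scoring system could resist the gaming concern raised in this thread.
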
>
> Compliance is ultimately up to the governments that implement Silver in
> regulation.  Remember that governments decided whether to require WCAG A,
> WCAG AA, or WCAG with their own changes.  We don't decide compliance or
> decide court cases.  We are trying to make compliance easier for
> governments, judges, and lawyers by making the guidelines easier to
> understand and more transparent. Giving specific tests or procedures and a
> scoring system that allows determination of whether the minimum has been
> met should do this.  The devil is in the details.  That's why the
> conformance isn't done.  It isn't the tests that are the holdup, it is
> setting up a fair and transparent scoring system.  Especially setting up a
> scoring system that can accommodate the needs of large organizations who
> would like to be able to "substantially conform".
>
> W3C Conformance is how we measure whether we have implementations of the
> Silver features so that Silver can exit Candidate Recommendation.  While
> the details of W3C Conformance also need to be worked out, it is not our
> highest priority at the moment.
>
> It may be possible that we are all in agreement as long as we are using
> more specific terms than "conformance".   Let's not get bogged down in
> hypotheticals and work together on details of how to make this work.   Or at
> least, let's agree that we want to motivate organizations to do more so we
> can get back to working on the specific details of exactly how we will do
> that.
>
> jeanne
> On 4/11/2019 6:21 PM, John Foliot wrote:
>
> Denis writes:
>
> > ... The kind of issues that are raised by people with disabilities in
> usability testing will usually relate to things we could easily miss just
> because we don't have those disabilities ourselves. And that level of
> findings, when addressed, definitely pushes the quality of the product
> further.
>
> So... as far as usability testing is concerned *during content creation
> time* (i.e. pre-launch) - 100% with you.
>
> However here we're talking about conformance *reporting* in the context of
> legal obligations: is this site "compliant" or not? Not "is this site
> optimized for all users?", but rather "is this site in legal jeopardy?" -
> and those are two completely different things. I'll go back to what Wilco
> said:
>
> *"I am skeptical about a point system as part of a conformance model for
> accessibility. I think a point system is a cool idea, but not as part of
> the conformance model."*
>
> Going back to my hypothetical situation: If Detlev's user "passes"
> something, Denis' user "struggles but completes the task", and my user is
> "stopped dead in the water" - *all on the same page/site* simply due to
> varying experience levels... who now should the judge believe? Why?
>
> Facts, more than opinions, will be the deciding factor there. If Detlev's
> "user score" suggests Gold, your "user score" suggests Silver, and my "user
> score" suggests "Tin" how do we then arrive at a real score (or partial
> score + other test methods)? The subjectivity of end-users and what they
> report back is so open for (unintentional or otherwise) gaming as to be a
> real concern to me.
>
> It has been suggested that providing user-testing would be one method of
> 'increasing' your score, but again how do we make that testable and
> repeatable? If in the above scenario Detlev's users and my users cannot
> arrive at the same score on the same set of 'pages', how can we ever add
> that to a conformance model? I fully support anything that encourages more
> user-testing, for all of the value-adds you enumerated. But to use
> user-testing as a means of confirming "compliance" introduces a whole new
> level of complexity that I suspect many will shake their heads at and walk
> away... (as sad as that realization is to me).
>
> JF
>
> On Thu, Apr 11, 2019 at 4:20 PM Denis Boudreau <denis.boudreau@deque.com>
> wrote:
>
>> JF wrote:
>> > Like the television character Mulder in the show X-Files, I too want to
>> believe. But having filled out more
>> > than one (US) VPAT over the years, the reality is that "Partially
>> Supports" (formerly "Meets with Exceptions")
>> > tends to stay that way, and rarely gets fixed.
>>
>> Very cute. Well played, sir.
>>
>> JF also wrote:
>> > If Detlev's user "passes" something, Denis' user "struggles but
>> completes the task", and my user is "stopped
>> > dead in the water" - all on the same page/site simply due to varying
>> experience levels... how do we square that
>> > circular problem?
>>
>> But surely, we all agree that the measurements or findings coming from
>> the usability testing the three of us hypothetically conduct to inform
>> about the inherent problems of a site contribute to identifying further
>> issues. By conducting these tests, we ultimately get to address new sets
>> of issues and the process brings expected additional value. Issues found
>> through usability testing, as opposed to issues found through say,
>> automated or manual testing, tend to otherwise be missed by non-disabled
>> accessibility experts who just happen to know about WCAG. The kind of
>> issues that are raised by people with disabilities in usability testing
>> will usually relate to things we could easily miss just because we don't
>> have those disabilities ourselves. And that level of findings, when
>> addressed, definitely pushes the quality of the product further.
>>
>> And JF finally wrote:
>> > Many of Deque's clients have thousands, if not hundreds of thousands,
>> of web "pages", and measuring
>> > conformance at that scale is already problematic. Introducing
>> user-testing into that scenario just made
>> > accessibility conformance testing significantly more expensive, and any
>> final conformance model will
>> > need to address this scale problem. User testing for conformance might
>> work at the boutique level,
>> > but at the enterprise level it's a bit of a pipe-dream... (IMHO)
>>
>> Well, that's simply not true. The number of pages a site contains has
>> very little impact on the overall cost of usability testing when what you
>> are testing are flows, happy and not-so-happy paths, and precise tasks that
>> you are testing to validate some assumptions you may have about parts of
>> the interactions of interfaces you may have doubts about. This is not
>> something that only boutique shops should be able to do. This is something
>> that can just as easily be conducted by software companies, or big IT
>> corporations, if only those who work there get the value of why the whole
>> effort is worth their time, energy and resources.
>>
>> The problem is not whether usability is a pipe-dream in larger, more
>> complex contexts. I mean, quality and accessibility could just as easily be
>> considered pipe-dreams if we look at it that way.
>>
>>
>>
>> /Denis
>>
>>
>> *Denis Boudreau, CPWA* | Principal Accessibility SME & Training Lead
>> | 514-730-9168
>> Deque Systems - Accessibility for Good
>> Deque.com <http://www.deque.com>
>>
>>
>>
>>
>>
>> On Thu, Apr 11, 2019 at 10:11 AM John Foliot <john.foliot@deque.com>
>> wrote:
>>
>>> Denis wrote:
>>>
>>> > I believe that conducting testing with people with disabilities, when
>>> done genuinely with the goal of user experience improvements does
>>> absolutely change the quality of the site under test.
>>>
>>> Like the television character Mulder in the show X-Files, I too want to
>>> believe. But having filled out more than one (US) VPAT over the years, the
>>> reality is that "Partially Supports" (formerly "Meets with Exceptions")
>>> tends to stay that way, and rarely gets fixed.
>>>
>>> Testing with users with disabilities isn't the same as remediating all
>>> issues they find, and to that end, I have to agree with Detlev:
>>> user-testing alone is insufficient in "boosting" a score - it's what comes
>>> *after* the user testing that is important, and so user-testing is a
>>> "process" not an end-state.
>>>
>>> Don't get me wrong - like the majority of us, I understand and
>>> appreciate the value of user-testing. It gives us a clearer and more
>>> informed and more nuanced picture of the (current) state of a web-site, but
>>> that activity alone does nothing to *improve* the accessibility, only to
>>> more clearly define the current state, good or bad.
>>>
>>> For example, I can visually see if and when I think target regions are
>>> too small, and/or I can "measure" those touch regions, and/or I can ask a
>>> mobility impaired user to try "clicking those buttons" - all three of those
>>> activities can be used to determine if touch regions are sufficiently
>>> big-enough, but why would involving an end user get me more "points"? As
>>> such, I also agree with Wilco - I too think a point system is an
>>> interesting idea, but not as part of a conformance model, which requires
>>> some measurable rigidity, even if we move from a Pass/Fail to a
>>> Bronze/Silver/Gold reporting mechanism.
>>>
>>> Additionally (and I've experienced this recently in the context of
>>> testing a site for a client under legal duress), not all users have the
>>> same skills or experience - and "issues" reported by some users may not
>>> actually be issues with the site/content at all, but rather the end user is
>>> inexperienced or is "anticipating" a behavior that isn't *mandated* (but
>>> might be nice to have). If Detlev's user "passes" something, Denis' user
>>> "struggles but completes the task", and my user is "stopped dead in the
>>> water" - all on the same page/site simply due to varying experience
>>> levels... how do we square that circular problem?
>>>
>>> Finally, as I've previously noted, I remain concerned about "scale" in
>>> the context of user-testing. Many of Deque's clients have thousands, if not
>>> hundreds of thousands, of web "pages", and measuring conformance at that
>>> scale is already problematic. Introducing user-testing into that scenario
>>> just made accessibility conformance testing significantly more expensive,
>>> and any final conformance model will need to address this scale problem.
>>> User testing for conformance might work at the boutique level, but at the
>>> enterprise level it's a bit of a pipe-dream... (IMHO)
>>>
>>> JF
>>>
>>> On Wed, Apr 10, 2019 at 1:16 PM Denis Boudreau <denis.boudreau@deque.com>
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> Wilco certainly makes good points, but I guess I'm more optimistic than
>>>> he is about our ability to come up with a process that would allow Silver to
>>>> give more importance to usability testing as part of a conformance model,
>>>> without negatively impacting certain demographics in the process.
>>>>
>>>> /Denis
>>>>
>>>>
>>>> *Denis Boudreau, CPWA* | Principal Accessibility SME & Training Lead
>>>> | 514-730-9168
>>>> Deque Systems - Accessibility for Good
>>>> Deque.com <http://www.deque.com>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Apr 10, 2019 at 10:30 AM Shawn Lauriat <lauriat@google.com>
>>>> wrote:
>>>>
>>>>> Wilco,
>>>>>
>>>>>> I can't see us ever agreeing that, if you do more for people with
>>>>>> learning disabilities, you don't need to do as much for people with low
>>>>>> vision. Any point system we use can't be at a conformance layer or
>>>>>> guidelines layer. It has to be narrow, so we don't make the needs of one
>>>>>> group interchangeable with another. That means point systems at the success
>>>>>> criteria layer. WCAG already allows for this. Think of how color contrast
>>>>>> is done. Two success criteria, one at AA, one at AAA, using the same
>>>>>> measurement tool, with a lower threshold for AA and a higher one for AAA.
>>>>>
>>>>>
>>>>> Totally agree! We absolutely need conformance to cover different user
>>>>> needs and not allow someone to claim conformance for piling up methods for
>>>>> one user need and ignoring others. This requirement centers around
>>>>> providing a way to demonstrate and express a beyond-the-minimum level of
>>>>> accessibility, so building up from a base level of conformance, rather than
>>>>> replacing it with "awesome for blind users and broken if you have some kind
>>>>> of mobility impairment".
>>>>>
>>>>> Hope that helps!
>>>>>
>>>>> -Shawn
>>>>>
>>>>> On Wed, Apr 10, 2019 at 6:54 AM Wilco Fiers <wilco.fiers@deque.com>
>>>>> wrote:
>>>>>
>>>>>> Hey all,
>>>>>> I am skeptical about a point system as part of a conformance model
>>>>>> for accessibility. I think a point system is a cool idea, but not as part
>>>>>> of the conformance model.
>>>>>>
>>>>>> Point systems are great if you have different things you could do,
>>>>>> that lead to roughly the same end result. For example, the airports with
>>>>>> bike racks example is something that keeps coming up. You can do any number
>>>>>> of things to get more people to leave their car at home. Better public
>>>>>> transportation, encourage biking, encourage carpooling, etc. Any one of
>>>>>> them reduces cars, and all of them do it by a lot.
>>>>>>
>>>>>> Accessibility doesn't really work like that. Keyboard accessibility
>>>>>> and visible focus aren't interchangeable. Users need both of them. The few
>>>>>> places in WCAG where more than one option is acceptable, we've already left
>>>>>> the solution open (example: Bypass Blocks) or we've specified the available
>>>>>> options (example: Audio Description or Media Alternative).
>>>>>>
>>>>>> I can't see us ever agreeing that, if you do more for people with
>>>>>> learning disabilities, you don't need to do as much for people with low
>>>>>> vision. Any point system we use can't be at a conformance layer or
>>>>>> guidelines layer. It has to be narrow, so we don't make the needs of one
>>>>>> group interchangeable with another. That means point systems at the success
>>>>>> criteria layer. WCAG already allows for this. Think of how color contrast
>>>>>> is done. Two success criteria, one at AA, one at AAA, using the same
>>>>>> measurement tool, with a lower threshold for AA and a higher one for AAA.
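
The color contrast example above can be sketched in a few lines of Python: one measurement, two thresholds. The luminance and ratio formulas follow the WCAG 2.x definitions, and 4.5:1 and 7:1 are the AA and AAA thresholds for normal-size text. The function names are illustrative only, and the separate large-text thresholds are left out to keep the sketch short.

```python
AA_NORMAL_TEXT = 4.5   # WCAG 2.x SC 1.4.3 threshold for normal-size text
AAA_NORMAL_TEXT = 7.0  # WCAG 2.x SC 1.4.6 threshold for normal-size text


def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB color, from 0 (black) to 1 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Contrast ratio between two colors, from 1:1 up to 21:1."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def contrast_level(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> str:
    """Apply both thresholds to the same measurement."""
    ratio = contrast_ratio(fg, bg)
    if ratio >= AAA_NORMAL_TEXT:
        return "AAA"
    if ratio >= AA_NORMAL_TEXT:
        return "AA"
    return "Fail"


white = (255, 255, 255)
print(contrast_level((0, 0, 0), white))        # black on white
print(contrast_level((118, 118, 118), white))  # #767676 on white
print(contrast_level((153, 153, 153), white))  # #999999 on white
```

The point of the sketch is that the "score" is narrow: it ranks results for one user need on one scale, and nothing earned here can be traded against a different requirement.
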
>>>>>>
>>>>>> I can certainly see us having more "point systems" for different
>>>>>> requirements. You could require 8 points for non-text content at level A,
>>>>>> and 12 points at AA or whatever (just making up numbers). It might also be
>>>>>> possible to create a point system that will work for lots of success
>>>>>> criteria. But I don't see that working at the conformance level. A point
>>>>>> system where you exchange one user need for another seems pretty
>>>>>> problematic to me.
>>>>>>
>>>>>> W
>>>>>>
>>>>>> On Tue, Apr 9, 2019 at 1:59 PM Denis Boudreau <
>>>>>> denis.boudreau@deque.com> wrote:
>>>>>>
>>>>>>> I like the proposal with Chuck’s edits.
>>>>>>>
>>>>>>> I disagree with your position Detlev, but understand your concerns.
>>>>>>> The temptation to game the system would undoubtedly rise from some of the
>>>>>>> people out there that would want to be able to claim a quick path to
>>>>>>> success (oh yeah, we tested with people, and “they” said it was
>>>>>>> fiiiiiiine...).
>>>>>>>
>>>>>>> I’m just not able to agree with a statement such as:
>>>>>>>
>>>>>>> “[testing]... does not in itself change the quality of the site
>>>>>>> under test. An awful site stays awful even after a lot of user testing.”
>>>>>>>
>>>>>>> I believe that conducting testing with people with disabilities,
>>>>>>> when done genuinely with the goal of user experience improvements does
>>>>>>> absolutely change the quality of the site under test. The findings brought
>>>>>>> up by consulting those users are expected to bring forth positive changes.
>>>>>>> An awful site is supposed to get better as a result of the changes that come
>>>>>>> from the activity of involving those users in the process. That’s just the
>>>>>>> nature of the activity. But we need a way to measure that clearly in Silver.
>>>>>>>
>>>>>>> I celebrate our vision of rewarding usability testing with end users
>>>>>>> with disabilities. It does expose our model to abuse - I certainly share
>>>>>>> Detlev’s concerns here - but I’m sure that as we get to defining the
>>>>>>> details of how the scoring system will pan out, we’ll find ways to reward
>>>>>>> usability testing for aspects that actually provide value, not for things
>>>>>>> that pay lip service to the idea of making the product or service
>>>>>>> accessible.
>>>>>>>
>>>>>>> As an example, we could consider pairing aspects of the usability
>>>>>>> testing sessions with tangible results or improvements that came directly
>>>>>>> from this testing. That way, the testing outcomes and related improvements
>>>>>>> could be linked to specific methods for instance, or techniques or whatnot,
>>>>>>> and we could measure just how many of the improvements came directly from
>>>>>>> involving end users with disabilities in the overall process. The more
>>>>>>> improvements come out of direct end-user contributions, the higher the points.
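
One way to read the pairing idea above, as a rough Python sketch: points are awarded only for improvements that trace back to a usability-testing finding and were actually released. The record fields, the `user_testing_points` name, and the value of two points per improvement are hypothetical and do not come from any Silver draft.

```python
from dataclasses import dataclass

POINTS_PER_IMPROVEMENT = 2  # assumed value, for illustration only


@dataclass(frozen=True)
class Improvement:
    description: str
    from_user_testing: bool  # traced to a finding from a session with users with disabilities
    shipped: bool            # the fix was actually released


def user_testing_points(improvements: list[Improvement]) -> int:
    """Score only shipped improvements that originated in user testing."""
    qualifying = [i for i in improvements if i.from_user_testing and i.shipped]
    return POINTS_PER_IMPROVEMENT * len(qualifying)


log = [
    Improvement("Enlarge touch targets in checkout", from_user_testing=True, shipped=True),
    Improvement("Reword error messages", from_user_testing=True, shipped=False),
    Improvement("Fix missing alt text found by a scanner", from_user_testing=False, shipped=True),
]
print(user_testing_points(log))  # only the first entry qualifies
print(user_testing_points([]))   # testing that led to no shipped fix earns nothing
```

Under these assumptions, merely holding test sessions scores zero, which is in line with the objection that testing alone does not change the quality of a site.
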
>>>>>>>
>>>>>>>
>>>>>>> /Denis
>>>>>>>
>>>>>>> —
>>>>>>> Denis Boudreau
>>>>>>> Principal accessibility SME & Training lead
>>>>>>> Deque Systems, Inc.
>>>>>>> 514-730-9168
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 9, 2019 at 04:30 Detlev Fischer <
>>>>>>> detlev.fischer@testkreis.de> wrote:
>>>>>>>
>>>>>>>> As I have said before, I think the mere fact that testing with users
>>>>>>>> with disabilities has taken place should not be rewarded since it does
>>>>>>>> not in itself change the quality of the site under test. An awful site
>>>>>>>> stays awful even after a lot of user testing. If then, as a result of
>>>>>>>> such testing, the accessibility and/or usability is improved, that
>>>>>>>> should impact also the conformance to measurable criteria (whether
>>>>>>>> absolute or score-based) - and I am happy to see those criteria extended
>>>>>>>> to realms so far difficult to measure.
>>>>>>>>
>>>>>>>> On 08.04.2019 at 20:42, Jeanne Spellman wrote:
>>>>>>>> > Here is the proposal for revision of Requirement 3.7 Motivation as
>>>>>>>> > requested by AGWG to make it measurable.
>>>>>>>> >
>>>>>>>> > Motivation
>>>>>>>> >
>>>>>>>> > The Guidelines motivate organizations to go beyond minimal
>>>>>>>> > accessibility requirements by providing a scoring system that rewards
>>>>>>>> > organizations that demonstrate a greater effort to improve
>>>>>>>> > accessibility.  For example, Methods that go beyond the minimum (such
>>>>>>>> > as: Methods for Guidelines that are not included in WCAG 2.x A or AA,
>>>>>>>> > task-completion evaluations, or testing with users with disabilities)
>>>>>>>> > are worth more points in the scoring system.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>> --
>>>>>>>> Detlev Fischer
>>>>>>>> Testkreis
>>>>>>>> Werderstr. 34, 20144 Hamburg
>>>>>>>>
>>>>>>>> Mobil +49 (0)157 57 57 57 45
>>>>>>>>
>>>>>>>> http://www.testkreis.de
>>>>>>>> Consulting, testing, and training for accessible websites
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *Wilco Fiers*
>>>>>> Axe product owner - Co-facilitator WCAG-ACT - Chair ACT-R / Auto-WCAG
>>>>>>
>>>>>
>>>
>>> --
>>> *John Foliot* | Principal Accessibility Strategist | W3C AC
>>> Representative
>>> Deque Systems - Accessibility for Good
>>> deque.com
>>>
>>>
>
>
> --
/Denis

--
Denis Boudreau
Principal SME & trainer
Web accessibility, inclusive design and UX
Deque Systems inc.
514-730-9168

Keep in touch: @dboudreau
Received on Friday, 12 April 2019 01:41:46 UTC