Re: Alternative Silver / WCAG 3.0 conformance model proposal

Having done a lot of testing for Silver with a structure of Guidelines
containing Functional Outcomes, and having looked at how to score Methods,
I have basically come to the same conclusion as Detlev.
That testing also used more thorough adjectival rating scoring than I have
seen so far in the other proposals, and the conclusions Detlev draws are
very similar.

Draft examples of the adjectival rating (just to give an idea):

https://docs.google.com/spreadsheets/d/1iCJfyMtcsSq7GHmwnc4aTNguadRfGDa0H8FBZMaJpcQ/edit#gid=633158340
https://docs.google.com/spreadsheets/d/1yTpbJFsaadIbCigV15YK6bWB4xk_MoOgREjGMHHMVgQ/edit#gid=1217251850
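
For those who have not seen the spreadsheets, the idea behind such an
adjectival rating can be sketched roughly as below. The 0-5 range and the
labels are placeholders of my own, not something the group has agreed on:

    # Purely hypothetical adjectival scale; the 0-5 range and the labels
    # are placeholders, not agreed terminology.
    ADJECTIVAL_SCALE = {
        0: "fail / not acceptable",
        1: "poor",
        2: "insufficient",
        3: "reasonable",
        4: "good",
        5: "excellent",
    }

    def rate(score: int) -> str:
        """Map a 0-5 score for one functional outcome to its label."""
        if score not in ADJECTIVAL_SCALE:
            raise ValueError("score must be an integer from 0 to 5")
        return ADJECTIVAL_SCALE[score]

    print(rate(4))  # good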

People not present at the Silver calls probably have not seen the files, so
I'll add a link to the Silver directory with the work:

https://drive.google.com/drive/folders/1WYA8DH3uLLJaSmwsXM0wvNm1oDG9FVQ0

A BIG warning: these files are drafts and need some explanation, but they
do show the work that has been done.
They are meant to give an idea of how deep we have gone into the proposals;
I suggest not trying to understand every piece of information in the
spreadsheets, but rather looking at the work across the different tabs.
Here are some deep links to tabs in the spreadsheets to give the idea:

https://docs.google.com/spreadsheets/d/1yTpbJFsaadIbCigV15YK6bWB4xk_MoOgREjGMHHMVgQ/edit#gid=1290202920
https://docs.google.com/spreadsheets/d/1yTpbJFsaadIbCigV15YK6bWB4xk_MoOgREjGMHHMVgQ/edit#gid=1217251850

https://docs.google.com/spreadsheets/d/1iCJfyMtcsSq7GHmwnc4aTNguadRfGDa0H8FBZMaJpcQ/edit#gid=2091469352
https://docs.google.com/spreadsheets/d/12qVlCHZDvCVcLSd0Ez2LjTW_2sJxKkK76KfpUKNYoug/edit#gid=1983238719

I also sent my thoughts and two cents in a mail before my vacation, as I
didn't have time to present a conformance proposal. It might be interesting
to read, as it mentions some concerns I have:

-----------------------------

Hi Chuck / all,



Thank you so much for the opportunity!

The timing is not good for me though; sadly I have to tell you that my
summer vacation starts next week and I will be occupied by two functional
outcomes running around my house.



To give my two cents on the request, I do want to mention the following:



I honestly think we are not ready to decide on a scoring system or
conformance model, and I really wonder why, or how, we can even consider
one if we do not have the framework and examples to base it on. Over the
last weeks I've done lots of experiments and the conclusion is very, very
clear: we need a clear structure (framework) and proper examples of
artifacts we have only just started to invent. Scoring and testing at an
atomic level is not our challenge, as we know how to do that; the
challenges begin inside the framework, where we have all kinds of relations
and have to place them in tasks or paths.



Just some facts:

   - We recently 'decided' to go for 'paths' instead of tasks; what a path
   is is not clear yet and we do not have a single example (will it work at
   all?)
   - We wanted to use tasks, but we do not have one clear task worked out
   and have no idea whether it works
   - We only recently got a master list of functional needs; the user needs
   based on that list do not exist (yet)
   - Grouping these user needs and seeing which functional outcomes pop up,
   and where, is not something we have done
   - Grouping functional outcomes to see what kinds of guidelines emerge is
   still far away
   - Functional Outcomes will be normative and must be scored / tested (NOT
   the methods)
   - Functional outcomes can have one or more methods to fulfil the
   outcome; this conflicts and needs normalization or a layered approach
   (see the sketch after this list)
   - More than one functional outcome can be part of a guideline, which
   also needs normalization
   - The tests have not been created, as we do not have a list of
   functional outcomes within a framework
   - We are far from having granular tests that work well at the moment; we
   only have abstract ideas at a holistic level, not at the detail level
   where the hard work needs to be done
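
To show the kind of normalization and layering the last points refer to,
here is a rough sketch of one possible layered approach: method ratings
roll up into a functional outcome rating, and outcome ratings roll up into
a guideline rating. The aggregation rules (best method wins, weakest
outcome caps the guideline) are purely my own assumptions to make the
problem visible, not a proposal we have agreed on:

    # Rough sketch of layered aggregation; the choice of max() for methods
    # and min() for outcomes is only an assumption showing where
    # normalization decisions still need to be made.

    def outcome_score(method_scores: list[int]) -> int:
        """An outcome can be fulfilled by any of its methods, so take the
        best-scoring method (assumption)."""
        return max(method_scores)

    def guideline_score(outcome_scores: list[int]) -> int:
        """A guideline contains several outcomes; here the weakest outcome
        caps the guideline score (assumption)."""
        return min(outcome_scores)

    # Example: a guideline with two functional outcomes.
    headings = outcome_score([3, 5])   # two alternative methods
    labels = outcome_score([2])        # a single method
    print(guideline_score([headings, labels]))  # 2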

Some other facts:

   - Our own approach as documented should be:

https://docs.google.com/document/d/1gfYAiV2Z-FA_kEHYlLV32J8ClNEGPxRgSIohu3gUHEA/edit#heading=h.s6cmfinlgb3q

   - Part 1 - Define User Need
   - Part 2 - Write Functional Outcomes
   - Part 3 - Develop Tests
   - Part 4 - Write Methods
   - ....
   - ....
   - Part 6 - Write the Guideline
   - Part 8 - Evaluate
   - Part 10 - Conformance???


   - This is not what I see happening; instead we try to work the other way
   around


   - It seems we start with a guideline and work our way backwards to fill
   in the gaps; I think this has proven counterproductive, as I've shown for
   "headings"


   - My suggestion would be to reach consensus on each step first and check
   whether it is solid enough before moving on to the next step

I have lots more to mention, but my summary is that we need the following
before even thinking about conformance:

   - A clear list of functional outcomes (not complete but relevant enough
   + consensus) based on the functional needs (NO guidelines yet!)
   - Tests for the outcomes (technology-agnostic at the normative level
   BEFORE methods, plus how to test methods in a technology-specific way +
   consensus)
   - Possible guidelines that all the outcomes belong to, with conflicting
   methods (guidelines with varied functional outcomes and layered methods)
   - Clearly elaborated paths / tasks (at least 2) with all possible
   guidelines, to see whether we can create boundaries for such an approach
   - Adjectival rating examples that work and are solid (we do not have
   them; the ones present do not yet work as proposed + consensus) - see
   the sketch after this list
   - Only once we have worked out these relations can we start thinking
   about a good conformance model.
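
To make the last two points a bit more concrete, here is a very small
sketch of how adjectival ratings for one functional outcome could be
aggregated across the views of a path, combined with the kind of
critical-failure 'stop condition' Detlev describes further down in this
thread. Everything in it (the 0-5 scale, the averaging, the stop condition)
is an assumption for discussion, not a worked-out proposal:

    # Sketch: aggregate 0-5 adjectival ratings for one functional outcome
    # across the sampled views of a path, but let a flagged critical
    # failure block conformance regardless of the average. All rules here
    # are assumptions.
    from statistics import mean

    def path_rating(view_scores: list[int], critical_views: set[int]) -> dict:
        """view_scores: 0-5 rating per sampled view; critical_views:
        indices of views where a failure was flagged as critical."""
        return {
            "average": round(mean(view_scores), 1),
            "conforms": len(critical_views) == 0,  # stop condition (assumption)
        }

    # One critical failure on view 0 blocks conformance despite a high average.
    print(path_rating([0, 5, 5, 5, 5, 5], critical_views={0}))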

Again, the key lies in the functional outcomes list: how to test the
outcomes (test methods) AND how to normalize them within a possible
guideline / between guidelines WITHOUT having (technology-specific) methods.



I just cannot see how or why we must come up with a conformance model and
scoring system when we do not yet have proper examples of the other
ingredients.



Cheers,

Jake

-----------------------------------

So far an update from my side; I'm wondering what has been decided over the
last 3 weeks...

Cheers,
Jake



On Thu 13 Aug 2020 at 21:17, Katie Haritos-Shea <ryladog@gmail.com> wrote:

> Thanks Detlev,
>
> I find this very interesting, and see it as another viable option here
>
> ** katie **
>
> *Katie Haritos-Shea*
> *Principal ICT Accessibility Architect*
>
>
> *Senior Product Manager/Compliance/Accessibility **SME*
> *, **Core Merchant Framework UX, Clover*
>
>
> *W3C Advisory Committee Member and Representative for Knowbility *
>
>
> *WCAG/Section 508/ADA/AODA/QA/FinServ/FinTech/Privacy,* *IAAP CPACC+WAS
> = **CPWA* <http://www.accessibilityassociation.org/cpwacertificants>
>
> *Cell: **703-371-5545 <703-371-5545>** |* *ryladog@gmail.com
> <ryladog@gmail.com>* *| **Seneca, SC **|* *LinkedIn Profile
> <http://www.linkedin.com/in/katieharitosshea/>*
>
> People may forget exactly what it was that you said or did, but they will
> never forget how you made them feel.......
>
> Our scars remind us of where we have been........they do not have to
> dictate where we are going.
>
>
>
>
>
>
> On Thu, Aug 13, 2020 at 1:09 PM Wilco Fiers <wilco.fiers@deque.com> wrote:
>
>> Hey folks,
>>
>> I've been thinking about this a bunch, and I haven't yet come up with
>> something that doesn't cause a heap of problems. I did want to share my
>> thoughts on this so far, as I hope it might trigger some ideas on solutions.
>>
>> The biggest question I have at the moment is whether we can come up with a
>> grading mechanism that is reasonable, can be applied consistently, and does
>> not significantly increase the effort it would take to do a test.
>>
>>
>> At the foundation of that is that we need some sort of mechanism by which
>> we count the number of fails, or passes, or both. It may not seem too
>> difficult for us to count for example the number of images on the page, but
>> the more you think about this, the harder it gets. For WCAG 2, it really
>> doesn't matter if an image map is one image or one plus however many areas
>> there are. But if having 1 problematic image gets you "good" in WCAG 3, and
>> having 5 gets you "reasonable", suddenly being able to do that starts to
>> matter a whole lot.
>>
>> Scoring mechanisms based on numbers of issues make WCAG testing far more
>> difficult than it is today. Throwing tools at it might mitigate that
>> problem, but it creates another one. Tools have a very different
>> perspective than humans do. If we write this to work well for tools, it
>> might not be viable for humans to do that same evaluation, and vice versa.
>> Even between tools, perspectives differ substantially. A tool with access to
>> the web extension APIs will find certain things very easy that are nearly
>> impossible for other tools (there are other APIs like that too). What do we
>> do there? Force all tools toward one particular browser that happens to
>> have the APIs we like best? Lowest common denominator? What about non-HTML
>> tools?
>>
>>
>> The most promising idea I've come up with is, instead of having WCAG
>> define the unit of counting, to have that be based on scope. Your conformance
>> claim can say something like: this page consists of three sections (header,
>> main, footer). The header and main score good, the footer scores poor for
>> screen reader users, but good for all other user groups. Or you can break
>> that same thing down into paths where some paths score higher than others.
>>
>> That avoids the problem of counting... sort of. This won't work well if
>> we break scope down to the element level. Also, if someone insists on a pass
>> or fail at page level, it still comes down to a complete fail, even for
>> a minor issue.
>>
>>
>> Wilco
>>
>> On Thu, Aug 13, 2020 at 5:43 PM Detlev Fischer <
>> detlev.fischer@testkreis.de> wrote:
>>
>>> Hi Alastair,
>>>
>>> I think I would still prefer to rate SCs (or functional outcomes / FO in
>>> WCAG 3.0) on the basis of what the best and worst possible implementation
>>> is for that particular FO, and use those to define the end points of the 5
>>> point scale rather than using that scale to also reflect prioritisation
>>> *across* FOs. I think that should be done on another level. It may well be
>>> that prioritisation (or relative weight) of FOs depends on contexts and may
>>> be defined differently for different types of applications, for example.
>>>
>>> In my view, what is urgently needed for the scoring model is a separate
>>> 'stop condition' for flagging critical issues. To take an example: SC 3.1.2
>>> "Language of Parts" (or a related FO) might fail on an information-oriented
>>> site with some publication titles without lang markup (0/5) but that would
>>> not be critical for such a site. On the other hand, if you are testing an
>>> online translation service, failure of 3.1.2 (0/5) should be flagged as
>>> critical, because it is. So, if you have a sample of such an application
>>> containing the view with the main translation function and besides that, 5
>>> other views like FAQ, imprint, version history etc., a 5/5 rating on those
>>> other views (for lack of foreign language content) must not bury the
>>> critical issue by raising the overall score of 3.1.2 in aggregation across
>>> views sampled. Or reporting may present the aggregate score, but clearly
>>> flag the failure. Flagged failures could be used to prevent overall
>>> conformance of the chosen scope if we accept the condition "no critical
>>> failures in any functional outcome across the sampled views" - but that
>>> needs to be debated, of course.
>>>
>>> Best,
>>> Detlev
>>>
>>> Am 13.08.2020 um 15:25 schrieb Alastair Campbell:
>>>
>>> Thanks for that Detlev,
>>>
>>> I'm not sure if your proposal would actually be simpler overall
>>> (compared to where the others might get to), but I really like some aspects
>>> like the "Adjectival or 5 point scoring" slide. That explains something
>>> where there are many options in a very straightforward way.
>>>
>>> It occurs to me that some prioritisation could be built into that with
>>> much less controversy.
>>>
>>> For example, having flashes on the view scores 0/5. If we decided that
>>> "language of the page" (environment, whatever) was less of an issue, it
>>> could score 2/5. (Or 3 if the user group is known and the default works?)
>>>
>>> That's a hypothetical, just pointing out that when you have finer grained
>>> scoring than pass/fail there is inherent prioritisation /within/ the
>>> guidelines, no need for a separate prioritisation.
>>>
>>> Cheers,
>>>
>>> Alastair
>>>
>>>
>>> Apologies for typos, sent from a mobile.
>>> ------------------------------
>>> *From:* David MacDonald <david@can-adapt.com>
>>> *Sent:* Wednesday, August 12, 2020 5:30:48 PM
>>> *To:* Detlev Fischer <detlev.fischer@testkreis.de>
>>> *Cc:* WCAG group <w3c-wai-gl@w3.org>
>>> *Subject:* Re: Alternative Silver / WCAG 3.0 conformance model proposal
>>>
>>> Hi Detlev
>>>
>>> I really like your direction, and it shows a lot of thought and work.
>>>
>>> Cheers,
>>> David MacDonald
>>>
>>>
>>>
>>> *Can**Adapt* *Solutions Inc.*
>>> Mobile:  613.806.9005
>>>
>>> LinkedIn
>>> <http://www.linkedin.com/in/davidmacdonald100>
>>>
>>> twitter.com/davidmacd
>>>
>>> GitHub <https://github.com/DavidMacDonald>
>>>
>>> www.Can-Adapt.com <http://www.can-adapt.com/>
>>>
>>>
>>>
>>> *  Adapting the web to all users*
>>> *            Including those with disabilities*
>>>
>>> If you are not the intended recipient, please review our privacy policy
>>> <http://www.davidmacd.com/disclaimer.html>
>>>
>>>
>>> On Wed, Aug 12, 2020 at 6:41 AM Detlev Fischer <
>>> detlev.fischer@testkreis.de> wrote:
>>>
>>> For those who were not present in yesterday's conformance deep dive, I
>>> just want to put the link to an alternative proposal that I have now
>>> outlined in response to the 2 existing proposals by Rachael and John.
>>>
>>> I fear that in both proposals discussed so far, the scoring approach
>>> will be very complex and hard to understand. I am also uncertain whether
>>> the complete revamp of the structure into guidelines/methods is
>>> justified.
>>>
>>> My proposal tries to envisage WCAG 3.0 more as an extension of WCAG 2.X,
>>> turning pass/fail into a graded scoring scheme with 5 points (what has
>>> been called 'adjectival rating'). In that way, it supports the inclusion
>>> of new success criteria that are not easily amenable to a pass/fail
>>> rating. The proposal also allows the definition of paths (based on user
>>> tasks) as an aggregate for scoping conformance claims.
>>>
>>> Here is the link:
>>> https://docs.google.com/presentation/d/1dV1moNnq-56sS1o84UCKkc_g-gE10X6Y/
>>>
>>> This is just a sketch, and hopefully a basis for discussing alternatives
>>> that seem to me more workable than what we have so far.
>>>
>>> Detlev
>>>
>>> --
>>> Detlev Fischer
>>> DIAS GmbH
>>> (Testkreis is now part of DIAS GmbH)
>>>
>>> Mobile +49 (0)157 57 57 57 45
>>>
>>> http://www.dias.de
>>> Consulting, testing and training for accessible websites
>>>
>>>
>>>
>>> --
>>> Detlev Fischer
>>> DIAS GmbH
>>> (Testkreis is now part of DIAS GmbH)
>>>
>>> Mobile +49 (0)157 57 57 57 45
>>> http://www.dias.de
>>> Consulting, testing and training for accessible websites
>>>
>>>
>>
>> --
>> *Wilco Fiers*
>> Axe for Web product owner - Co-facilitator WCAG-ACT - Chair ACT-R
>>
>>
