Re: Alternative Silver / WCAG 3.0 conformance model proposal from jake abma on 2020-08-25 (w3c-wai-gl@w3.org from July to September 2020)

From: jake abma <jake.abma@gmail.com>
Date: Tue, 25 Aug 2020 10:28:38 +0200
To: Katie Haritos-Shea <ryladog@gmail.com>
Cc: Wilco Fiers <wilco.fiers@deque.com>, Detlev Fischer <detlev.fischer@testkreis.de>, Alastair Campbell <acampbell@nomensa.com>, WCAG group <w3c-wai-gl@w3.org>
Message-ID: <CAMpCG4GjaqU1VhdGve_pN=Bh41A-duxo4yPmsXnSevSBPvFwEg@mail.gmail.com>
Lots of my concerns, as I've mentioned in the previous mail, would be
invalidated if we, as Detlev mentioned, build on top of what we have
already and focus on adjectival rating additions so we can more easily
cover COGA criteria (among others).

It's more like an extended A, AA and AAA system BUT with possibly 5 steps
AND blended into one and the same guideline grading (not as separate SC).

On Tue, 25 Aug 2020 at 10:19, jake abma <jake.abma@gmail.com> wrote:

> As I've done lots of testing for Silver with a structure of Guidelines
> containing Functional Outcomes, and looked at how to score Methods, I
> basically come to the same conclusion as Detlev.
> The testing also had more thorough adjectival rating scoring than I've
> seen so far in the other proposals, and the conclusions Detlev draws are
> very similar.
>
> Draft examples of the adjectival rating (just to give an idea):
>
>
> https://docs.google.com/spreadsheets/d/1iCJfyMtcsSq7GHmwnc4aTNguadRfGDa0H8FBZMaJpcQ/edit#gid=633158340
>
> https://docs.google.com/spreadsheets/d/1yTpbJFsaadIbCigV15YK6bWB4xk_MoOgREjGMHHMVgQ/edit#gid=1217251850
>
> People not present at the Silver calls probably have not seen the files so
> I'll add the Silver directory with the work:
>
> https://drive.google.com/drive/folders/1WYA8DH3uLLJaSmwsXM0wvNm1oDG9FVQ0
>
> A BIG warning that they need some explanation and are still drafts, but you
> can see the work done.
> This is to give an idea of how deep we dived into the proposals; I
> suggest not trying to understand all the info in the spreadsheets but
> looking at all the work in the different tabs.
> Here are some deep links to tabs in the spreadsheets to give the idea:
>
>
> https://docs.google.com/spreadsheets/d/1yTpbJFsaadIbCigV15YK6bWB4xk_MoOgREjGMHHMVgQ/edit#gid=1290202920
>
> https://docs.google.com/spreadsheets/d/1yTpbJFsaadIbCigV15YK6bWB4xk_MoOgREjGMHHMVgQ/edit#gid=1217251850
>
>
> https://docs.google.com/spreadsheets/d/1iCJfyMtcsSq7GHmwnc4aTNguadRfGDa0H8FBZMaJpcQ/edit#gid=2091469352
>
> https://docs.google.com/spreadsheets/d/12qVlCHZDvCVcLSd0Ez2LjTW_2sJxKkK76KfpUKNYoug/edit#gid=1983238719
>
> I also sent my thoughts and 2 cents in a mail before my vacation, as I
> didn't have time to present a conformance proposal, and this might be
> interesting to read as it mentions some concerns I have:
>
> -----------------------------
>
> Hi Chuck / all,
>
>
>
> Thank you so much for the opportunity!
>
> The timing for me is not good though, and sadly I have to tell you that
> summer vacation will start next week and I will be occupied by two
> functional outcomes running around in my house.
>
>
>
> To give my 2 cents on the request I do want to mention the following:
>
>
>
> I honestly think we are not ready to decide on a scoring system or
> conformance model, and I really wonder why or how we can even think of one
> if we do not have the framework and examples to base it on. Over the last
> weeks I've done lots of experiments and the conclusion is very, very clear:
> we need a clear structure (framework) and proper examples of some artifacts
> we've just started to invent. Scoring / testing on an atomic level is not
> our challenge, as we know how to do that; the challenges begin within the
> framework, where we have all kinds of relations and have to place them in
> tasks or paths.
>
>
>
> Just some facts:
>
>    - We recently 'decided' to go for 'paths' instead of tasks; what a
>    path is is not clear yet and we do not have 1 example (will it work at all?)
>    - We wanted to use tasks, but we do not have one clear task worked out
>    and have no idea if it works
>    - We only recently got a master list of functional needs; a set of
>    user needs based on that list does not exist (yet)
>    - Grouping these user needs to see which functional outcomes pop up,
>    and where, is not something we have
>    - Grouping functional outcomes to see what kind of guidelines emerge
>    is still far away
>    - Functional Outcomes will be normative and must be scored / tested
>    (NOT the methods)
>    - Functional outcomes can have 1 or more methods to fulfil the
>    outcome; this conflicts and needs normalization or a layered approach
>    - More than one functional outcome can be part of a guideline, which
>    also needs normalization
>    - The tests are not created, as we do not have a list of functional
>    outcomes in a framework
>    - We are far from having granular tests that work well at the moment;
>    we only have abstract ideas on a holistic level, not on the detail level
>    where the hard work needs to be done
>
> Some other facts:
>
>    - Our own approach as documented should be:
>
>
> https://docs.google.com/document/d/1gfYAiV2Z-FA_kEHYlLV32J8ClNEGPxRgSIohu3gUHEA/edit#heading=h.s6cmfinlgb3q
>
>       - Part 1 - Define User Need
>       - Part 2 - Write Functional Outcomes
>       - Part 3 - Develop Tests
>       - Part 4 - Write Methods
>       - ....
>       - ....
>       - Part 6 - Write the Guideline
>       - Part 8 - Evaluate
>       - Part 10 - Conformance???
>
>
>    - This is not what I see happening; we try to work the other way
>    around
>
>
>    - It seems like we start with a guideline and work our way backwards to
>       fill in the gaps; I think this was counterproductive, as I've shown
>       for "headings"
>
>
>    - My suggestion would be to have consensus on each step first and see
>    if it is solid enough to go to the next step
>
> I have lots more to mention, but my summary would be that we need the
> following before even thinking about conformance:
>
>    - A clear list of functional outcomes (not complete but relevant
>    enough + consensus) based on the functional needs (NO guidelines yet!)
>    - Tests for the outcomes (technology agnostic on the normative level
>    BEFORE methods and how to test methods technology specific + consensus)
>    - Possible guidelines all the outcomes belong to with conflicting
>    methods (guidelines with varied functional outcomes and layered methods)
>    - Clearly elaborated paths / tasks (minimum 2), and all possible
>    guidelines, to see if we can create boundaries for such an approach
>    - Adjectival rating examples that work and are solid (we do not have
>    them; the ones present do not work yet as proposed + consensus)
>    - Only once we have worked out these relations can we start thinking
>    about a good conformance model.
>
> Again, the key lies within the functional outcomes list, how to test them
> (test methods) AND normalize them from within a possible guideline /
> between guidelines WITHOUT having (technology specific) methods.
>
>
>
> I just can't see how and why we must come up with a
> conformance model and scoring system if we do not have the proper examples
> yet for the other ingredients.
>
>
>
> Cheers,
>
> Jake
>
> -----------------------------------
>
> That's the update from my side so far; I'm wondering what has been decided
> in the last 3 weeks...
>
> Cheers,
> Jake
>
>
>
> On Thu, 13 Aug 2020 at 21:17, Katie Haritos-Shea <ryladog@gmail.com> wrote:
>
>> Thanks Detlev,
>>
>> I find this very interesting, and another viable option here.
>>
>> ** katie **
>>
>> *Katie Haritos-Shea*
>> *Principal ICT Accessibility Architect*
>>
>>
>> *Senior Product Manager/Compliance/Accessibility **SME*
>> *, **Core Merchant Framework UX, Clover*
>>
>>
>> *W3C Advisory Committee Member and Representative for Knowbility *
>>
>>
>> *WCAG/Section 508/ADA/AODA/QA/FinServ/FinTech/Privacy,* *IAAP CPACC+WAS
>> = **CPWA* <http://www.accessibilityassociation.org/cpwacertificants>
>>
>> *Cell: **703-371-5545 <703-371-5545>** |* *ryladog@gmail.com
>> <ryladog@gmail.com>* *| **Seneca, SC **|* *LinkedIn Profile
>> <http://www.linkedin.com/in/katieharitosshea/>*
>>
>> People may forget exactly what it was that you said or did, but they will
>> never forget how you made them feel.......
>>
>> Our scars remind us of where we have been........they do not have to
>> dictate where we are going.
>>
>>
>>
>>
>>
>>
>> On Thu, Aug 13, 2020 at 1:09 PM Wilco Fiers <wilco.fiers@deque.com>
>> wrote:
>>
>>> Hey folks,
>>>
>>> I've been thinking about this a bunch, and I haven't yet come up with
>>> something that doesn't cause a heap of problems. I did want to share my
>>> thoughts on this so far, as I hope it might trigger some ideas on solutions.
>>>
>>> The biggest question I have at the moment is whether we can come up with a
>>> grading mechanism that is reasonable and can be applied consistently,
>>> without significantly increasing the effort it would take to do a test.
>>>
>>>
>>> At the foundation of that is that we need some sort of mechanism by
>>> which we count the number of fails, or passes, or both. It may not seem too
>>> difficult for us to count for example the number of images on the page, but
>>> the more you think about this, the harder it gets. For WCAG 2, it really
>>> doesn't matter if an image map is one image or one plus however many areas
>>> there are. But if having 1 problematic image gets you "good" in WCAG 3, and
>>> having 5 gets you "reasonable", suddenly being able to do that starts to
>>> matter a whole lot.
>>>
>>> Scoring mechanisms based on numbers of issues make WCAG testing far
>>> more difficult than it is today. Throwing tools at it might mitigate that
>>> problem, but it creates another one. Tools have a very different
>>> perspective than humans do. If we write this to work well for tools, it
>>> might not be viable for humans to do that same evaluation, and vice versa.
>>> Even between tools, perspectives differ substantially. A tool with access to
>>> the web extension APIs will find certain things very easy that are nearly
>>> impossible for other tools (there are other APIs like that too). What do we
>>> do there? Force all tools toward one particular browser that happens to
>>> have the APIs we like best? Lowest common denominator? What about non-HTML
>>> tools?
>>>
>>>
>>> The most promising idea I've come up with is, instead of having WCAG
>>> define the unit of counting, to have that be based on scope. Your conformance
>>> claim can say something like, this page consists of three sections; header,
>>> main, footer. The header and main score good, the footer scores poor for
>>> screen reader users, but good for all other user groups. Or you can break
>>> that same thing down into paths where some paths score higher than others.
>>>
>>> That avoids the problem of counting... sort of. This won't work well if
>>> we break scope down to the element level. Also, if someone insists on a pass
>>> or fail at a page level, it still comes to a complete fail, even for
>>> a minor issue.
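
A minimal sketch of the scope-based claim Wilco describes above, in Python. The dictionary layout, the `default` key, the ordering of the adjectives and the "worst section wins" page-level rule are all assumptions made for illustration; none of them is defined in this thread.

```python
# Hypothetical claim structure: section -> user group -> adjectival rating.
# The ordering below (worst to best) is an assumption for illustration.
RATING_ORDER = ["poor", "reasonable", "good"]

claim = {
    "header": {"default": "good"},
    "main": {"default": "good"},
    "footer": {"default": "good", "screen reader users": "poor"},
}


def rating_for(claim, section, user_group):
    """Look up a section's rating for a user group, falling back to its default."""
    ratings = claim[section]
    return ratings.get(user_group, ratings["default"])


def page_level_rating(claim, user_group):
    """Collapse the claim to one page rating by taking the worst section (assumed rule)."""
    return min(
        (rating_for(claim, section, user_group) for section in claim),
        key=RATING_ORDER.index,
    )


print(rating_for(claim, "footer", "screen reader users"))
print(page_level_rating(claim, "screen reader users"))
print(page_level_rating(claim, "low vision users"))
```

Under these assumptions the page collapses to "poor" for screen reader users because of the footer alone, which is the page-level effect described in the last paragraph above.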
>>>
>>>
>>> Wilco
>>>
>>> On Thu, Aug 13, 2020 at 5:43 PM Detlev Fischer <
>>> detlev.fischer@testkreis.de> wrote:
>>>
>>>> Hi Alastair,
>>>>
>>>> I think I would still prefer to rate SCs (or functional outcomes / FO
>>>> in WCAG 3.0) on the basis of what the best and worst possible
>>>> implementation is for that particular FO, and use those to define the end
>>>> points of the 5 point scale rather than using that scale to also reflect
>>>> prioritisation *across* FOs. I think that should be done on another level. It
>>>> may well be that prioritisation (or relative weight) of FOs depends on
>>>> contexts and may be defined differently for different types of
>>>> applications, for example.
>>>>
>>>> In my view, what is urgently needed for the scoring model is a separate
>>>> 'stop condition' for flagging critical issues. To take an example: SC 3.1.2
>>>> "Language of Parts" (or a related FO) might fail on an information-oriented
>>>> site with some publication titles without lang markup (0/5) but that would
>>>> not be critical for such a site. On the other hand, if you are testing an
>>>> online translation service, failure of 3.1.2 (0/5) should be flagged as
>>>> critical, because it is. So, if you have a sample of such an application
>>>> containing the view with the main translation function and besides that, 5
>>>> other views like FAQ, imprint, version history etc., a 5/5 rating on those
>>>> other views (for lack of foreign language content) must not bury the
>>>> critical issue by raising the overall score of 3.1.2 in aggregation across
>>>> views sampled. Or reporting may present the aggregate score, but clearly
>>>> flag the failure. Flagged failures could be used to prevent overall
>>>> conformance of the chosen scope if we accept the condition "no critical
>>>> failures in any functional outcome across the sampled views" - but that
>>>> needs to be debated, of course.
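
A minimal sketch of the 'stop condition' described above, in Python. The `ViewRating` structure, the use of a plain mean for aggregation, the view names other than those mentioned in the mail, and the rule in `conforms` are assumptions for illustration only; the thread leaves the actual rule open for debate.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ViewRating:
    """Rating of one functional outcome on one sampled view (hypothetical structure)."""
    view: str
    score: int              # 0 (worst) to 5 (best)
    critical: bool = False  # set by the evaluator when the failure is critical for this site


def aggregate(ratings):
    """Return the mean score plus the views carrying a flagged critical failure."""
    average = mean(r.score for r in ratings)
    flagged = [r.view for r in ratings if r.critical]
    return average, flagged


def conforms(ratings):
    """Assumed rule: no critical failure in any sampled view."""
    _, flagged = aggregate(ratings)
    return not flagged


# Online translation service: the main view fails 3.1.2 critically,
# five other views score 5/5 for lack of foreign language content.
sample = [ViewRating("translate", 0, critical=True)] + [
    ViewRating(name, 5)
    for name in ("faq", "imprint", "version-history", "other-1", "other-2")
]
average, flagged = aggregate(sample)
print(round(average, 2), flagged, conforms(sample))
```

The aggregate alone comes out above 4 of 5 here, while the flag keeps the critical failure on the main view visible and, under the assumed rule, blocks conformance for the sampled scope.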
>>>>
>>>> Best,
>>>> Detlev
>>>>
>>>> On 13.08.2020 at 15:25, Alastair Campbell wrote:
>>>>
>>>> Thanks for that Detlev,
>>>>
>>>> I'm not sure if your proposal would actually be simpler overall
>>>> (compared to where the others might get to), but I really like some aspects
>>>> like the "Adjectival or 5 point scoring" slide. That explains something
>>>> where there are many options in a very straightforward way.
>>>>
>>>> It occurs to me that some prioritisation could be built into that with
>>>> much less controversy.
>>>>
>>>> For example, having flashes on the view scores 0/5. If we decided that
>>>> "language of the page" (environment, whatever) was less of an issue, it
>>>> could score 2/5. (Or 3 if the user group is known and the default works?)
>>>>
>>>> That's a hypothetical, just pointing out that when you have finer-grained
>>>> scoring than pass/fail there is inherent prioritisation /within/ the
>>>> guidelines, with no need for a separate prioritisation.
>>>>
>>>> Cheers,
>>>>
>>>> Alastair
>>>>
>>>>
>>>> Apologies for typos, sent from a mobile.
>>>> ------------------------------
>>>> *From:* David MacDonald <david@can-adapt.com>
>>>> *Sent:* Wednesday, August 12, 2020 5:30:48 PM
>>>> *To:* Detlev Fischer <detlev.fischer@testkreis.de>
>>>> *Cc:* WCAG group <w3c-wai-gl@w3.org>
>>>> *Subject:* Re: Alternative Silver / WCAG 3.0 conformance model proposal
>>>>
>>>> Hi Detlev
>>>>
>>>> I really like your direction, and it shows a lot of thought and work.
>>>>
>>>> Cheers,
>>>> David MacDonald
>>>>
>>>>
>>>>
>>>> *Can**Adapt* *Solutions Inc.*
>>>> Mobile:  613.806.9005
>>>>
>>>> LinkedIn
>>>> <http://www.linkedin.com/in/davidmacdonald100>
>>>>
>>>> twitter.com/davidmacd
>>>>
>>>> GitHub <https://github.com/DavidMacDonald>
>>>>
>>>> www.Can-Adapt.com <http://www.can-adapt.com/>
>>>>
>>>>
>>>>
>>>> *  Adapting the web to all users*
>>>> *            Including those with disabilities*
>>>>
>>>> If you are not the intended recipient, please review our privacy policy
>>>> <http://www.davidmacd.com/disclaimer.html>
>>>>
>>>>
>>>> On Wed, Aug 12, 2020 at 6:41 AM Detlev Fischer <
>>>> detlev.fischer@testkreis.de> wrote:
>>>>
>>>> For those who were not present in yesterday's conformance deep dive, I
>>>> just want to put the link to an alternative proposal that I have now
>>>> outlined in response to the 2 existing proposals by Rachael and John.
>>>>
>>>> I fear that in both proposals discussed so far, the scoring approach
>>>> will be very complex and hard to understand. I am also uncertain whether
>>>> the complete revamp of the structure into guidelines/methods is justified.
>>>>
>>>> My proposal tries to envisage WCAG 3.0 more as an extension of WCAG 2.X,
>>>> turning pass/fail into a graded scoring scheme with 5 points (what has
>>>> been called 'adjectival rating'). In that way, it supports the inclusion
>>>> of new success criteria that are not easily amenable to a pass/fail
>>>> rating. The proposal also allows the definition of paths (based on user
>>>> tasks) as an aggregate for scoping conformance claims.
>>>>
>>>> Here is the link:
>>>>
>>>> https://docs.google.com/presentation/d/1dV1moNnq-56sS1o84UCKkc_g-gE10X6Y/
>>>>
>>>> This is just a sketch, and hopefully a basis for discussing alternatives
>>>> that seem to me more workable than what we have so far.
>>>>
>>>> Detlev
>>>>
>>>> --
>>>> Detlev Fischer
>>>> DIAS GmbH
>>>> (Testkreis is now part of DIAS GmbH)
>>>>
>>>> Mobil +49 (0)157 57 57 57 45
>>>>
>>>> http://www.dias.de
>>>> Consulting, tests and training for accessible websites
>>>>
>>>>
>>>>
>>>
>>> --
>>> *Wilco Fiers*
>>> Axe for Web product owner - Co-facilitator WCAG-ACT - Chair ACT-R
>>>
>>>
Received on Tuesday, 25 August 2020 08:29:03 UTC