RE: After today's call from Suzanne Taylor on 2021-08-10 (public-silver@w3.org from August 2021)

From: Suzanne Taylor <suzanne.taylor@thingsentertainment.net>
Date: Tue, 10 Aug 2021 18:43:38 +0000
To: John Foliot <john@foliot.ca>, Silver TF <public-silver@w3.org>
Message-ID: <0100017b316093f8-8cdc8f4e-8104-4d49-94cb-a4d2186be7c4-000000@email.amazonses.co>
Although I could not make the meeting, adopting this as the base to refine seems like a no-brainer to me.

 
From: John Foliot <john@foliot.ca> 
Sent: Tuesday, August 10, 2021 1:45 PM
To: Silver TF <public-silver@w3.org>
Subject: Fwd: After today's call

 
Hello all,

 
First, thanks to the chairs for allowing me to present my alternative scoring proposal. As noted on the call, while the PPT deck is available on a Google drive, Google's conversion of that deck to 'Sheets' breaks some of the formatting. If that is an issue for you (or if you are unable to access the Google drive, perhaps due to firewall considerations), please let me know and I would be happy to forward you a copy of the PPT deck.

 
While I did not spend any time focussing on the "callout bubbles" in the deck, each comment comes from the first round of feedback, and is linked in the deck to the Issue in GitHub.

 
Recap of the main ideas:

* Two ways of achieving "points" that work in tandem - unambiguous unit tests, and adoption of protocols.
* Use EARL (mandated) to report adoption of Protocols (the public declaration/public accountability piece). EARL could also be used in reporting the scope (User Generated discussion for example), and because EARL can be output in multiple formats, the data could also be exported as JSON fragments, which could be used in dashboards and even (use your imagination :-) ) dynamically generated "scores" (think badging, etc.).
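
As a rough illustration of the EARL point above, a protocol-adoption declaration could be serialized as a JSON-LD fragment along the following lines. The EARL terms used (Assertion, assertedBy, subject, test, result, TestResult, outcome, passed) come from the EARL 1.0 Schema; everything else is an assumption made for illustration: the example URLs, the function name, and in particular the idea of treating an adopted Protocol as the `earl:test` with a `passed` outcome, which the proposal does not specify.

```python
import json

EARL_NS = "http://www.w3.org/ns/earl#"  # EARL 1.0 Schema namespace


def protocol_assertion(asserter_url, site_url, protocol_url):
    """Build a JSON-LD dict declaring that a site has adopted a protocol.

    Hypothetical helper: mapping "adoption" onto earl:test / earl:passed
    is an assumption, not something the proposal defines.
    """
    return {
        "@context": {"earl": EARL_NS},
        "@type": "earl:Assertion",
        "earl:assertedBy": {"@id": asserter_url},
        "earl:subject": {"@id": site_url},
        "earl:test": {"@id": protocol_url},
        "earl:result": {
            "@type": "earl:TestResult",
            "earl:outcome": {"@id": "earl:passed"},
        },
    }


# All URLs below are placeholders.
fragment = protocol_assertion(
    "https://example.com/about#org",
    "https://example.com/",
    "https://example.org/protocols/plain-language",
)
print(json.dumps(fragment, indent=2))
```

A dashboard or badge generator could then consume fragments like this one directly, since they are ordinary JSON.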


Unit Tests and Points:

* I proposed weighting individual unit tests based on impact across the Functional Categories: my argument being that the more Category groups impacted, the more 'valuable' the unit test outcome becomes. (This is intended to help dev teams focus, not just on low-hanging fruit, but on the more 'critical' outcomes/requirements based on known user-needs - because *that* specific unit test has more 'value'. It also helps address the "Critical Failure" question, as there are truly very few "critical errors", but plenty of 'significant to the point of failure' errors - though often critical to only one of the 14 Functional Category user-groups. If we adopted weighted scores, we might also consider 'adjusting' the scores for some unit tests to make them more 'valuable' - although that could also be a slippery slope.)
* I propose using the principles as a means of adding equity to the scoring: There may be more unit tests (by count) under the "Perceivable" category and fewer under "Understandable", but if the final percentage score for each category contributes equally to the final score (i.e. each contributes up to "20" (%)), then focussing on the Understandable unit tests becomes as important as focussing on the Perceivable unit tests.
(This is intended to offset the complaint that there are, and likely always will be, fewer unit tests for "Understandable" - which tracks back to COGA concerns with our current system.)
* For discussion: do we continue to include the "R" (Robust) Principle, and is that Principle 'as important' as the other 3?
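
The two scoring ideas above (weighting by Functional Category impact, and an equal cap per principle) can be sketched together as follows. This is only one possible reading: the weight is assumed to be simply the number of Functional Categories a unit test impacts, each principle's score is assumed to be its earned weight over its possible weight scaled to 20, and the test ids and numbers are invented for illustration.

```python
from collections import defaultdict

MAX_PER_PRINCIPLE = 20  # each principle contributes up to 20 (per the proposal)

# (test id, principle, Functional Categories impacted, passed?)
# Every row is a made-up example, not a real WCAG 3 unit test.
results = [
    ("img-alt", "Perceivable", 4, True),
    ("contrast", "Perceivable", 2, False),
    ("captions", "Perceivable", 3, True),
    ("plain-language", "Understandable", 5, True),
]


def principle_scores(results):
    """Return {principle: score out of MAX_PER_PRINCIPLE}.

    Assumed formula: cap * (weight of passed tests / weight of all tests).
    """
    earned = defaultdict(int)
    possible = defaultdict(int)
    for _test_id, principle, weight, passed in results:
        possible[principle] += weight
        if passed:
            earned[principle] += weight
    return {p: MAX_PER_PRINCIPLE * earned[p] / possible[p] for p in possible}


scores = principle_scores(results)
for principle, score in scores.items():
    print(f"{principle}: {score:.1f} / {MAX_PER_PRINCIPLE}")
```

Under these assumptions the single "Understandable" test carries the same 20-point ceiling as the three "Perceivable" tests combined, which is the equity effect described above.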

Protocols and Assertions:

* Rather than attempting to measure subjective determinations, we instead reward content owners for PUBLICLY adopting Protocols related to Usability and Accessibility (e.g. Making content Usable for COGA, WCAG 3 Maturity Model, US Fed Plain Language Guidelines, etc.)
* Protocols come in two 'flavors': "Vetted" (AGWG vetted and weighted) and/or "Custom" (non-AGWG-vetted, BUT MUST BE PUBLICLY AVAILABLE via a public URL)
* Vetted Protocols are worth more points, as we have the ability to impact what they state, and/or have met our internal review. 
(Looking wayyyy down the line, I could anticipate entities seeking our WG to vet *their* protocol with an eye towards making that protocol more 'valuable'. As a strawman example, Adobe recently published their 'Spectrum' guidance - https://spectrum.adobe.com/page/principles/ - and they might seek to have that Protocol evaluated and 'scored' differently by our Working Group. It is my personal opinion that this would be both a good thing, and something our WG could encourage)
* As another strawman, in my proposal I suggest a 'maximum score' of 20 points under the Protocols and Assertions 'column', but that is a TBD (as is/will be assigning value points to Protocols, and we'll likely need to identify a core set of those Protocols to start. I've started a list.)

* Integral to this piece of the proposal is the mandated use of EARL for the public declaration / public accountability reporting.
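
A minimal sketch of how the Protocols and Assertions 'column' might be tallied, assuming the strawman cap of 20 points. The per-protocol values (5 for vetted, 2 for custom) are purely hypothetical, since the proposal leaves them TBD, and the URL check only confirms that an http(s) URL is given, not that the page is actually reachable by the public.

```python
from urllib.parse import urlparse

MAX_PROTOCOL_POINTS = 20  # strawman cap from the proposal, still TBD
VETTED_POINTS = 5         # hypothetical value
CUSTOM_POINTS = 2         # hypothetical value


def has_public_url(url):
    """Syntactic check only: an http(s) URL with a host."""
    parts = urlparse(url or "")
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def protocol_points(adopted):
    """adopted: list of dicts with 'url' and 'vetted' keys."""
    total = 0
    for protocol in adopted:
        if protocol["vetted"]:
            total += VETTED_POINTS
        elif has_public_url(protocol.get("url")):
            total += CUSTOM_POINTS
        # a custom protocol without a public URL earns nothing
    return min(total, MAX_PROTOCOL_POINTS)


# Placeholder data: one vetted protocol, one custom, one unpublished custom.
adopted = [
    {"url": "https://example.org/vetted-protocol", "vetted": True},
    {"url": "https://example.com/our-protocol", "vetted": False},
    {"url": None, "vetted": False},
]
print(protocol_points(adopted))
```

With these made-up values the example totals 7 points, and any combination that would exceed 20 is clamped to the cap.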


What I propose to 'drop':

*  attempting to measure "user flows" or "happy paths", as we simply cannot predict that for all users
*  counting instances of failures (i.e. 2 of 100 images lacking alt text does not = 98, it equals zero for THAT VIEW)
*  attempting to measure or evaluate usability 
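
The second item above (no counting of failure instances) amounts to all-or-nothing scoring per view. A small sketch, where the function name and the list-of-booleans representation are assumptions for illustration:

```python
def view_passes(instance_results):
    """True only if every instance in the view passes (all-or-nothing).

    Note: an empty list passes vacuously; how "not applicable" views
    should be treated is not addressed by the proposal.
    """
    return all(instance_results)


# 100 images in one view, 2 of them lacking alt text.
images = [True] * 98 + [False] * 2

percentage = 100 * sum(images) / len(images)  # instance counting: 98.0
outcome = view_passes(images)                 # per-view outcome: False
print(percentage, outcome)
```

Instance counting would report 98% for this view, whereas the per-view rule records the unit test as failed for that view.
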
JF

-- 

John Foliot | 
Senior Industry Specialist, Digital Accessibility | 
W3C Accessibility Standards Contributor |

"I made this so long because I did not have time to make it shorter." - Pascal "links go places, buttons do things"
Received on Tuesday, 10 August 2021 18:44:21 UTC