Fwd: After today's call from John Foliot on 2021-08-10 (public-silver@w3.org from August 2021)

From: John Foliot <john@foliot.ca>
Date: Tue, 10 Aug 2021 13:45:08 -0400
To: Silver TF <public-silver@w3.org>
Message-ID: <CAFmg2sUPLt+rM6+juLURQtk0g8=--HMQeEHm2w2BNfFJB=DmbA@mail.gmail.com>
Hello all,

First, thanks to the chairs for allowing me to present my
alternative scoring proposal. As noted on the call, while the PPT deck is
available on a Google drive, Google's conversion of that deck to 'Slides'
breaks some of the formatting. If that is an issue for you (or if you are
unable to access the Google drive, perhaps due to firewall considerations)
please let me know and I would be happy to forward you a copy of the PPT
deck.

While I did not spend any time focussing on the "callout bubbles" in the
deck, each comment comes from the first round of feedback, and is linked in
the deck to the Issue in GitHub.

*Recap of the main ideas:*

   - Two ways of achieving "points" that work in tandem - unambiguous unit
   tests, and adoption of protocols.

   - Use EARL (mandated) to report adoption of Protocols (the public
   declaration/public accountability piece). EARL could also be used in
   reporting the scope (User Generated discussion for example), and because
   EARL can be outputted in multiple formats, the data could also be exported
   as JSON fragments, which could be used in dashboards and even (use your
   imagination :-) ) dynamically generated "scores" (think badging, etc.).
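
As a rough illustration of the EARL-to-JSON idea above, here is a minimal Python sketch that builds one EARL-style assertion as a JSON fragment. The `earl:` terms and namespace come from the W3C EARL 1.0 Schema. Everything else is an assumption made for illustration and is not defined in the proposal: the `protocol_assertion` helper, the example.com URLs, and the choice to model "adoption of a Protocol" as an `earl:test` with a `passed` outcome.

```python
import json

EARL_NS = "http://www.w3.org/ns/earl#"  # EARL 1.0 Schema namespace


def protocol_assertion(site_url, protocol_url, asserted_by):
    """Build one EARL-style assertion that a site has adopted a protocol.

    Hypothetical helper: mapping "protocol adoption" onto earl:test is an
    assumption for this sketch, not part of the proposal or of EARL itself.
    """
    return {
        "@context": {"earl": EARL_NS},
        "@type": "earl:Assertion",
        "earl:assertedBy": {"@id": asserted_by},
        "earl:subject": {"@id": site_url},
        "earl:test": {"@id": protocol_url},
        "earl:mode": {"@id": "earl:manual"},
        "earl:result": {
            "@type": "earl:TestResult",
            "earl:outcome": {"@id": "earl:passed"},
        },
    }


# Placeholder URLs, purely illustrative.
fragment = protocol_assertion(
    site_url="https://example.com/",
    protocol_url="https://example.com/protocols/plain-language",
    asserted_by="https://example.com/about#accessibility-team",
)
print(json.dumps(fragment, indent=2))
```

A dashboard or badge generator could then read fragments like this one and count the declared Protocols, although how such a consumer would work is left open here.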


*Unit Tests and Points:*

   - I proposed *weighting individual unit tests* based on impact across
   the Functional Categories: my argument being that the more Category groups
   impacted, the more 'valuable' the unit test outcome becomes. (This is
   intended to help dev teams focus not just on low-hanging fruit, but on
   the more 'critical' outcomes/requirements based on known user-needs -
   because *that* specific unit test has more 'value'. It also helps address
   the "Critical Failure" question, as there are truly very few "critical
   errors", but plenty of 'significant to the point of failure' errors - and
   those are often only critical to one of the 14 Functional Category
   user-groups. If we adopted weighted scores, we might also consider
   'adjusting' the scores for some unit tests to make them more 'valuable' -
   although that could also be a slippery slope.)

   - I propose using the *principles as a means of adding equity to the
   scoring*: There may be more unit tests (by count) under the
   "Perceivable" category and fewer under "Understandable", but if the final
   percentage score for each category contributes equally to the final score
   (i.e. each contributes up to "20" (%)) then focussing on the
   Understandable unit tests becomes just as important as the Perceivable
   unit tests.
   (This is intended to offset the complaint that there are, and likely
   always will be, fewer unit tests for "Understandable" - which tracks back
   to COGA concerns with our current system.)

   - *For discussion*: do we continue to include the "R" (Robust)
   Principle, and is that Principle 'as important' as the other 3?
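
To make the two ideas above concrete (weighting by Functional Categories impacted, then equalizing across principles), here is a small Python sketch. It assumes each unit test's weight is simply the count of Functional Categories it impacts, and that each principle is scaled to a cap of 20. The function name, the tuple layout, and the sample data are all hypothetical.

```python
from collections import defaultdict

MAX_PER_PRINCIPLE = 20  # assumed cap, mirroring the 'up to "20" (%)' figure


def principle_scores(results):
    """Score each principle on the same 0..MAX_PER_PRINCIPLE scale.

    results: list of (principle, categories_impacted, passed) tuples.
    A test's weight is the number of Functional Categories it impacts
    (assumed to be at least 1). A principle's score is earned weight over
    possible weight, scaled to the cap, so a principle with few tests
    counts as much as one with many.
    """
    earned = defaultdict(int)
    possible = defaultdict(int)
    for principle, categories_impacted, passed in results:
        possible[principle] += categories_impacted
        if passed:
            earned[principle] += categories_impacted
    return {
        p: MAX_PER_PRINCIPLE * earned[p] / possible[p]
        for p in possible
    }


# Hypothetical results: (principle, categories impacted, passed?)
results = [
    ("Perceivable", 6, True),
    ("Perceivable", 2, True),
    ("Perceivable", 2, False),
    ("Understandable", 4, True),
    ("Understandable", 4, False),
]
scores = principle_scores(results)
print(scores)  # {'Perceivable': 16.0, 'Understandable': 10.0}
```

In this made-up data, "Understandable" has only two tests, yet failing one of them costs 10 of its 20 points, which is the equity effect described above.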

*Protocols and Assertions:*

   - Rather than attempting to measure subjective determinations, we
   instead reward content owners for PUBLICLY adopting Protocols related to
   Usability and Accessibility (e.g. Making content Usable for COGA, WCAG 3
   Maturity Model, US Fed Plain Language Guidelines, etc.)

   - Protocols come in two 'flavors': AGWG-vetted (and weighted), and/or
   "Custom" (non-AGWG-vetted, BUT MUST BE PUBLICLY AVAILABLE via a public
   URL)

   - Vetted Protocols are worth more points, as we have the ability to
   impact what they state, and/or they have met our internal review.
   (Looking wayyyy down the line, I could anticipate entities asking our
   WG to vet *their* protocol with an eye towards making that protocol more
   'valuable'. As a strawman example, Adobe recently published their
   'Spectrum' guidance - https://spectrum.adobe.com/page/principles/ - and
   they *might* seek to have that Protocol evaluated and 'scored'
   differently by our Working Group. It is my personal opinion that this would
   be both a good thing, and something our WG could encourage)

   - As another strawman, in my proposal I suggest a 'maximum score' of 20
   points under the Protocols and Assertions 'column', but that is a TBD (as
   is/will be assigning value points to Protocols, and we'll likely need to
   identify a core set of those Protocols to start. I've started a list.)

   - Integral to this piece of the proposal is the mandated use of EARL for
   the public declaration / public accountability reporting.
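
The Protocols 'column' described above could be tallied along these lines. This is only a sketch: the cap of 20 is the strawman figure from the proposal, while the per-protocol point values (5 and 2), the function name, and the dictionary keys are hypothetical placeholders, since assigning values to Protocols is still TBD.

```python
MAX_PROTOCOL_POINTS = 20  # strawman cap from the proposal
VETTED_POINTS = 5         # hypothetical value, TBD
CUSTOM_POINTS = 2         # hypothetical value, TBD


def protocol_points(protocols):
    """Sum points for publicly adopted protocols, capped at the maximum.

    protocols: list of dicts with 'vetted' (bool) and 'public_url'
    (str or None). Vetted protocols earn more than custom ones, and a
    custom protocol only counts if it is available at a public URL.
    """
    total = 0
    for protocol in protocols:
        if protocol["vetted"]:
            total += VETTED_POINTS
        elif protocol.get("public_url"):
            total += CUSTOM_POINTS
        # A custom protocol without a public URL earns nothing.
    return min(total, MAX_PROTOCOL_POINTS)


# Placeholder data, purely illustrative.
adopted = [
    {"vetted": True, "public_url": "https://example.com/vetted-a"},
    {"vetted": True, "public_url": "https://example.com/vetted-b"},
    {"vetted": False, "public_url": "https://example.com/house-style"},
    {"vetted": False, "public_url": None},
]
print(protocol_points(adopted))  # 12
```

Adopting five vetted protocols in this sketch would total 25 and be capped at 20.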


*What I propose to 'drop':*

   -  attempting to measure "user flows" or "happy paths", as we simply
   cannot predict that for all users
   -  counting instances of failures (i.e. 2 of 100 images lacking alt text
   does not = 98, it equals zero for THAT VIEW)
   -  attempting to measure or evaluate usability
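
The "does not = 98, it equals zero" point amounts to an all-or-nothing result per unit test per view. A minimal Python sketch of that reading follows; the function name and the 1/0 return values are assumptions for illustration.

```python
def view_test_score(instances_checked, instances_failed):
    """Return 1 if every instance in the view passes the unit test, else 0.

    Failures are not pro-rated: 2 failures out of 100 scores the same as
    100 out of 100, because the unit test failed for that view.
    """
    if instances_failed < 0 or instances_failed > instances_checked:
        raise ValueError("instances_failed must be between 0 and instances_checked")
    return 1 if instances_failed == 0 else 0


print(view_test_score(100, 2))  # 0, not 0.98
print(view_test_score(100, 0))  # 1
```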

JF
-- 
*John Foliot* |
Senior Industry Specialist, Digital Accessibility |
W3C Accessibility Standards Contributor |

"I made this so long because I did not have time to make it shorter." - Pascal
"links go places, buttons do things"
Received on Tuesday, 10 August 2021 17:52:02 UTC