
notes for silver conformance models discussion

From: Bruce Bailey <Bailey@Access-Board.gov>
Date: Wed, 25 Mar 2020 21:36:51 +0000
To: 'WCAG list' <w3c-wai-gl@w3.org>
Message-ID: <CY4PR22MB04861F645EFB6AB7D9B4E8ACE3CE0@CY4PR22MB0486.namprd22.prod.outlook.com>
Brief description:  Multiple Currencies (no exchange rate)
Postulation:  Any point-based rating scheme will need two or more categories.
One set of points tracks towards minimum requirements.
Points beyond minimum can be banked towards "achievements" or other recognition of improved accessibility that reflect best practices.
The simplest approach would be two scoring categories:  base/core and achievements/best practices.
A more nuanced approach might have separate point totals in 7-10 FPC categories.
Multiple currencies address some obvious problems with a single-point system.
How to award points for nice-to-have features while making sure basics are not skipped?
How to recognize an author who scores better than 100% in any one GL?
Multiple currencies will also facilitate the scoring transition from 2.x to W3CAG.
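The two-currency idea above can be sketched in a few lines of code.  This is a hypothetical illustration only: the class names, the per-GL structure, and the base_minimum threshold are my assumptions, not part of any Silver proposal.

```python
# Hypothetical sketch of a two-currency score.  Names and thresholds
# are illustrative assumptions, not part of any Silver proposal.
from dataclasses import dataclass


@dataclass
class GuidelineScore:
    base: int          # points that track toward minimum requirements
    achievement: int   # points beyond minimum, banked toward recognition


def conforms(scores, base_minimum):
    """Base points alone decide conformance; achievement points never
    substitute for missing base points (no exchange rate)."""
    return all(s.base >= base_minimum for s in scores)


def banked_achievements(scores):
    """Achievement points accumulate separately toward 'achievements'."""
    return sum(s.achievement for s in scores)


scores = [GuidelineScore(base=10, achievement=4),
          GuidelineScore(base=8, achievement=0)]
print(conforms(scores, base_minimum=8))   # True
print(banked_achievements(scores))        # 4
```

Because the two totals never mix, an author cannot "buy back" a skipped basic with nice-to-have points, which is the point of having no exchange rate.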
Old email to list:
http://lists.w3.org/Archives/Public/public-silver/2019Jun/0045.html


Brief description:  Adjective Ratings
Problem 1:  Manually tallying a FICO-style rating (up to 1000 points) is not humanly possible.
Problem 2:  Many GL (e.g. Plain Language, Visual Contrast) do not lend themselves to point assignment based on counting.
Problem 3:  Automated testing can only catch ~40% of errors.
Problem 4:  While automated tests are good for tracking incremental improvements, scores from different (brands of) automated tests are not directly comparable.
How is scoring handled?  There are three steps.  (Skip Step 2 for single-page apps.)
Step 1:  For each unit (e.g., webpage) in the scope of conformance (e.g., website being evaluated), assign adjectival ratings for each GL.
Strawman adjectives:  Outstanding, Very Good, Acceptable, Unacceptable, Very Poor
Step 2:  Derive a representative rating for each GL in the scope of conformance.  This representative rating may be mode, lowest rating of any page, mean (by assigning point values), or some other rubric.
Step 3:  Derive an overall adjectival rating for the scope of conformance, using a predetermined rubric.  For example:
Outstanding:  At least two GL rated as Outstanding.  No more than one GL rated as Acceptable, and all other GL rated as Very Good or Outstanding.
Very Good:  Half or more GL rated as Very Good or Outstanding.  No GL rated as Unacceptable or Very Poor.
Acceptable:  All GL rated as Acceptable or better.
Unacceptable:  One or more GL rated as Unacceptable or Very Poor.
Very Poor:  Two or more GL rated as Very Poor.
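The three steps above can be sketched as a short script.  This is a strawman implementation under stated assumptions: the ratings come from the list above, Step 2 uses the "lowest rating of any page" rubric (one of several the notes suggest), and Step 3 encodes the example rubric, checking from worst to best so the Unacceptable/Very Poor rules take priority.  The page data is invented for illustration.

```python
# Illustrative sketch of the three-step adjectival rating from these
# notes.  Rubric thresholds follow the strawman above; the sample page
# ratings are invented for illustration.
ADJECTIVES = ["Very Poor", "Unacceptable", "Acceptable",
              "Very Good", "Outstanding"]
RANK = {name: i for i, name in enumerate(ADJECTIVES)}


def representative_rating(page_ratings):
    """Step 2: derive one rating per GL across all pages in scope.
    Here: the lowest rating of any page (other rubrics are possible)."""
    return min(page_ratings, key=RANK.get)


def overall_rating(gl_ratings):
    """Step 3: overall adjectival rating per the strawman rubric."""
    count = {a: sum(1 for r in gl_ratings if r == a) for a in ADJECTIVES}
    if count["Very Poor"] >= 2:
        return "Very Poor"
    if count["Unacceptable"] + count["Very Poor"] >= 1:
        return "Unacceptable"
    # Only Acceptable/Very Good/Outstanding remain past this point.
    if count["Outstanding"] >= 2 and count["Acceptable"] <= 1:
        return "Outstanding"
    if (count["Very Good"] + count["Outstanding"]) * 2 >= len(gl_ratings):
        return "Very Good"
    return "Acceptable"


# Step 1 ratings (assigned per GL, per page, by a human evaluator):
pages = {"Plain Language":  ["Very Good", "Outstanding"],
         "Visual Contrast": ["Outstanding", "Outstanding"],
         "Headings":        ["Acceptable", "Very Good"]}
site = [representative_rating(r) for r in pages.values()]
print(overall_rating(site))  # Very Good
```

Note that only Step 3 is mechanical; Steps 1 and 2 still need human judgment and an agreed rubric, which is why inter-rater reliability (under Next steps below) matters.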
Look at adjectival ratings for Headings:
https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643
Known gaps:
Step 1 (rating a single GL on a single page) is somewhat subjective.
Step 2 (rating of GL for whole site) is somewhat arbitrary, and Step 3 even more arbitrary than Step 2.
Thresholds for Steps 2 and 3 will need to be revisited.
Next steps:
Finish clear language write-up.
Draft adjectival rating description for alt text and keyboard accessibility.
Experiment to see if we can get inter-rating reliability on five GL for a few model websites.
Disadvantages:  Not granular.  Can't rank one Very Good website against another Very Good website.
Received on Wednesday, 25 March 2020 21:37:09 UTC
