Re: notes for silver conformance models discussion

Hi All

I'd like to make a couple of comments on these two proposals.

=== Multiple Currencies (no exchange rate) ===

In the proposal there would be one measurement for the baseline
requirements and another for going above and beyond. While I think there
is a compelling case for this, experience in the field suggests that most
organizations will not do the "above and beyond" work unless it is either
mandated or part of the baseline.
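
Just so I'm sure I'm reading the proposal correctly, here is a tiny sketch
(Python, with invented names and thresholds, purely for illustration) of
what I take "no exchange rate" to mean: the two totals are tracked side by
side, and no amount of "above and beyond" points can substitute for missing
baseline points.

from dataclasses import dataclass

@dataclass
class ConformanceScore:
    base_points: int         # currency 1: baseline/minimum requirements
    achievement_points: int  # currency 2: above-and-beyond best practices

def meets_minimum(score: ConformanceScore, base_required: int = 100) -> bool:
    # Achievement points are deliberately ignored here -- no exchange rate.
    return score.base_points >= base_required

# 400 bonus points do not rescue a site that is 5 points short on the basics.
print(meets_minimum(ConformanceScore(base_points=95, achievement_points=400)))  # False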

I think that in a certain sense we already have a system of multiple
currencies. Most jurisdictions require AA conformance in law or policy,
while AAA is optional. I don't know of any organizations (outside the
accessibility business) that voluntarily meet AAA criteria. So the second
currency of "above and beyond" is unlikely to be implemented in most
organizations and will likely be ignored.

On the other hand, there are many best practices that I've seen mandated
in lawsuits (adjacent links to the same location, properly nested headings,
a warning before opening a new window, etc.) which are not *required* by
WCAG (only suggested), yet they get done because people "think" they are
required by WCAG.

=== Adjectives ===

I'm attracted to the idea of adjectives because:

   - Scores can be inaccurate and arbitrary, yet they give an illusion of
   precision ("73%"). Lately I've been on a project evaluating commercial
   crawlers, and they usually have some sort of scoring metric. These are
   useful for showing progress over time, but they rarely reflect the actual
   accessibility of the page evaluated, even though there are very smart
   people working on those algorithms.
   - Scores are hard to calculate
   - Scores can be hard to teach
   - Score calculation methodologies can be hard to understand
   - Scores require counting all the pass instances, which doubles or
   triples the evaluation effort for auditors
   - Adjectives have precedent and are already used in many formal rating
   schemes in society: university letter grades (A-F), where each grade maps
   to an adjective, performance appraisals, and so on. (A rough sketch of how
   this could work with the strawman rubric follows below.)
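
To make the contrast with numeric scoring concrete, here is a minimal
sketch of how Steps 2 and 3 of the strawman rubric (quoted below) might be
computed from per-guideline adjectives. This is only my reading of the
thresholds, and the order in which the checks are applied is my own
assumption.

RATINGS = ["Very Poor", "Unacceptable", "Acceptable", "Very Good", "Outstanding"]

def representative_rating(page_ratings):
    # Step 2, using one option from the proposal: the lowest rating the
    # guideline receives on any page in the scope.
    return min(page_ratings, key=RATINGS.index)

def overall_rating(gl_ratings):
    # Step 3: derive one adjective for the whole scope from per-GL adjectives.
    count = {r: gl_ratings.count(r) for r in RATINGS}
    if count["Very Poor"] >= 2:
        return "Very Poor"
    if count["Very Poor"] + count["Unacceptable"] >= 1:
        return "Unacceptable"
    if count["Outstanding"] >= 2 and count["Acceptable"] <= 1:
        return "Outstanding"
    if count["Very Good"] + count["Outstanding"] >= len(gl_ratings) / 2:
        return "Very Good"
    return "Acceptable"  # everything left is Acceptable or better

print(overall_rating(["Outstanding", "Outstanding", "Very Good", "Acceptable"]))
# -> Outstanding

The whole calculation fits in a dozen lines, which is the kind of rubric I
can imagine teaching to evaluators, unlike a weighted numeric score.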


Cheers,
David MacDonald



CanAdapt Solutions Inc.

Tel:  613-806-9005

LinkedIn
<http://www.linkedin.com/in/davidmacdonald100>

twitter.com/davidmacd

GitHub <https://github.com/DavidMacDonald>

www.Can-Adapt.com <http://www.can-adapt.com/>



  Adapting the web to all users
            Including those with disabilities

If you are not the intended recipient, please review our privacy policy
<http://www.davidmacd.com/disclaimer.html>


On Wed, Mar 25, 2020 at 5:36 PM Bruce Bailey <Bailey@access-board.gov>
wrote:

> Brief description:  Multiple Currencies (no exchange rate)
>
> Postulation:  Any point-based rating scheme will need two or more
> categories.
>
> One set of points tracks towards minimum requirements.
>
> Points beyond minimum can be banked towards “achievements” or other
> recognition of improved accessibility that reflect best practices.
>
> Simplest approach would be two scoring categories:  base/core and
> achievements/best practices.
>
> A more nuanced approach might have separate point totals in 7-10 FPC
> categories.
>
> Multiple currencies address some obvious problems with a single-point system.
>
> How to award points for nice-to-have features while making sure basics are
> not skipped?
>
> How to recognize an author who scores better than 100% in any one GL?
>
> Multiple currencies will also facilitate the scoring transition from 2.x to
> W3CAG.
>
> Old email to list:
>
> http://lists.w3.org/Archives/Public/public-silver/2019Jun/0045.html
>
>
>
>
>
> Brief description:  Adjective Ratings
>
> Problem 1:  Manual tally of a FICO-style rating (up to 1000 points) is not
> humanly possible
>
> Problem 2:  Many GL (e.g. Plain Language, Visual Contrast) do not lend
> themselves to point assignment based on counting.
>
> Problem 3:  Automated testing can only catch ~40% of errors.
>
> Problem 4:  While automated tests are good for tracking incremental
> improvements, scores from different (brands of) automated tests are not
> directly comparable.
>
> How is scoring handled?  There are three steps.  (Skip Step 2 for
> single-page apps.)
>
> Step 1:  For each unit (e.g., webpage) in the scope of conformance (e.g.,
> website being evaluated), assign adjectival ratings for each GL.
>
> Strawman adjectives:  Outstanding, Very Good, Acceptable, Unacceptable,
> Very Poor
>
> Step 2:  Derive a representative rating for each GL in the scope of
> conformance.  This representative rating may be the mode, the lowest rating
> of any page, the mean (by assigning point values), or some other rubric.
>
> Step 3:  Derive an overall adjectival rating for the scope of conformance,
> using a predetermined rubric.  For example:
>
> Outstanding:  At least two GL rated as Outstanding.  No more than one GL
> rated as Acceptable, and all other GL rated as Very Good or Outstanding.
>
> Very Good:  Half or more GL rated as Very Good or Outstanding.  No GL
> rated as Unacceptable or Very Poor.
>
> Acceptable:  All GL rated as Acceptable or better.
>
> Unacceptable:  One or more GL rated as Unacceptable or Very Poor.
>
> Very Poor:  Two or more GL rated as Very Poor.
>
> Look at adjectival ratings for Headings:
>
>
> https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643
>
> Known gaps:
>
> Step 1 (rating a single GL on a single page) is somewhat subjective.
>
> Step 2 (rating of a GL for the whole site) is somewhat arbitrary, and Step 3
> is even more arbitrary than Step 2.
>
> Thresholds for Steps 2 and 3 will need to be revisited.
>
> Next steps:
>
> Finish clear language write-up.
>
> Draft adjectival rating description for alt text and keyboard
> accessibility.
>
> Experiment to see if we can get inter-rater reliability on five GL for a
> few model websites.
>
> Disadvantages:  Not granular.  Can’t rank one Very Good website against
> another Very Good website.
>

Received on Friday, 27 March 2020 15:20:42 UTC