OBJECTION (was: Agenda for Silver meeting of Friday 22 April 2022)

As I remain unable to join most AGWG calls in person, I would like to
register my *Strong Objection* to the proposed edits at:
https://github.com/w3c/silver/pull/624/files

Specifically, I take issue with the following (comments inline):

WCAG 3.0 includes four types of tests which are evaluated *pass/fail*:

   - *Objective tests:* Tests where results will not vary based on the
   tester or approach. Examples include testing whether something exists or
   against a constant value.

   - *Conditional tests:* Tests that rely on a *subjective evaluation* based
   on existing criteria. Test results may vary slightly between testers who
   understand the criteria. Examples include testing quality and applicability.

From a regulatory and conformance perspective, how can subjective
determinations be accurately and consistently measured as Pass or Fail? If
I say Fail and you say Pass, who is right and why?
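To make the measurement problem concrete, here is a rough sketch (in
TypeScript; every test ID and verdict below is hypothetical and invented by
me, not taken from the draft or from any tool) of two testers returning
verdicts on the same conditional tests. Nothing in the proposed text says
whose verdict prevails when agreement is below 100%:

    // Rough sketch: comparing two testers' pass/fail verdicts on the same
    // "conditional" tests. All IDs and verdicts are hypothetical.
    type Verdict = "pass" | "fail";

    interface TestResult {
      testId: string;
      verdict: Verdict;
    }

    // Two equally qualified testers evaluating the same content against the
    // same subjective criteria.
    const testerA: TestResult[] = [
      { testId: "alt-text-quality", verdict: "pass" },
      { testId: "plain-language-examples", verdict: "fail" },
      { testId: "consistent-navigation", verdict: "pass" },
    ];

    const testerB: TestResult[] = [
      { testId: "alt-text-quality", verdict: "fail" },
      { testId: "plain-language-examples", verdict: "fail" },
      { testId: "consistent-navigation", verdict: "pass" },
    ];

    // Share of tests on which the two testers agree. The proposed text gives
    // no rule for what happens when this is below 1.0.
    function agreementRate(a: TestResult[], b: TestResult[]): number {
      const byId = new Map(b.map((r): [string, Verdict] => [r.testId, r.verdict]));
      const matches = a.filter((r) => byId.get(r.testId) === r.verdict).length;
      return matches / a.length;
    }

    console.log(agreementRate(testerA, testerB)); // 0.666... -- whose verdict stands?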

This also ignores feedback we have already received
<https://github.com/w3c/silver/issues/417> from Industry participants, who
stated: "*the scoring system should be more objective and data-driven, and
avoid adjectival ratings which potentially introduce ambiguity and whose
meaning may change when translated into other languages.*"

   - *Convention tests:* Objective and conditional tests where the results
   rely on the content being tested. Examples include consistency and
   convention tests.

While I can certainly appreciate the intention here, do we have ANY
examples of conditional tests related to consistency or conventions?

Before this group commits to proposed convention tests, can we see an
example of what that means in practice? The proposed edit provides the
following definition for Convention Tests: "*Convention tests evaluate
against a baseline that the creating organization sets, often by
documenting design decisions and conventions used throughout the content.
Convention tests include both objective and conditional tests but the
criteria used to evaluate the results differ based on the aggregate being
tested. [...] Examples include testing for consistency of
design decisions and content choices.*"

As I read this, my understanding is that individual entities can define
their own conventions (based upon what criteria?). It is unclear, however,
how this will advance accessibility, because if we do not have a say in the
"convention(s)", entities could create a convention that frustrates rather
than improves the user experience for some PwD. "*It is the convention of
the XYZ Widget Company that all footer text be rendered at 7pts.*" helps
no one, yet according to the current draft it would be completely acceptable.
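To illustrate the point with something more concrete than rhetoric, here is a
rough sketch (again in TypeScript; the 7pt convention is my hypothetical XYZ
Widget Company example above, and the selectors and measured sizes are
invented) of a "convention test" that an entity could write for itself and
legitimately pass under the proposed definition:

    // Rough sketch of an entity-defined "convention test".
    // The convention and the measured values are hypothetical.
    const conventions = {
      // "It is the convention of the XYZ Widget Company that all footer
      // text be rendered at 7pts."
      footerFontSizePt: 7,
    };

    interface FooterSample {
      selector: string;
      fontSizePt: number;
    }

    // Hypothetical measurements taken from the rendered pages.
    const samples: FooterSample[] = [
      { selector: "footer p", fontSizePt: 7 },
      { selector: "footer a", fontSizePt: 7 },
    ];

    // The content matches the documented convention, so the test passes --
    // even though 7pt footer text helps no one.
    const consistentWithConvention = samples.every(
      (s) => s.fontSizePt === conventions.footerFontSizePt
    );
    console.log(consistentWithConvention ? "Pass" : "Fail"); // "Pass"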

I strongly object to entities writing their own conformance rules: there is
zero evidence that this will work in practice, and one of our own design
principles <https://www.w3.org/TR/wcag-3.0-explainer/#ContentGoals> is that
decisions be "*data-informed and evidence-based*". Where is the data and
evidence driving this decision (please)?


   - *Procedure tests:* Tests that evaluate whether an accepted process was
   used to improve accessibility. Examples include usability and plain
   language testing.

I find this entry particularly confusing, as we currently have a sub-team
(Protocols) that is looking at "non-testable" requirements that would be
candidates for Protocols, and, by direction from the chairs, we are using
Plain Language as one of our examples.

This is another example of an aspirational statement/goal with no evidence
that it can work in practice. The fact that our group is using Plain
Language both as a requirement that *cannot* resolve to Pass/Fail (in the
context of the Protocols discussions) and, here, as a requirement that *can*
be evaluated as Pass or Fail is *contradictory both on the surface and in
substance*.

From a subjective perspective, if I say I applied the principles of Plain
Language to my site content, how can a third-party entity confirm or refute
that claim? If I claim that I've "Kept It Conversational / Use(d) examples
<https://www.plainlanguage.gov/guidelines/conversational/use-examples/>",
how can anyone argue against that or prove me wrong? (On a pragmatic level,
how many examples are required before it is rendered True versus False? 1?
5 or more? Something else again? Is there such a thing as too many examples
<https://www.plainlanguage.gov/guidelines/concise/write-short-sections/>,
and who determines what "concise" means in context? Do "examples" need to
be present on each View evaluated, or is it sufficient to have one page
with "examples"? Why or why not?)

Additionally, "usability" as a Pass/Fail? How will this be determined, and
by whom? "Usable" to whom?

There are simply far too many unanswered questions here; making this
statement, even in our working draft, is premature and unevaluated for use.

******************************
The draft also continues:

Tests can be applied to four different scopes:


   - *Item*: A component or unit of content. Examples include a drop down
   menu, a media player, a phrase, or an image.
   - *Views*: All content visually and programmatically available without
   a substantive change.
   - *User Process*: Series of user actions, and the distinct interactive
   views that support the actions, where each action is required in order
   to complete an activity.
   - *Aggregate*: Combination of all related items, user processes, and
   views.

I have concerns with this as well. All of these scopes have value, but
should they all contribute to a final "score"? Who determines which of the
four scopes are applicable for a given screen or piece of content? How and
why (and when)? How is that reported for third-party verification? In
practice, is it one of the four options above, or is it realistically always
"Aggregate"? (Why or why not?)

In particular, I struggle with "User Process" because while entities *MAY*
be able to define a "happy path" for specific functions or features, actual
user testing confirms that not all users will take the "happy path"
envisioned by the UX designer, and on many screens the user frequently has
multiple "happy paths" to choose from. So which are in scope, and which are
not? (Why or why not?) What, for example, is the user process or happy path
for this screen/view: https://www.w3.org/WAI/? (This is a serious question.)

If a 'happy path' also provides contextual help along the way (contextual,
in that it is optional: the kind of content we frequently see in modals or
hidden behind 'help' icons such as the italicised i in a circle), then each
of those forks produces two or more paths, since some users will read one or
more of those contextual helps while another user will ignore them all. Does
each variant of the path need to be evaluated? Why or why not? Who
determines that? How are 'happy path' user flows documented to support
repeatability in testing? What kind of hardware and testing configurations
are required for Pass/Fail on these 'user processes'? All AT configurations?
Some? (Which, and why/how? Who determines? How is that documented?) What
about when AT is not involved (i.e. cognitive issues)? This is very long on
aspiration, and very short on details.

Is the expectation here similar to Convention Tests, where the entity is
writing *their* own tests against *their* own defined paths? If so, where
is this documented? What assurances do we have that "user process" tests
are *complete* and *inclusive* if the entity is writing their own tests?

I do note that for Procedure Tests, the proposed draft also states "*The
requirements for what would be evaluated for procedure tests are still to
be determined.*" However, until we have an example of a workable Procedure
test to examine, I will suggest that adding this to our draft is not in
keeping with decisions being "*data-informed and evidence-based*".

For these reasons I strongly object to these edits. I recognize that the
current draft is no better off, but perpetuating problematic content does
not serve anyone well, and these edits do not address many of the
underlying concerns that have been identified previously via industry
feedback (and in some instances they repeat the same issues):

   - https://github.com/w3c/silver/labels/section:%20conformance
   - https://github.com/w3c/silver/labels/section:%20scoring


Respectfully,

JF
--
*John Foliot* |
Senior Industry Specialist, Digital Accessibility |
W3C Accessibility Standards Contributor |

"I made this so long because I did not have time to make it shorter." -
Pascal "links go places, buttons do things"

Received on Thursday, 21 April 2022 18:23:33 UTC