Re: OBJECTION (was: Agenda for Silver meeting of Friday 22 April 2022)

I would add my voice here: let us not muddy "testable" by defining types of 'tests' that are not objective.

If it is subjective in any way, it is an evaluative comment - not a test.

Only objective things should be included under the term "test". A subjective pass/fail may be OK in a classroom, but it cannot be adopted in any regulation. And mixing non-testable with testable criteria only means that the whole mix cannot be adopted.

There are other ways for us to reach our objective of getting more cognitive, language, and learning disability guidance to developers, but redefining "testable" as subjective is not the way to go.
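A minimal sketch of the distinction, using a hypothetical alt-attribute check in Python (illustrative only - not drawn from the WCAG 3.0 draft or any existing test suite): whether the attribute is present is objective; whether its text is adequate is a judgement.

# Hypothetical example - illustrative only, not a WCAG 3.0 test.
from html.parser import HTMLParser

class AltAttributeCheck(HTMLParser):
    # Objective check: does every <img> element carry an alt attribute?
    # Any tester or tool running it on the same markup gets the same result.
    def __init__(self):
        super().__init__()
        self.missing_alt = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img" and "alt" not in dict(attrs):
            self.missing_alt += 1

checker = AltAttributeCheck()
checker.feed('<img src="logo.png"><img src="chart.png" alt="Q1 sales chart">')
print("pass" if checker.missing_alt == 0 else "fail")  # prints "fail": logo.png has no alt

# Whether "Q1 sales chart" is an *adequate* text alternative is an evaluative
# judgement - two competent reviewers can reasonably disagree - so it is not
# an objective test in the sense argued above.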


Gregg

———————————
Gregg Vanderheiden
gregg@vanderheiden.us



> On Apr 21, 2022, at 11:23 AM, John Foliot <john@foliot.ca> wrote:
> 
> As I remain unable to join most AGWG calls in person, I would like to register my Strong Objection to the proposed edits at: https://github.com/w3c/silver/pull/624/files
> 
> Specifically, I take issue with the following: (comments inline)
> 
> WCAG 3.0 includes four types of tests which are evaluated pass/fail:
> Objective tests: Tests where results will not vary based on the tester or approach. Examples include testing whether something exists or against a constant value. 
> 
> Conditional tests: Tests that rely on a subjective evaluation based on existing criteria. Test results may vary slightly between testers who understand the criteria. Examples include testing quality and applicability.
> From a regulatory and conformance perspective, how can subjective determinations be accurately and consistently measured as Pass or Fail? If I say Fail and you say Pass, who is right and why? 
> 
> This also ignores previous feedback we have already received <https://github.com/w3c/silver/issues/417> from Industry participants that stated: "the scoring system should be more objective and data-driven, and avoid adjectival ratings which potentially introduce ambiguity and whose meaning may change when translated into other languages."
> 
> Convention tests: Objective and conditional tests where the results rely on the content being tested. Examples include consistency and convention tests.
> While I can certainly appreciate the intention here, do we have ANY examples of conditional tests related to consistency or conventions? 
> 
> Before this group commits to proposed convention tests, can we see an example of what that means in practice? The proposed edit provides the following definition for Convention Tests: "Convention tests evaluate against a baseline that the creating organization sets, often by documenting design decisions and conventions used throughout the content. Convention tests include both objective and conditional tests but the criteria used to evaluate the results differ based on the aggregate being tested. Results within the same Examples include testing for consistency of design decisions and content choices."
> 
> As I read this, my understanding is that individual entities can define their own conventions (based upon what criteria?). It is unclear, however, how this will advance accessibility, because if we do not have a say in the "convention(s)", entities could create a convention that frustrates rather than improves the user experience for some PwD. "It is the convention of the XYZ Widget Company that all footer text be rendered at 7pts." helps no one, yet according to the current draft it would be completely acceptable.
> 
> I strongly object to entities writing their own conformance rules - there is zero evidence that this will work in practice, and one of our own design principles <https://www.w3.org/TR/wcag-3.0-explainer/#ContentGoals> is that decisions would be "data-informed and evidence-based". Where is the data and evidence driving this decision (please)?
> 
> Procedure tests: Tests that evaluate whether an accepted process was used to improve accessibility. Examples include usability and plain language testing.
> I find this entry particularly confusing, as we currently have a sub-team (Protocols) which is looking at "non-testable" requirements that would be candidates for Protocols, and by direction from the chairs, we are using Plain Language as one of our examples. 
> 
> This is another example of an aspirational statement/goal with no evidence that it can work in practice. The fact that our group is using Plain Language both as a requirement that cannot resolve to Pass/Fail (in the context of the Protocols discussions) and, here, as a requirement that can be evaluated Pass or Fail is contradictory on the surface and in substance.
> 
> From a subjective perspective, if I say I applied the principles of Plain language on my site content, how can a 3rd party entity confirm or refute that claim? If I claim that I've "Kept It Conversational  / Use(d) examples <https://www.plainlanguage.gov/guidelines/conversational/use-examples/>" how can anyone argue against that or prove me wrong? (On a pragmatic level, how many examples are required before it is rendered True versus False? 1? 5 or more? Something else again? Is there such a thing as too many examples <https://www.plainlanguage.gov/guidelines/concise/write-short-sections/> - and who determines what "concise" means in context? Do "examples" need to be present on each View evaluated, or is it sufficient to have one page with "examples"? Why or why not?)
> 
> Additionally, "usability" as a Pass/Fail? How will this be determined, and by whom? "Usable" to whom? 
> 
> There are simply far too many unanswered questions here; making this statement, even in our working draft, is premature and unevaluated for use.
> 
> ******************************
> The draft also continues:
> 
> Tests can be applied to four different scopes:
> Item: A component or unit of content. Examples include a drop down menu, a media player, a phrase, or an image.
> Views: All content visually and programmatically available without a substantive change.
> User Process: Series of user actions, and the distinct interactive views that support the actions, where each action is required in order to complete an activity.
> Aggregate: Combination of all related items, user processes, and views.
> I have concerns with this as well, as all of these types of tests have value, but should they all contribute to a final "score"? Who determines which of the 4 different scopes are applicable for a screen or content? How and why (and when)? How is that reported for 3rd-party verification? In practice is it one of the 4 options above, or is it realistically always "Aggregate"? (Why or why not?)
> 
> In particular, I struggle with "User Process" because while entities *MAY* be able to define a "happy path" for specific functions or features, actual user testing confirms that not all users will take the "happy path" envisioned by the UX designer, and on many screens the user frequently has multiple "happy paths" to choose from - so which are in scope and which are not, and why? What, for example, is the user process or happy path for this screen/view: https://www.w3.org/WAI/? (This is a serious question.)
> 
> If a 'happy path' also provides optional contextual help along the way (the kind of content we frequently see in modals or hidden behind 'help' icons such as the italicised i in a circle), each of those forks creates two or more paths: some users will read one or more of the contextual helps, others will ignore them all. Does each variant of the path need to be evaluated? Why or why not? Who determines that? How are 'happy path' user-flows documented to support repeatability in testing? What kinds of Pass/Fail hardware and testing configurations are required on these 'user processes'? All AT configurations? Some? (Which, and why/how? Who determines? How is that documented?) What about when AT is not involved (e.g., cognitive issues)? This is very long on aspiration, and very short on details.
> 
> Is the expectation here similar to Convention Tests, where the entity is writing their own tests against their own defined paths? If so, where is this documented? What assurances do we have that "user process" tests are complete and inclusive if the entity is writing their own tests?
> 
> I do note that for Procedural Tests, the proposed draft also states "The requirements for what would be evaluated for procedure tests are still to be determined." However, until we have an example of a workable Procedural test to examine, I will suggest that adding this to our draft is not in keeping with decisions being "data-informed and evidence-based".
> 
> For these reasons I strongly object to these edits. I recognize that the current draft is no better off, but perpetuating problematic content does not serve anyone well, and these edits do not address many of the underlying concerns that have been identified previously via industry feedback (and in some instances continue to perpetuate the same issues):
> https://github.com/w3c/silver/labels/section:%20conformance
> https://github.com/w3c/silver/labels/section:%20scoring
> 
> Respectfully,
> 
> JF
> --
> John Foliot | 
> Senior Industry Specialist, Digital Accessibility | 
> W3C Accessibility Standards Contributor |
> 
> "I made this so long because I did not have time to make it shorter." - Pascal
> "links go places, buttons do things"

Received on Saturday, 23 April 2022 17:59:07 UTC