Re: How many WCAG 2.1 SCs are testable with automated tests only?

>
> 1.4.3 Text Contrast (with the exception of text as image and edge cases -
> absolutely positioned elements?)


Yeah, any nesting with transparent/translucent backgrounds makes it
difficult to say definitively whether a given page passes, as does the
difference between the CSS color and the rendered color in cases like thin
fonts with anti-aliasing. I'd say in some cases automation could definitely
give a verdict, but I wouldn't describe the SC itself as fully automatable
today. I think with some work we could get tooling to the point where we
can fully automate it, though.
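
Just to make the gap concrete: the core contrast math itself is easy to
automate - roughly the TypeScript sketch below (the names are illustrative,
and it assumes you already know the effective backdrop color, which is
exactly what nested translucent layers make hard to determine).

// Minimal sketch of the WCAG 2.x contrast math; the hard part in practice
// is resolving the *effective* background through nested translucent
// layers, which this sketch sidesteps by assuming the backdrop is known.

type RGB = { r: number; g: number; b: number }; // 0-255 channels
type RGBA = RGB & { a: number };                // alpha in 0-1

// sRGB channel -> linear-light value, per the relative-luminance definition
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance({ r, g, b }: RGB): number {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Composite a translucent color over an opaque backdrop (simple source-over blend).
function flatten(fg: RGBA, backdrop: RGB): RGB {
  const mix = (f: number, b: number) => Math.round(f * fg.a + b * (1 - fg.a));
  return { r: mix(fg.r, backdrop.r), g: mix(fg.g, backdrop.g), b: mix(fg.b, backdrop.b) };
}

// Contrast ratio = (L_lighter + 0.05) / (L_darker + 0.05)
function contrastRatio(fg: RGB, bg: RGB): number {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// Example: rgba(0, 0, 0, 0.6) text over white needs the blend before the check.
const text = flatten({ r: 0, g: 0, b: 0, a: 0.6 }, { r: 255, g: 255, b: 255 });
console.log(contrastRatio(text, { r: 255, g: 255, b: 255 }) >= 4.5); // AA, normal-size text

(The 4.5 there is the AA threshold for normal-size text; large text only
needs 3:1.) None of that handles anti-aliased rendering or text over
images, which is where the human - or much smarter tooling - comes in.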

>
> 3.1.1 Language (provided that the main language of the page can be inferred)


For those who own a given site, this can probably be automated based on
the content they know will appear on it, but it still takes human judgment
to set that expected-language parameter. I'm purposely not including things
like user-generated content here, as I think that falls more under 3.1.2
Language of Parts, since the overall site likely still has an automatable
page-level expectation.
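
Once a human has supplied that expectation, the page-level rule itself is
close to trivial - a minimal sketch, assuming an expectedLang parameter
provided by the site owner (the names are illustrative):

// Rough sketch of a page-level 3.1.1 check for a site whose primary
// language is known up front; expectedLang is the human-supplied parameter.
function checkPageLanguage(doc: Document, expectedLang: string): string[] {
  const issues: string[] = [];
  const lang = (doc.documentElement.getAttribute("lang") || "").trim();

  if (!lang) {
    issues.push("Missing or empty lang attribute on <html>.");
  } else if (lang.split("-")[0].toLowerCase() !== expectedLang.toLowerCase()) {
    // Compare only the primary language subtag, so "en-GB" still matches "en".
    issues.push(`<html lang="${lang}"> does not match the expected language "${expectedLang}".`);
  }
  return issues;
}

// e.g. checkPageLanguage(document, "en") in a browser or headless run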

Honestly, I think all other SCs would fall into group B, since content
authors can always write rules for the publishing patterns they themselves
establish. We could write generally available rules that check for common
patterns and anti-patterns; it just comes down to a cost-benefit balance of
whether it makes sense to spend a huge amount of time automating something
a human will spot every time anyway as part of the needed manual testing.
Even for SCs that seem to clearly require human judgment, basic heuristics
can sometimes go a long way toward reducing the load on the manual tester.
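
For instance, very much in the spirit of the <img> case Patrick sketches
below, a rough heuristic like this doesn't decide pass/fail on its own, it
just narrows what the human has to look at (the "suspicious alt" patterns
are made up for illustration, not an established rule set):

// Flag images that either have no text alternative at all or have an alt
// that smells like a placeholder, skipping anything hidden or explicitly
// decorative. Output is a review queue for a human, not a verdict.
const SUSPICIOUS_ALT = [
  /\.(png|jpe?g|gif|svg|webp)$/i,      // looks like a filename
  /^(image|photo|picture|graphic)$/i,  // generic placeholder words
  /^img[_-]?\d*$/i,                    // e.g. "img_1234"
];

function isHidden(el: Element): boolean {
  // Walk up looking for the common ways content is pulled out of the
  // accessibility tree: aria-hidden="true" or display:none on any ancestor.
  for (let node: Element | null = el; node; node = node.parentElement) {
    if (node.getAttribute("aria-hidden") === "true") return true;
    if (getComputedStyle(node).display === "none") return true;
  }
  return false;
}

function imagesNeedingReview(doc: Document): { img: HTMLImageElement; reason: string }[] {
  const flagged: { img: HTMLImageElement; reason: string }[] = [];
  for (const img of Array.from(doc.querySelectorAll("img"))) {
    if (isHidden(img)) continue;
    const role = img.getAttribute("role");
    if (role === "presentation" || role === "none") continue; // explicitly decorative

    const alt = img.getAttribute("alt");
    const ariaLabel = img.getAttribute("aria-label");
    if (alt === null && !ariaLabel) {
      flagged.push({ img, reason: "No alt and no aria-label: very likely a 1.1.1 failure." });
    } else if (alt && SUSPICIOUS_ALT.some((re) => re.test(alt.trim()))) {
      flagged.push({ img, reason: `alt="${alt}" looks like a placeholder; needs a human call.` });
    }
  }
  return flagged;
}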

-Shawn

P.S. Thank you for that write-up! Incredibly helpful, especially with
regard to the current work in Silver.
P.P.S. Your example of the video with no captions isn't actually a false
positive: it should have a caption file that just says "[soft background
music]" for the duration - otherwise, users who are deaf or hard of hearing
have no way of knowing they haven't missed anything. The example of the
inactive button similarly sounds like a test that needs to take control
state into account.
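
(Rough sketch of what I mean by taking state into account - purely
illustrative, and it leans on the fact that 1.4.3 explicitly exempts text
that is part of an inactive user interface component:)

// Native form controls, including ones inside a disabled <fieldset>, match
// the :disabled pseudo-class; custom widgets usually signal the same state
// via aria-disabled. A contrast rule could skip or downgrade findings on
// anything where this returns true instead of reporting a hard failure.
function isEffectivelyDisabled(el: Element): boolean {
  return el.matches(":disabled") || el.getAttribute("aria-disabled") === "true";
}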

On Tue, Aug 20, 2019 at 9:56 AM Patrick H. Lauke <redux@splintered.co.uk>
wrote:

> I was pondering something along the same lines not so long ago. I'd say
> that for Group B, there are at least some cases where automated tools
> can (and currently do) check for common patterns in markup that are
> almost always guaranteed to be failures - depending on how
> thorough/complex the test is, you could for instance say that an <img>
> without any alt="" or alt="...", that is not hidden via display:none or
> aria-hidden="true" on it or any of its ancestors, and doesn't have an
> aria-label, nor something like role="presentation", is most likely to be
> a failure of 1.1.1 either because it's decorative but not suppressed, or
> contentful but lacking alternative, or if the alternative is there in
> some other form like a visually-hidden span then the <img> itself should
> be hidden, etc.
>
> But overall agree that for a really solid pass/fail assessment, most of
> these definitely need an extra human to at least give a once-over, both
> to verify automatically-detected problems that "smell" like failures and
> to look for things that a tool wouldn't be able to check, such as very
> odd/obtuse markup/CSS/ARIA constructs.
>
> P
>
> > 1.1.1 Non-text Content (needs check if alternative text is meaningful)
> > 1.2.2 Captions (needs check that captions are indeed needed, and that
> > they are not "craptions")
> > 1.3.1 Info and Relationships (headings hierarchy, correct id references
> > etc - other aspects not covered)
> > 1.3.5 Identify Input Purpose (needs human check that input is about the
> > user)
> > 1.4.2 Audio Control (not sure from looking at ACT rules if this can work
> > fully automatically)
> > 1.4.11 Non-text Contrast (only for elements with CSS-applied colors)
> > 2.1.4 Character Key Shortcuts (currently via bookmarklet)
> > 2.2.1 Timing adjustable (covers meta refresh but not time-outs without
> > warning)
> > 2.4.2 Page Titled (needs check if title is meaningful)
> > 2.4.3 Focus order (may discover focus stops in hidden content? but
> > probably needs add. check)
> > 2.4.4 Link purpose (can detect duplicate link names, needs add. check if
> > link name meaningful)
> > 3.1.2 Language of parts (may detect words in other languages, probably
> > not exhaustive)
> > 2.5.3 Label in name (works only for labels that can be programmatically
> > determined)
> > 2.5.4 Motion Actuation (may detect motion actuation events but would
> > need verification if alternatives exist)
> > 3.3.2 Labels or Instructions (can detect inputs without linked labels
> > but not if labels are meaningful)
> > 4.1.2 Name, Role, Value (detects inconsistencies such as parent/child
> > errors, but probably not cases where roles / attributes should be
> > used but are missing?)
> >
> > I am investigating this in the context of determining to what extent the
> > "simplified monitoring" method of the EU Web Directive can rely on
> > fully-automated tests for validly demonstrating non-conformance - see
> > the corresponding article
> > https://team-usability.de/en/teamu-blog-post/simplified-monitoring.html
> >
> > Are there any fully-automated tests beyond 1.4.3, 3.1.1 and 4.1.1 that I
> > have missed?
> >
> > Best,
> > Detlev
> >
>
>
> --
> Patrick H. Lauke
>
> www.splintered.co.uk | https://github.com/patrickhlauke
> http://flickr.com/photos/redux/ | http://redux.deviantart.com
> twitter: @patrick_h_lauke | skype: patrick_h_lauke
>
>

Received on Tuesday, 20 August 2019 17:33:24 UTC