Re: How many WCAG 2.1 SCs are testable with automated tests only?

> On Aug 20, 2019, at 10:32 AM, Shawn Lauriat <lauriat@google.com> wrote:
> 
> 1.4.3 Text Contrast (with the exception of text as image and edge cases - absolutely positioned elements?)
> 
> Yeah, any nesting with transparent/translucent backgrounds makes it difficult to definitively say whether a given page passes, as well as the difference between CSS color and rendered color for cases like thin fonts with anti-aliasing. I'd say in some cases automation could give a definitive answer, but I wouldn't describe the SC itself as fully automatable today. I think with some work, we could get tooling to the point that we can fully automate it, though.

Hi Shawn, some thoughts:

Thin Fonts, Spatial Frequency

These are some of the issues we are addressing/discussing/researching for Silver. And I am presently working on a potential new SC for WCAG 2.2 that will clarify the issues posed by body text and the often thin/small fonts, where contrast gets assimilated into the background due to anti-aliasing.

On the subject of transparency: 

Render/layer order: the colors that result from the transparency need to be calculated first, in the sRGB colorspace, and then those resultant values are the ones that need to be linearized and weighted for luminance (Y) for contrast testing. So in the following example, the brown BG is #B46432 (180,100,50), and all the blue DIVs are #1478F0 (20,120,240) with alpha values of #40, #80, #C0, and #FF (i.e. 75%, 50%, 25%, and 0% transparent). The text on all the blue DIVs is #96FA00 (150,250,0) at 50% transparent, except the last one, for reference. This screenshot was done in the sRGB colorspace.

[Screenshot: the green sample text over four blue DIVs of decreasing transparency, all on the brown background]
So to determine the contrast between the text “Transparent 50%” and its BG, first the transparency between the blue DIV and the brown BG has to be calculated, creating the rendered DIV color of (100,110,145) #646E91; then that color is used to determine the color of the transparent font on top of it, which ends up as (125,180,72) #7DB448.
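To illustrate, here is a minimal sketch of that compositing step in TypeScript (plain source-over math in sRGB; the names and the rounding behavior are mine, and real engines may round half-values differently):

type RGB = [number, number, number];

// Source-over compositing in sRGB, per channel: out = a*fg + (1 - a)*bg
function composite(fg: RGB, alpha: number, bg: RGB): RGB {
  return fg.map((c, i) => Math.round(alpha * c + (1 - alpha) * bg[i])) as RGB;
}

const brownBG: RGB = [180, 100, 50];   // #B46432
const blueDiv: RGB = [20, 120, 240];   // #1478F0
const greenText: RGB = [150, 250, 0];  // #96FA00

const renderedDiv = composite(blueDiv, 0.5, brownBG);
// -> [100, 110, 145], i.e. #646E91

const renderedText = composite(greenText, 0.5, renderedDiv);
// -> [125, 180, 73] with round-half-up; truncating 72.5 gives the 72 above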

So… I think an appropriately written tool could determine this easy case “automatically”… but wait, what about the word “transparency” that’s UNDER the div? It’s now not black but (10,60,120) #0A3C78…

When it was black against that brown it was 4.8:1, but the 50% blue DIV dragged that down to 2.16:1 (using WCAG math).

BUT the 50% green text is 4.4:1 against that near-black word. A human would not compare those two, though an automated pixel sampler might...
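For reference, the WCAG 2.x contrast math behind those ratios, as a short sketch (the 0.03928 threshold and the coefficients are straight from the SC 1.4.3 relative-luminance definition; the function names are mine):

type RGB = [number, number, number];

// Linearize an 8-bit sRGB channel per the WCAG relative-luminance definition
function srgbToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance([r, g, b]: RGB): number {
  return 0.2126 * srgbToLinear(r) + 0.7152 * srgbToLinear(g) + 0.0722 * srgbToLinear(b);
}

// Contrast ratio: (Ylighter + 0.05) / (Ydarker + 0.05)
function contrastRatio(a: RGB, b: RGB): number {
  const ya = relativeLuminance(a), yb = relativeLuminance(b);
  return (Math.max(ya, yb) + 0.05) / (Math.min(ya, yb) + 0.05);
}

contrastRatio([0, 0, 0], [180, 100, 50]);      // ~4.8  black word on the brown BG
contrastRatio([10, 60, 120], [100, 110, 145]); // ~2.16 that word under the 50% blue DIV
contrastRatio([125, 180, 72], [10, 60, 120]);  // ~4.4  green text vs. the darkened word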

This level of complexity could make this page, or more complicated ones, hard to assess simply by looking at the DIVs and the CSS.

For instance, these buttons are all full of gradients and transparencies; how does an automated process judge a gradient?
[Screenshot: a set of buttons styled with gradients and transparency]
A possible way is rendering the page to a PNG and then using an iCAM method to analyze the page as a rendered unit; that might be a better choice for a more automatic, programmatic assessment of the visual characteristics.

As an idea, an iCAM for assessment of contrast and spatial frequency, coupled with common OCR technology for word recognition to direct where exact assessments need to be sampled, could make at least the visual aspects an automated process.
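I haven't built this, but here is a very rough sketch of the "render, then sample" part, assuming Playwright for the rendering and pngjs for pixel access (the iCAM and OCR stages are the hard part and are only gestured at in the comments):

import { chromium } from "playwright";
import { PNG } from "pngjs";

type RGB = [number, number, number];

function relativeLuminance([r, g, b]: RGB): number {
  const lin = (c8: number) => {
    const c = c8 / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

async function roughContrastScan(url: string): Promise<number[]> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url);
  const png = PNG.sync.read(await page.screenshot()); // RGBA pixel buffer
  await browser.close();

  // Crude stand-in for the iCAM stage: compare each pixel's luminance with
  // its right-hand neighbour. A real tool would run an iCAM model for
  // contrast/spatial frequency, plus OCR to sample only where glyphs sit.
  const ratios: number[] = [];
  for (let y = 0; y < png.height; y++) {
    for (let x = 0; x < png.width - 1; x++) {
      const i = (y * png.width + x) * 4;
      const a = relativeLuminance([png.data[i], png.data[i + 1], png.data[i + 2]]);
      const b = relativeLuminance([png.data[i + 4], png.data[i + 5], png.data[i + 6]]);
      ratios.push((Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05));
    }
  }
  return ratios;
}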

Andy

> 3.1.1 Language (provided that the main language of the page can be inferred)
> 
> For those owning the given sites, they can probably automate this based on the content they know will appear on the site, but it would still require that human judgment for that parameter. Purposefully not including things like user-generated content here, as I think that falls more under 3.1.2 Language of parts, since the overall site likely still has an automatable page-level expectation.
> 
> Honestly, I think all other SCs would fall into group B, as content authors themselves can always write rules for publishing patterns that they establish. We could write generally available rules checking for common patterns and anti-patterns; it just comes down to cost-benefit balancing of whether it makes sense to spend a huge amount of time trying to automate something that a human will spot every time anyway as part of the needed manual testing. Even with things that seem fairly human-judgment-necessary, sometimes basic heuristics can go a long way in reducing that load on the manual tester.
> 
> -Shawn
> 
> P.S. Thank you for that write-up! Incredibly helpful, especially with regard to the current work in Silver.
> P.P.S. Your example of the video with no captions as a false positive isn't actually a false positive: it should have a caption file that just says "[soft background music]" for the duration - otherwise, those users without hearing don't know that they haven't missed anything. The example of the inactive button similarly sounds like a test that needs to take control state into account.
> 
> On Tue, Aug 20, 2019 at 9:56 AM Patrick H. Lauke <redux@splintered.co.uk> wrote:
> I was pondering something along the same lines not so long ago. I'd say 
> that for Group B, there are at least some cases where automated tools 
> can (and currently do) check for common patterns in markup that are 
> almost always guaranteed to be failures - depending on how 
> thorough/complex the test is, you could for instance say that an <img> 
> without any alt="" or alt="...", that is not hidden via display:none or 
> aria-hidden="true" on it or any of its ancestors, and doesn't have an 
> aria-label, nor something like role="presentation", is most likely to be 
> a failure of 1.1.1 either because it's decorative but not suppressed, or 
> contentful but lacking alternative, or if the alternative is there in 
> some other form like a visually-hidden span then the <img> itself should 
> be hidden, etc.
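
Interjecting here: that heuristic translates fairly directly into a DOM query. A rough, simplified sketch; note that the real accessible-name computation is far more involved than this:

function suspectImages(doc: Document): HTMLImageElement[] {
  // Is this element, or any ancestor, hidden from assistive tech?
  const hidden = (el: Element | null): boolean =>
    !!el && (el.getAttribute("aria-hidden") === "true" ||
      getComputedStyle(el).display === "none" || hidden(el.parentElement));

  return Array.from(doc.querySelectorAll<HTMLImageElement>("img")).filter((img) => {
    if (hidden(img)) return false;                    // suppressed, so not flagged
    if (img.hasAttribute("alt")) return false;        // has some alt, even alt=""
    if (img.getAttribute("aria-label")) return false; // has an explicit name
    const role = img.getAttribute("role");
    if (role === "presentation" || role === "none") return false;
    return true; // likely 1.1.1 failure: not suppressed, no alternative
  });
}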
> 
> But overall agree that for a really solid pass/fail assessment, most of 
> these definitely need an extra human to at least give a once-over: to 
> verify automatically-detected problems that "smell" like failures, and 
> also to look for things that a tool wouldn't be able to check, such as 
> very odd/obtuse markup/CSS/ARIA constructs.
> 
> P
> 
> > 1.1.1 Non-text Content (needs check if alternative text is meaningful)
> > 1.2.2 Captions (needs check that captions are indeed needed, and that 
> > they are not "craptions")
> > 1.3.1 Info and Relationships (headings hierarchy, correct id references 
> > etc - other aspects not covered)
> > 1.3.5 Identify Input Purpose (needs human check that input is about the 
> > user)
> > 1.4.2 Audio Control (not sure from looking at ACT rules if this can work 
> > fully automatically)
> > 1.4.11 Non-Text Contrast (only for elements with CSS-applied colors)
> > 2.1.4 Character Key Shortcuts (currently via bookmarklet)
> > 2.2.1 Timing adjustable (covers meta refresh but not time-outs without 
> > warning)
> > 2.4.2 Page Titled (needs check if title is meaningful)
> > 2.4.3 Focus order (may discover focus stops in hidden content? but 
> > probably needs add. check)
> > 2.4.4 Link purpose (can detect duplicate link names, needs add. check if 
> > link name meaningful)
> > 3.1.2 Language of parts (may detect words in other languages, probably 
> > not exhaustive)
> > 2.5.3 Label in name (works only for labels that can be programmatically 
> > determined)
> > 2.5.4 Motion Actuation (may detect motion actuation events but would 
> > need verification if alternatives exist)
> > 3.3.2 Labels or Instructions (can detect inputs without linked labels 
> > but not if labels are meaningful)
> > 4.1.2 Name, Role, Value (detects inconsistencies such as parent/child 
> > errors, but probably not cases where roles / attributes should be 
> > used but are missing?)
> > 
> > I am investigating this in the context of determining to what extent the 
> > "simplified monitoring" method of the EU Web Directive can rely on 
> > fully-automated tests for validly demonstrating non-conformance - see 
> > the corresponding article 
> > https://team-usability.de/en/teamu-blog-post/simplified-monitoring.html
> > 
> > Are there any fully-automated tests beyond 1.4.3, 3.1.1 and 4.1.1 that I 
> > have missed?
> > 
> > Best,
> > Detlev
> > 
> 
> 
> -- 
> Patrick H. Lauke
> 
> www.splintered.co.uk | https://github.com/patrickhlauke
> http://flickr.com/photos/redux/ | http://redux.deviantart.com
> twitter: @patrick_h_lauke | skype: patrick_h_lauke
> 
