RE: Automated and manual testing process

I agree with what’s been said so far about the different testing approaches. I would also agree that if experts can’t agree on an SC, then the SC and Understanding documents weren’t written clearly enough. Particular violations are a different matter: even among experts, opinions on whether something is a violation can vary widely. There are a number of things that can help people decide whether something is a failure, including but not limited to:

·         Failure techniques

·         Sufficient techniques

·         Advisory techniques (if a technique is listed as advisory, you know it wasn’t required, unless the SC wasn’t met another way and the advisory technique covers the entirety of the SC).

·         Group reviews (as discussed by Detlev with the COMPARE project).

·         Understanding documents that describe the intent of the WG.

I wasn’t directly involved in creating the DHS Trusted Tester program in the US, but I have heard that several agencies participated and worked to agree on a set of known failures and known passes for the requirements. From what I understand, prior to the program some of the participating agencies also held review sessions after manual testing, where peers could discuss issues in a group to make sure they had appropriately flagged, or not flagged, something they had tested. Whether such sessions are held formally or informally, I’d imagine that most accessibility solutions organizations have discussions around issues on a daily basis, even when experts are involved. Given that SC are written to be technology neutral and don’t mandate that specific techniques be used, and since user agent support varies widely, I don’t see how in the foreseeable future we can eliminate these discussions for some issues. A repository for the issues and outcomes is certainly one way to be more efficient and add consistency to the process.

Jonathan


Jonathan Avila
Chief Accessibility Officer
SSB BART Group
jon.avila@ssbbartgroup.com
703.637.8957 (Office)

Visit us online: Website<http://www.ssbbartgroup.com/> | Twitter<https://twitter.com/SSBBARTGroup> | Facebook<https://www.facebook.com/ssbbartgroup> | LinkedIn<https://www.linkedin.com/company/355266?trk=tyah> | Blog<http://www.ssbbartgroup.com/blog/>
See you at CSUN in March!<http://info.ssbbartgroup.com/CSUN-2017_Sessions.html>


From: Detlev Fischer [mailto:detlev.fischer@testkreis.de]
Sent: Sunday, January 29, 2017 9:41 AM
To: Shilpi Kapoor
Cc: Gregg C Vanderheiden; Andrew Kirkpatrick; GLWAI Guidelines WG org
Subject: Re: Automated and manual testing process

It's *very* common in manual testing to have edge cases where there are issues but it is uncertain whether they are serious enough to fail a page. The problem is that a group of co-testers is not available, so the tester has to come down on one side of pass/fail, which means deciding whether the issue can be 'tolerated' or not. Since testers have to decide individually on the spot, often with no recourse to established cases, the idea of replicability in manual testing is IMO basically wishful thinking. Even if we trust that we will reach 80% agreement on any one SC, when looking at tester ratings across *all* SCs that optimism is no longer justified.
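As a hypothetical back-of-the-envelope illustration of that last point (the numbers and the independence assumption are mine, not Detlev's): even at 80% agreement on any single SC, agreement across a full set of SCs collapses quickly.

# Hypothetical arithmetic sketch: assumes rough independence between SCs and uses
# the 38 Level A/AA success criteria of WCAG 2.0 as a round count.
per_sc_agreement = 0.80
num_sc = 38
full_agreement = per_sc_agreement ** num_sc
print(f"{full_agreement:.4%}")  # roughly 0.02% -- two testers almost never agree on every SC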
Just for reference: We have argued in the past that a graded rating scheme is better suited if the goal is to reflect the actual impact of issues on accessibility:
https://www.w3.org/WAI/RD/2011/metrics/paper7/


We have just started a project to compare actual expert ratings of real web content. The name of the project is COMPARE. Once the platform is set up, we hope WCAG testers/experts will contribute their ratings (and submit cases to be rated). You can read about COMPARE and register your interest here:
http://www.funka.com/en/projekt/compare/


Detlev

Sent from phone

On 29 Jan 2017, at 12:14, Shilpi Kapoor <shilpi@barrierbreak.com> wrote:
As much as we promote automated testing processes and tools, I think we cannot ignore manual testing.

Manual testing can be done by expert testers, and these experts might primarily be assistive technology users as well. I wouldn’t want to put the emphasis only on screen reader tests.

I agree with Gregg that user testing is a whole other thing, but we need to ensure that manual testing approaches are not ignored. I have yet to find one tool that gets it all right, and often it is the manual testing that normalizes the findings.

Thanks & Regards
Shilpi Kapoor | Managing Director
BarrierBreak


From: Gregg C Vanderheiden <greggvan@umd.edu>
Date: Sunday, 29 January 2017 at 10:03 AM
To: Andrew Kirkpatrick <akirkpat@adobe.com>
Cc: GLWAI Guidelines WG org <w3c-wai-gl@w3.org>
Subject: Re: Automated and manual testing process
Resent-From: <w3c-wai-gl@w3.org>
Resent-Date: Sun, 29 Jan 2017 04:34:15 +0000


I will speak from where we were in WCAG 2.0.

Manual testing is testing by people who know the technology and the guidelines: expert testers. It is not user testing. In order to be “testable” or “objective” (our criteria for making it into WCAG 2), a criterion had to be something where most knowledgeable testers skilled in the art would agree on the outcome: 80% or more would agree. We strove for 95% or greater, but allowed for … well … sticklers.


User testing is a whole other thing, and although we GREATLY encourage user testing of any website, we did not require it for conformance.


In WCAG 2.0 we required alt text, but we did not require that it be GOOD alt text, because we quickly found that there was no definition of good alt text for which we could get 80% or better consistent judgement across ALL alt text samples. Easy for very good and very bad; but when you get in the middle, it got in a muddle. It was easy to find samples where we didn’t get 80%, so “good alt text” failed our test that even the WORST CASE must be 80% agreed.
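To make that pass bar concrete, here is a small sketch (the agreement figures are invented, not from Gregg’s message): the testability criterion applies to the worst-case sample, so a single muddled case below 80% agreement sinks the whole definition.

# Hypothetical illustration with invented agreement figures for three alt-text samples.
# The WCAG 2.0 testability bar described above applies to the WORST case, not the average.
agreement_per_sample = {
    "very good alt text": 0.98,  # easy to judge
    "very bad alt text": 0.95,   # easy to judge
    "middling alt text": 0.60,   # the "muddle" in the middle
}

worst_case = min(agreement_per_sample.values())
print(worst_case >= 0.80)  # False -> "good alt text" fails the 80% worst-case test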



Gregg





Gregg C Vanderheiden
greggvan@umd.edu



On Jan 28, 2017, at 5:36 PM, Andrew Kirkpatrick <akirkpat@adobe.com> wrote:

AGWGers,
I’d like to get thoughts from the group on what constitutes “manual testing” (I’m more comfortable with what counts as automated testing).

Testing for the presence of alternative text on an image in HTML or other formats can be done with automated testing, but testing whether that alternative text is good requires (at least for now) human involvement in the test process (manual testing).
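To illustrate the split (a minimal sketch, not part of Andrew’s original message): the presence check is a few lines of code against the parsed markup, while judging the quality of the text that is present is where the human comes in.

# Minimal sketch of the automatable half: flag <img> elements with no alt attribute.
# Whether alt text that IS present is any good still needs a human judgement.
from html.parser import HTMLParser

class AltPresenceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing_alt = []  # src values of images with no alt attribute at all

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            names = [name for name, _ in attrs]
            if "alt" not in names:  # note: alt="" counts as present (valid for decorative images)
                self.missing_alt.append(dict(attrs).get("src", "<no src>"))

sample = '<img src="logo.png" alt="Company logo"><img src="chart.png">'
checker = AltPresenceChecker()
checker.feed(sample)
print(checker.missing_alt)  # ['chart.png'] -- presence is machine-testable;
                            # judging whether "Company logo" is GOOD alt text is not.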

What if testing cannot be done by a single person and requires user testing – does that count as manual testing, or is that something different?

Thanks,
AWK

Andrew Kirkpatrick
Group Product Manager, Standards and Accessibility
Adobe

akirkpat@adobe.com
http://twitter.com/awkawk

Received on Sunday, 29 January 2017 23:48:37 UTC