Re: Framework Doc - Accuracy Benchmarking

As I mentioned on an earlier ACT call, DIAS will start the two-year COMPARE Erasmus+ project (partners: DIAS, DE; Funka, SE; BrailleNet, FR; full title: "COMparing Peer Accessibility Ratings in Evaluation") in January next year.

The aim of COMPARE is to define real-life test cases around particular, more complex types of widget (say, tab panels, combo boxes or pseudo-dialogues) or types of situation (e.g., error handling, or confirmation messages after submitting a form), and to invite experts to rate the conformance of the cases - i.e., if they think a case fails, list the SCs that failed and comment on why; or explain why an issue / defect can be tolerated and the case still passes. So this will produce some of the manual test data that you are after.
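
Purely to illustrate the kind of record the repository could yield, here is a minimal Python sketch; the field names, the case ID and the technique ID are my own made-up assumptions, not an actual COMPARE schema:

from dataclasses import dataclass, field

@dataclass
class CaseRating:
    case_id: str    # e.g. "combo-box-01" (hypothetical case ID)
    verdict: str    # "pass" or "fail"
    failed_scs: list[str] = field(default_factory=list)   # e.g. ["4.1.2"]
    techniques: list[str] = field(default_factory=list)   # WCAG (failure) techniques consulted
    comment: str = ""    # why it fails, or why the defect can be tolerated

rating = CaseRating("combo-box-01", "fail", ["4.1.2"], ["F15"],
                    "Expanded/collapsed state of the custom control not exposed.")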

The input should of course come not only from the three partner orgs, but also from other experts active in a11y testing. Every practitioner will be invited to contribute their rating of the cases listed (including comments, e.g. the tests from WCAG techniques or failure techniques they compared the implementation against) or to submit new cases.

Comparing individual ratings of the same case should allow us to get an idea of the spread and the degree of inter-tester reliability. The aim is also to compare, especially across Europe, the different approaches to operationalizing the SCs in the form of concrete test steps. As you can imagine, there are many cases where testing approaches differ.
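
To make the reliability idea concrete, here is a rough Python sketch with made-up ratings (not real data); Fleiss' kappa is just one common agreement measure, not necessarily the one the project will adopt:

from collections import Counter

def fleiss_kappa(ratings):
    # ratings: one list of rater verdicts per case, same number of raters for each case
    n = len(ratings[0])
    cats = sorted({v for case in ratings for v in case})
    counts = [Counter(case) for case in ratings]
    # mean per-case agreement, corrected for agreement expected by chance
    p_bar = sum((sum(c[k] ** 2 for k in cats) - n) / (n * (n - 1))
                for c in counts) / len(ratings)
    p_e = sum((sum(c[k] for c in counts) / (len(ratings) * n)) ** 2 for k in cats)
    return (p_bar - p_e) / (1 - p_e)

# five hypothetical cases, each rated by four testers
cases = [["fail"] * 4,
         ["fail", "fail", "fail", "pass"],
         ["pass", "pass", "fail", "fail"],
         ["pass"] * 4,
         ["fail", "pass", "fail", "fail"]]
print(round(fleiss_kappa(cases), 2))  # 0.31 here; 1.0 would mean perfect agreement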

Just as an example, I have heard that to pass 2.4.1 under France's AccessiWeb scheme, you would have to include skip links AND headings AND landmarks, whereas if I read the WCAG documents, any one of these alone would seem to suffice.

Another example around 2.4.1: if you DO use skip links and landmarks but one of your skip links doesn't work, would you fail 2.4.1? (Probably not, IMHO - but exactly that is what we would like to establish, from a bottom-up perspective, with all the experts who care to contribute to the repository.)

I am participating in ACT but currently can't make the telco due to an important private engagement on Wednesday - I look forward to assuming a more active role next year and to interfacing our results with the ACT work.

We hope to have the website up and running soon - if interested, drop me a line off-list.

Best,
Detlev


--
Detlev Fischer
testkreis c/o feld.wald.wiese
Thedestr. 2, 22767 Hamburg

Mobil +49 (0)157 57 57 57 45
Fax +49 (0)40 439 10 68-5

http://www.testkreis.de
Beratung, Tests und Schulungen für barrierefreie Websites

Wilco Fiers wrote on 20.11.2016 13:30:

> Hi Alistair,
> 
> 
> I don't disagree that the benchmark will be complicated. I don't think it can be done without comparing against manually tested pages, which will require us to get access to manual test data. Gathering all that data ourselves would be pretty impractical, but there are a lot of organisations that have this data. So I'd say that our best bet would be to see who is able to share their accessibility data, so that we can build off of that.
> 
> 
> As for your suggestion to have test suites: I expect this will be part of the requirements for a rule. I want to clearly distinguish between testing the rule itself, which is what the benchmark is for, and testing the implementation of the rule, which is what a test suite would be for.
> 
> 
> Thoughts?
> 
> 
> Wilco
> 
> 
> On Wed, Nov 16, 2016 at 10:43 AM, Alistair Garrison <alistair.garrison@ssbbartgroup.com> wrote:
>> 
>> 
>> Dear All,
>> 
>>  
>> 
>> After reading through the Framework Document - https://w3c.github.io/wcag-act/act-framework.html#quality-updates - I would say that the Accuracy Benchmarking concept might be tricky as it is described.
>> 
>>  
>> 
>> The section reads – “Measuring this accuracy, not just on test data, but on pages and components that the rules would actually be applied to, is important to give users of the test results confidence in the accuracy of this data.”
>> 
>>  
>> 
>> The question is – how do you tell if the test is working properly in a live page, without first looking at the relevant code in the page (and assessing it – possibly manually, or with another tool)?
>> 
>>  
>> 
>> I would propose we concentrate on creating test suites for each of the tests - with an ever-growing number of clearly specified real-world edge cases that we are informed about by the testing community.
>> 
>>  
>> 
>> Very best regards
>> 
>>  
>> 
>> Alistair
>> 
>>  
>> 
>> ---
>> 
>>  
>> 
>> Alistair Garrison
>> 
>> Senior Accessibility Engineer
>> 
>> SSB Bart Group
>> 
>> 
> --
> 
> 
> Wilco Fiers - Senior Accessibility Engineer
> 
>

Received on Monday, 21 November 2016 12:13:52 UTC