Re: Framework Doc - Accuracy Benchmarking

Hi Detlev,
Thank you for sharing this with us. Are there any resources available
online where we might be able to learn more about this project?

Regards,
Wilco

On Mon, Nov 21, 2016 at 1:13 PM, Detlev Fischer <detlev.fischer@testkreis.de> wrote:

> As I mentioned on an earlier ACT call, DIAS will start the two-year
> COMPARE Erasmus+ project (DIAS, DE; Funka, SE; BrailleNet, FR -- full
> title is "COMparing Peer Accessibility Ratings in Evaluation") in January
> next year.
>
> The aim of COMPARE is to define real-life test cases around particular,
> more complex types of widget (say, tab panels, combo boxes or pseudo
> dialogues) or types of situation (e.g., error handling, confirmation
> messages after submitting) and to invite experts to rate the conformance of
> the cases - i.e., if they think a case fails, list the SC that failed and
> explain why; or explain why an issue / defect can be tolerated and the case
> still passes. So this will produce some of the manual test data that you are after.
>
> The input should of course not only come from the three partner orgs, but
> also from other experts active in a11y testing. Every practitioner will be
> invited to contribute their rating of the listed cases (including comments,
> e.g. the WCAG techniques or failure techniques they compared the
> implementation against) or to submit new cases.
>
> Comparing individual ratings of the same case should allow us to get an
> idea of the spread and the degree of inter-tester reliability. The aim is
> also to compare, especially across Europe, different approaches to
> operationalizing the SCs in the form of concrete test steps. As you can
> imagine, there are many cases where testing approaches differ.
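As an illustration of the comparison described above: pairwise agreement between expert ratings of the same case can be computed with a short script. This is a minimal, hypothetical sketch - the function name and the sample ratings are invented for this example and are not part of the COMPARE project.

```python
# Hypothetical sketch: measuring inter-tester agreement on one test case.
# The ratings below are invented sample data, not real COMPARE results.
from collections import Counter

def agreement(ratings):
    """Fraction of rating pairs that agree (pairwise percent agreement)."""
    n = len(ratings)
    if n < 2:
        return 1.0
    counts = Counter(ratings)
    # Pairs that agree: for each rating value given c times, c*(c-1)/2 pairs.
    pairs_agree = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = n * (n - 1) // 2
    return pairs_agree / total_pairs

# Five experts rate the same tab-panel case against an SC:
ratings = ["fail", "fail", "pass", "fail", "pass"]
print(agreement(ratings))  # 0.4: 4 agreeing pairs out of 10
```

More robust measures (e.g. Fleiss' kappa) correct for chance agreement, but even this simple statistic shows the "spread" mentioned above.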
>
> Just as an example, I have heard that to pass 2.4.1 in France's AccessiWeb
> scheme, you'd have to include skiplinks AND headings AND landmarks whereas
> if I read WCAG documents, it would seem that any of these alone would
> suffice.
>
> Another example around 2.4.1: if you DO use skiplinks and landmarks but
> one of your skiplinks doesn't work, would you fail 2.4.1? (Probably not,
> IMHO - but that is exactly what we would like to establish, in a bottom-up
> perspective from all the experts who care to contribute to the repository.)
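Whether a non-working skip link fails 2.4.1 is the judgment call raised above; detecting that a skip link is broken in the first place is mechanical. A minimal sketch using only Python's standard library - the class name, function name, and sample markup are invented for illustration, and a real checker would need to handle far more cases:

```python
# Hypothetical sketch: flag in-page links whose fragment has no matching
# target (e.g. a "skip to content" link pointing at a missing id).
from html.parser import HTMLParser

class SkipLinkChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fragments = []   # fragment hrefs found, e.g. "main" from "#main"
        self.targets = set()  # ids (and legacy anchor names) in the page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href", "").startswith("#"):
            self.fragments.append(attrs["href"][1:])
        if "id" in attrs:
            self.targets.add(attrs["id"])
        if tag == "a" and "name" in attrs:
            self.targets.add(attrs["name"])

def broken_skip_links(html):
    """Return fragment targets referenced by in-page links but not defined."""
    checker = SkipLinkChecker()
    checker.feed(html)
    return [f for f in checker.fragments if f not in checker.targets]

page = '<a href="#main">Skip to content</a><div id="mian">...</div>'
print(broken_skip_links(page))  # ['main'] - the target id is misspelled
```

A check like this can only report the defect; whether that defect fails the SC is exactly the rating question the repository is meant to settle.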
>
> I am participating in ACT but currently can't make the telco due to an
> important private engagement on Wednesday - I look forward to assuming
> a more active role next year and to interfacing our results with the ACT
> work.
>
> We hope to have the website up and running soon - if interested, drop me a
> line off-list.
>
> Best,
> Detlev
>
>
> --
> Detlev Fischer
> testkreis c/o feld.wald.wiese
> Thedestr. 2, 22767 Hamburg
>
> Mobil +49 (0)157 57 57 57 45
> Fax +49 (0)40 439 10 68-5
>
> http://www.testkreis.de
> Beratung, Tests und Schulungen für barrierefreie Websites
>
> Wilco Fiers schrieb am 20.11.2016 13:30:
>
> > Hi Alistair,
> >
> >
> > I don't disagree that the benchmark will be complicated. I don't think
> it can be done without comparing to manually tested pages, which will
> require us to get access to manual test data. Gathering all that data
> ourselves would be pretty impractical, but there are a lot of organisations
> that have this data. So I'd say that our best bet would be to see who is
> able to share their accessibility data, so that we can build off of that.
> >
> >
> > As for your suggestion to have test suites: I expect this will be part
> of the requirements for a rule. I want to clearly distinguish between
> testing the rule itself, which is what the benchmark is for, and testing
> the implementation of the rule, which is what a test suite would be for.
> >
> >
> > Thoughts?
> >
> >
> > Wilco
> >
> >
> > On Wed, Nov 16, 2016 at 10:43 AM, Alistair Garrison <
> alistair.garrison@ssbbartgroup.com> wrote:
> >>
> >>
> >> Dear All,
> >>
> >>
> >>
> >> After reading through the Framework Document -
> https://w3c.github.io/wcag-act/act-framework.html#quality-updates -
> I would say that the Accuracy Benchmarking concept might be tricky as it is
> described.
> >>
> >>
> >>
> >> The section reads – “Measuring this accuracy, not just on test data,
> but on pages and components that the rules would actually be applied to, is
> important to give users of the test results confidence in the accuracy of
> this data.”
> >>
> >>
> >>
> >> The question is – how do you tell if the test is working properly in a
> live page, without first looking at the relevant code in the page (and
> assessing it – possibly manually, or with another tool)?
> >>
> >>
> >>
> >> I would propose we concentrate on creating test suites for each of the
> tests - with an ever-growing number of clearly specified real-world edge
> cases that we are informed about by the testing community.
> >>
> >>
> >>
> >> Very best regards
> >>
> >>
> >>
> >> Alistair
> >>
> >>
> >>
> >> ---
> >>
> >>
> >>
> >> Alistair Garrison
> >>
> >> Senior Accessibility Engineer
> >>
> >> SSB Bart Group
> >>
> >>
> > --
> >
> >
> > Wilco Fiers - Senior Accessibility Engineer
> >
> >
>



-- 
*Wilco Fiers* - Senior Accessibility Engineer

Received on Monday, 21 November 2016 13:26:43 UTC