
AW: AW: AW: evaluating web applications (was Re: Canadian Treasury Board accessibility assessment methodology)

From: Kerstin Probiesch <k.probiesch@googlemail.com>
Date: Thu, 24 May 2012 08:28:22 +0200
To: "'Michael S Elledge'" <elledge@msu.edu>, <public-wai-evaltf@w3.org>
Message-ID: <4fbdd4cc.210eb40a.4ddd.ffffab15@mx.google.com>

Hi Michael, all,

Concerning different approaches: I like the idea of different approaches, and it would of course be a benefit. At the same time I think it is not feasible. The main problem, I think, is not developing proper criteria for evaluation methodologies, such as the size of the sample for a tested website and the minimum number of tested websites. We would also need a required reliability coefficient based on X websites tested with the same approach and X websites tested by independent testers (not testers who belong to the same organization, and what will we do with the freelancers?), mechanisms for reviewing the validity of the tests (a test can be reliable and at the same time not valid) and, last but not least, verifiable measures to ensure objectivity, and so on. The result could be something like "WCAG-EM certified". The question would be: who controls the evaluators? And I think this will lead to a market distortion because of the costs of the certification process. Some organizations might have the money for this and some not.
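
To make the idea of a reliability coefficient a bit more concrete, here is a minimal sketch, assuming two independent testers each record a pass/fail verdict for the same checked items on the same page sample. It uses Cohen's kappa, which is only one possible coefficient and not something agreed on in this group; the function name and the sample verdicts are made up purely for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(ratings_a)
    # Share of items on which both testers gave the same verdict.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each tester's own verdict frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both testers used one and the same verdict for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts for eight checked items (not real test data).
tester_1 = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
tester_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

kappa = cohens_kappa(tester_1, tester_2)
print(round(kappa, 2))  # 0.47
```

In this made-up example the testers agree on 6 of 8 items (75%), but kappa is only about 0.47 once chance agreement is taken out. A coefficient like this says nothing about validity: two testers can agree with each other and both be wrong.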

I agree in full that the most important outcome is not a score but to find accessibility issues.

A score can be seen from at least two sides. A score can be very helpful, for example, in a study of the best city websites in country X that results in a ranking. One can show which city sites are the best (OK, there are also some methodological issues, but they are resolvable) and find typical problems, which can lead to special trainings or even to a ranking of the best CMS for accessibility. One issue will also arise here: how to ensure that the owners of the best websites don't do their own marketing with "we are compliant with WCAG2"; but this is a communication issue.

As long as evaluations accompany the development of websites and applications, scores (depending on how it is communicated what a certain score means) can of course be helpful as an indicator of progress on the way to AA or even AAA, but the result of the final review as a requirement for WCAG 2 conformance has to be 100%. As - I believe - Vivienne pointed out: not a "little pregnant", but something like "prenatal care and support". ;-)

Therefore I like the idea of percentages, as in the Canadian(?) approach Shadi posted yesterday, as long as they are used not in a final report for a "WCAG2 Conformance" seal but for accompanying the development.



> -----Original Message-----
> From: Michael S Elledge [mailto:elledge@msu.edu]
> Sent: Wednesday, 23 May 2012 16:38
> To: public-wai-evaltf@w3.org
> Subject: Re: AW: AW: evaluating web applications (was Re: Canadian
> Treasury Board accessibility assessment methodology)
> Hi Everyone--
>
> A couple of thoughts with respect to rating accessibility.
>
> I think one of the problems we're having is creating objective measures
> for an occasionally subjective evaluation. I use the term "occasionally"
> because some criteria can deliver binary (yes/no, true/false) results.
>
> That said, it seems to me that adding additional levels of subjectivity
> to such an evaluation (example: a Likert scale) only compounds the
> problem. As much as I find a 100 point scale appealing, especially
> weighting items according to their significance, that system will, I
> believe, be even more subject to varying interpretation.
>
> On the other hand, I'm not sure that the role of the EVALTF is to decide
> which particular approach(-es) should be used, but instead to ensure
> that whichever approach is used meets basic evaluation criteria (such as
> replicability and transparency). This will encourage people to develop
> differing but equally valid approaches to measuring accessibility
> compliance, which is ultimately a benefit to everyone.
>
> The most important outcome of an evaluation, it seems to me, is not to
> create a score per se, but to identify where there are accessibility
> issues so they can be repaired during development or, after release,
> taken into account by users, particularly persons with disabilities.
>
> Thoughts?
>
> Mike
> On 5/23/2012 5:09 AM, Aurélien Levy wrote:
> > Hi,
> >
> > there is also another thing to consider. Maybe someday we will achieve
> > a perfect way to measure accessibility, but in the end, if the time
> > needed to get this score is three, five, or ten times longer than for
> > the basic conformance metrics, I'm not sure it's really useful.
> > Yes, you will get a more precise score regarding the "real"
> > accessibility of your website, and then so what? The time needed to
> > improve it is still the same regardless of the quality of your metrics.
> > Most people already see accessibility as a cost. I prefer that they
> > spend their money/time on improving their website rather than on making
> > an in-depth audit just to get the most accurate metrics.
> >
> > What we really need is:
> > - mutual methodology
> > - mutual cost efficient metric
> > - mutual testcases
> > - mutual tests
> >
> > With all that we can start making comparisons between tools, experts,
> > etc. to improve ourselves
> >
> > Regards,
> >
> > Aurélien Levy
> > ----
> > Temesis CEO
> >> Hi Kerstin,
> >>
> >> As expressed in the paper, the statistics function has only recently
> >> been added. So at the moment, this is an informal assessment which we
> >> will need to back up once we have more data.
> >>
> >> But this is what we hope to get out of the stats function:
> >>
> >> 1. Tester reliability over time: How much are individual evaluators
> >> 'off the mark' compared to the final quality-assured result? This
> >> could show an improvement over time, an interesting metric to assess
> >> the level of qualification, especially of new and less experienced
> >> evaluators.
> >>
> >> 2. Inter-evaluator reliability: How close are the results of
> >> different evaluators assessing the same site / page sample?
> >>
> >> There is likely to be little test-retest reliability data since
> >> usually, the sites tested are a moving target - improved based on
> >> test results. Only rarely is the same site re-tested in a tandem
> >> test - this usually only happens after a re-launch.
> >>
> >> A fundamental problem in all those statistics is that there is no
> >> objective benchmark to compare individual rating results against -
> >> just the arbitrated and quality assured final evaluation result.
> >> Given the scope of interpretation in accessibility evaluation, we
> >> think this lack of objectivity is inevitable and in the end, down to
> >> the complexity of the field under investigation and the degree of
> >> human error in all evaluation.
> >>
> >>
> >> --
> >> Detlev Fischer
> >> testkreis c/o feld.wald.wiese
> >> Borselstraße 3-7 (im Hof), 22765 Hamburg
> >>
> >> Mobil +49 (0)1577 170 73 84
> >> Tel +49 (0)40 439 10 68-3
> >> Fax +49 (0)40 439 10 68-5
> >>
> >> http://www.testkreis.de
> >> Consulting, tests and training for accessible websites
> >>
> >
> >
> >
> --
> Michael S. Elledge
> Associate Director
> Usability/Accessibility Research and Consulting
> Michigan State University
> Kellogg Center
> 219 S. Harrison Rd Room 93
> East Lansing, MI  48824
> 517-353-8977
Received on Thursday, 24 May 2012 06:35:49 UTC