- From: Kerstin Probiesch <k.probiesch@googlemail.com>
- Date: Wed, 23 May 2012 10:54:51 +0200
- To: <detlev.fischer@testkreis.de>, <peter.korn@oracle.com>, <shadi@w3.org>
- Cc: <public-wai-evaltf@w3.org>
...which means that, so far, you can't offer a proven value for the reliability coefficient for this test.

> -----Original Message-----
> From: detlev.fischer@testkreis.de [mailto:detlev.fischer@testkreis.de]
> Sent: Wednesday, 23 May 2012 10:31
> To: k.probiesch@googlemail.com; peter.korn@oracle.com; shadi@w3.org
> Cc: public-wai-evaltf@w3.org
> Subject: Re: AW: AW: evaluating web applications (was Re: Canadian
> Treasury Board accessibility assessment methodology)
>
> Hi Kerstin,
>
> As expressed in the paper, the statistics function has only recently
> been added. So at the moment this is an informal assessment, which we
> will need to back up once we have more data.
>
> But this is what we hope to get out of the stats function:
>
> 1. Tester reliability over time: How much are individual evaluators
> 'off the mark' compared to the final quality-assured result? This could
> show an improvement over time, an interesting metric for assessing the
> level of qualification, especially of new and less experienced
> evaluators.
>
> 2. Inter-evaluator reliability: How close are the results of different
> evaluators assessing the same site / page sample?
>
> There is likely to be little test-retest reliability data since,
> usually, the sites tested are a moving target - improved based on test
> results. Only rarely is the same site re-tested in a tandem test - this
> usually only happens after a re-launch.
>
> A fundamental problem in all these statistics is that there is no
> objective benchmark to compare individual rating results against - just
> the arbitrated and quality-assured final evaluation result. Given the
> scope of interpretation in accessibility evaluation, we think this lack
> of objectivity is inevitable and, in the end, down to the complexity of
> the field under investigation and the degree of human error in all
> evaluation.
>
>
> --
> Detlev Fischer
> testkreis c/o feld.wald.wiese
> Borselstraße 3-7 (im Hof), 22765 Hamburg
>
> Mobile +49 (0)1577 170 73 84
> Tel +49 (0)40 439 10 68-3
> Fax +49 (0)40 439 10 68-5
>
> http://www.testkreis.de
> Consulting, testing and training for accessible websites
>
>
> ----- Original Message -----
> From: k.probiesch@googlemail.com
> To: detlev.fischer@testkreis.de, peter.korn@oracle.com, shadi@w3.org
> Date: 23.05.2012 10:09:44
> Subject: AW: AW: evaluating web applications (was Re: Canadian Treasury
> Board accessibility assessment methodology)
>
> > Hi Detlev,
> >
> > the paper for the Website Accessibility Metrics Online Symposium
> > states: "Our experience shows that the 5 point graded rating scale
> > is quite reliable." I think it would be helpful for the discussion
> > to know what exactly "quite reliable" means (the value of the
> > reliability coefficient).
> >
> > Best
> >
> > Kerstin
> >
> >> -----Original Message-----
> >> From: detlev.fischer@testkreis.de [mailto:detlev.fischer@testkreis.de]
> >> Sent: Wednesday, 23 May 2012 09:57
> >> To: k.probiesch@googlemail.com; peter.korn@oracle.com; shadi@w3.org
> >> Cc: public-wai-evaltf@w3.org
> >> Subject: Re: AW: evaluating web applications (was Re: Canadian
> >> Treasury Board accessibility assessment methodology)
> >>
> >> Hi all,
> >>
> >> Perhaps not surprisingly for those who have followed these
> >> discussions since summer last year, I disagree with Kerstin's
> >> statement "the more granular the evaluation, the less reliable it
> >> is".
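One concrete form the requested reliability coefficient could take is a quadratic-weighted Cohen's kappa over paired gradings on the 5-point scale. The sketch below is illustrative only: it is not the statistics function Detlev describes, and the gradings are invented.

import numpy as np

def quadratic_weighted_kappa(a, b, k=5):
    """Cohen's kappa with quadratic weights for two raters grading
    the same items on a k-point ordinal scale (grades coded 1..k)."""
    a = np.asarray(a) - 1                      # shift grades to 0..k-1
    b = np.asarray(b) - 1
    n = len(a)
    observed = np.zeros((k, k))                # cross-tabulation of grades
    for i, j in zip(a, b):
        observed[i, j] += 1
    # expected cross-tabulation if the two raters were independent
    expected = np.outer(np.bincount(a, minlength=k),
                        np.bincount(b, minlength=k)) / n
    # quadratic weights: disagreeing by two grades costs four times
    # as much as disagreeing by one grade
    idx = np.arange(k)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (k - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# invented gradings of ten checkpoints by two evaluators (1 = fail ... 5 = pass)
rater_a = [5, 4, 4, 2, 5, 3, 1, 4, 5, 2]
rater_b = [5, 4, 3, 2, 4, 3, 2, 4, 5, 1]
print(f"quadratic-weighted kappa: {quadratic_weighted_kappa(rater_a, rater_b):.2f}")

A kappa of 1.0 would mean perfect agreement; values above roughly 0.8 are conventionally read as strong agreement, which is the kind of number that would substantiate "quite reliable".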
> >>
> >> The binary approach produces artefacts because it often forces
> >> evaluators to be either too strict (failing an SC due to minor
> >> issues) or too lenient (attesting conformance in spite of such
> >> issues).
> >>
> >> We've tried to show the higher fidelity of a graded evaluation
> >> approach in our recent paper for the Website Accessibility Metrics
> >> Online Symposium, 5 December 2011:
> >>
> >> http://www.w3.org/WAI/RD/2011/metrics/paper7/
> >>
> >>
> >> > Hi Peter, Shadi,
> >> >
> >> > if we were to work out "something that is different" from the
> >> > pass/fail - which obviously is not compliant with the conformance
> >> > requirements - it wouldn't be an evaluation methodology for WCAG
> >> > 2.0 anymore. Of course, part of reality is imperfect software.
> >> > "Imperfect" developers and "imperfect" online editors are also
> >> > part of reality. The question for me is: if we consider these
> >> > aspects, why then promote, for example, ATAG? Another problem for
> >> > me is: the more granular evaluations are, the less reliable they
> >> > will be.
> >> >
> >> > Regards
> >> >
> >> > Kerstin
> >> >
> >> >
> >> > From: Peter Korn [mailto:peter.korn@oracle.com]
> >> > Sent: Tuesday, 22 May 2012 23:24
> >> > To: Shadi Abou-Zahra
> >> > Cc: Eval TF
> >> > Subject: Re: evaluating web applications (was Re: Canadian
> >> > Treasury Board accessibility assessment methodology)
> >> >
> >> > Shadi,
> >> >
> >> > I don't believe one can make an effective, useful, meaningful
> >> > conformance claim about many classes of web applications today.
> >> > That class includes things like web mail and many kinds of portal
> >> > applications (particularly where they only employ a single URI).
> >> >
> >> > I do believe it will be possible to evaluate web applications for
> >> > accessibility - similar to evaluating non-web applications for
> >> > accessibility - but I expect we will need to do something that is
> >> > different from the binary "perfection"/"imperfection" of the
> >> > current conformance claim rubric. The Canadian Treasury Board
> >> > example takes a step along that path in shifting from one binary
> >> > "perfection"/"imperfection" statement to a two-tiered, percentage
> >> > collection of 38 binary "perfection"/"imperfection" statements.
> >> > But we need to go further than that.
> >> >
> >> > I think the components of such a successful evaluation will need to:
> >> > • Recognize (as EvalTF is already doing) that only a
> >> > sampling/subset of everything that a user can encounter can be
> >> > effectively evaluated in a finite and reasonable amount of time
> >> > • Provide greater granularity in the evaluation reporting - one
> >> > that is designed to accommodate the reality of imperfect software
> >> > while nonetheless providing useful information to those consuming
> >> > the evaluation report, such that they can make informed decisions
> >> > based on it
> >> > • Incorporate the concepts (as EvalTF is starting to do) of uses
> >> > (or use cases) of the application so that the evaluation is
> >> > meaningful in the context of how the web application will be used
> >> >
> >> > I am eager to get further into these discussions in EvalTF, some
> >> > of which may be logical things to discuss as we review feedback
> >> > from the public draft (including some of the Oracle
> >> > feedback... :-). And as I mentioned, we've already started
> >> > exploring some of this.
> >> >
> >> >
> >> > Peter
> >> >
> >> >
> >> > On 5/22/2012 2:09 PM, Shadi Abou-Zahra wrote:
> >> > Hi Peter,
> >> >
> >> > Does that mean that web applications cannot be evaluated?
> >> >
> >> > Best,
> >> > Shadi
> >> >
> >> >
> >> > On 22.5.2012 20:40, Peter Korn wrote:
> >> >
> >> > Shadi,
> >> >
> >> > As is clear from the Notes & Examples under their definition of
> >> > "Web page" at the bottom of the URL you circulated (below), they
> >> > are looking to assess the full complexity of web applications on
> >> > a Pass/Fail basis. As we've explored in recent EvalTF meetings,
> >> > that is a very challenging thing to do, given how dynamic web
> >> > applications can be (cf. their examples of a "Web mail program"
> >> > and a "customizable portal site"). It is challenging in normal
> >> > software testing to determine whether you have reached every
> >> > possible code path & every possible configuration of the
> >> > structure behind a single URI, let alone answer Pass/Fail for
> >> > each and every WCAG A/AA SC for those.
> >> >
> >> > Regards,
> >> >
> >> > Peter
> >> >
> >> > On 5/22/2012 6:10 AM, Shadi Abou-Zahra wrote:
> >> >
> >> > Dear Group,
> >> >
> >> > Ref: <http://www.tbs-sct.gc.ca/ws-nw/wa-aw/wa-aw-assess-methd-eng.asp>
> >> >
> >> > David MacDonald pointed out the accessibility assessment
> >> > methodology of the Canadian Treasury Board, in particular the
> >> > scoring they use.
> >> >
> >> > Best,
> >> > Shadi
> >> >
> >> > --
> >> > Oracle <http://www.oracle.com>
> >> > Peter Korn | Accessibility Principal
> >> > Phone: +1 650 506 9522
> >> > Oracle Corporate Architecture Group
> >> > 500 Oracle Parkway | Redwood City, CA 94065
> >> > ---------------------------------------------------------------
> >> > Note: @sun.com e-mail addresses will shortly no longer function;
> >> > be sure to use peter.korn@oracle.com to reach me
> >> > ---------------------------------------------------------------
> >> > Green Oracle <http://www.oracle.com/commitment> Oracle is
> >> > committed to developing practices and products that help protect
> >> > the environment
> >> >
> >> >
> >> > --
> >> > Peter Korn | Accessibility Principal
> >> > Phone: +1 650 506 9522
> >> > Oracle Corporate Architecture Group
> >> > 500 Oracle Parkway | Redwood City, CA 94065
> >> > ________________________________________
> >> > Note: @sun.com e-mail addresses will shortly no longer function;
> >> > be sure to use peter.korn@oracle.com to reach me
> >> > ________________________________________
> >> > Oracle is committed to developing practices and products that
> >> > help protect the environment
> >> >
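The two-tiered percentage scoring Peter attributes to the Canadian Treasury Board methodology can be sketched in a few lines. The tier split, check names, and results below are invented for illustration; the actual methodology defines its own 38 checks.

from dataclasses import dataclass

@dataclass
class Check:
    name: str     # what is being checked
    tier: int     # 1 or 2 in this hypothetical two-tier split
    passed: bool  # each individual check itself stays binary

# invented results for a handful of the 38 binary checks
results = [
    Check("images have text alternatives", 1, True),
    Check("page is fully keyboard operable", 1, False),
    Check("headings are marked up as headings", 1, True),
    Check("colour contrast is sufficient", 2, True),
    Check("link purpose is clear from its text", 2, False),
    Check("form fields have labels", 2, True),
]

# the aggregate per tier is a percentage, not a single pass/fail verdict
for tier in (1, 2):
    in_tier = [c for c in results if c.tier == tier]
    pct = 100 * sum(c.passed for c in in_tier) / len(in_tier)
    print(f"Tier {tier}: {pct:.0f}% of {len(in_tier)} checks passed")

Each check remains binary; only the per-tier aggregate becomes a percentage, which is what distinguishes this scheme from a single pass/fail conformance verdict.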
Received on Wednesday, 23 May 2012 08:54:32 UTC