- From: Kerstin Probiesch <k.probiesch@googlemail.com>
- Date: Wed, 23 May 2012 10:54:51 +0200
- To: <detlev.fischer@testkreis.de>, <peter.korn@oracle.com>, <shadi@w3.org>
- Cc: <public-wai-evaltf@w3.org>
...which means that, so far, you can't offer a proven value for the reliability coefficient for this test.

> -----Original Message-----
> From: detlev.fischer@testkreis.de [mailto:detlev.fischer@testkreis.de]
> Sent: Wednesday, 23 May 2012 10:31
> To: k.probiesch@googlemail.com; peter.korn@oracle.com; shadi@w3.org
> Cc: public-wai-evaltf@w3.org
> Subject: Re: AW: AW: evaluating web applications (was Re: Canadian
> Treasury Board accessibility assessment methodology)
>
> Hi Kerstin,
>
> As expressed in the paper, the statistics function has only recently
> been added. So at the moment this is an informal assessment, which we
> will need to back up once we have more data.
>
> But this is what we hope to get out of the stats function:
>
> 1. Tester reliability over time: How much are individual evaluators
> 'off the mark' compared to the final quality-assured result? This could
> show an improvement over time, an interesting metric for assessing the
> level of qualification, especially of new and less experienced
> evaluators.
>
> 2. Inter-evaluator reliability: How close are the results of different
> evaluators assessing the same site / page sample?
>
> There is likely to be little test-retest reliability data since,
> usually, the sites tested are a moving target - improved based on test
> results. Only rarely is the same site re-tested in a tandem test - this
> usually only happens after a re-launch.
>
> A fundamental problem in all these statistics is that there is no
> objective benchmark to compare individual rating results against - just
> the arbitrated and quality-assured final evaluation result. Given the
> scope of interpretation in accessibility evaluation, we think this lack
> of objectivity is inevitable and, in the end, down to the complexity of
> the field under investigation and the degree of human error in all
> evaluation.
>
>
> --
> Detlev Fischer
> testkreis c/o feld.wald.wiese
> Borselstraße 3-7 (im Hof), 22765 Hamburg
>
> Mobile +49 (0)1577 170 73 84
> Tel +49 (0)40 439 10 68-3
> Fax +49 (0)40 439 10 68-5
>
> http://www.testkreis.de
> Consulting, testing and training for accessible websites
>
>
> ----- Original Message -----
> From: k.probiesch@googlemail.com
> To: detlev.fischer@testkreis.de, peter.korn@oracle.com, shadi@w3.org
> Date: 23.05.2012 10:09:44
> Subject: AW: AW: evaluating web applications (was Re: Canadian Treasury
> Board accessibility assessment methodology)
>
> > Hi Detlev,
> >
> > the paper for the Website Accessibility Metrics Online Symposium
> > states: "Our experience shows that the 5 point graded rating scale
> > is quite reliable." I think it would be helpful for the discussion
> > to know what exactly "quite reliable" means (the value of the
> > reliability coefficient).
> >
> > Best
> >
> > Kerstin
> >
> >> -----Original Message-----
> >> From: detlev.fischer@testkreis.de [mailto:detlev.fischer@testkreis.de]
> >> Sent: Wednesday, 23 May 2012 09:57
> >> To: k.probiesch@googlemail.com; peter.korn@oracle.com; shadi@w3.org
> >> Cc: public-wai-evaltf@w3.org
> >> Subject: Re: AW: evaluating web applications (was Re: Canadian
> >> Treasury Board accessibility assessment methodology)
> >>
> >> Hi all,
> >>
> >> Perhaps not surprisingly for those who have followed these
> >> discussions since summer last year, I disagree with Kerstin's
> >> statement "the more granular the evaluation, the less reliable it
> >> is".
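One concrete form the requested reliability coefficient could take is a quadratic-weighted Cohen's kappa over paired gradings on the 5-point scale. The sketch below is illustrative only: it is not the statistics function Detlev describes, and the gradings are invented.

import numpy as np

def quadratic_weighted_kappa(a, b, k=5):
    """Cohen's kappa with quadratic weights for two raters grading
    the same items on a k-point ordinal scale (grades coded 1..k)."""
    a = np.asarray(a) - 1                      # shift grades to 0..k-1
    b = np.asarray(b) - 1
    n = len(a)
    observed = np.zeros((k, k))                # cross-tabulation of grades
    for i, j in zip(a, b):
        observed[i, j] += 1
    # expected cross-tabulation if the two raters were independent
    expected = np.outer(np.bincount(a, minlength=k),
                        np.bincount(b, minlength=k)) / n
    # quadratic weights: disagreeing by two grades costs four times
    # as much as disagreeing by one grade
    idx = np.arange(k)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (k - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# invented gradings of ten checkpoints by two evaluators (1 = fail ... 5 = pass)
rater_a = [5, 4, 4, 2, 5, 3, 1, 4, 5, 2]
rater_b = [5, 4, 3, 2, 4, 3, 2, 4, 5, 1]
print(f"quadratic-weighted kappa: {quadratic_weighted_kappa(rater_a, rater_b):.2f}")

A kappa of 1.0 would mean perfect agreement; values above roughly 0.8 are conventionally read as strong agreement, which is the kind of number that would substantiate "quite reliable".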
> >>
> >> The binary approach produces artefacts because it often forces
> >> evaluators to be either too strict (failing an SC due to minor
> >> issues) or too lenient (attesting conformance in spite of such
> >> issues).
> >>
> >> We've tried to show the higher fidelity of a graded evaluation
> >> approach in our recent paper for the Website Accessibility Metrics
> >> Online Symposium, 5 December 2011:
> >>
> >> http://www.w3.org/WAI/RD/2011/metrics/paper7/
> >>
> >>
> >> > Hi Peter, Shadi,
> >> >
> >> > if we were to work out "something that is different" from the
> >> > pass/fail - which obviously is not compliant with the conformance
> >> > requirements - it wouldn't be an evaluation methodology for WCAG
> >> > 2.0 anymore. Of course, part of reality is imperfect software.
> >> > "Imperfect" developers and "imperfect" online editors are also
> >> > part of reality. The question for me is: if we consider these
> >> > aspects, why then promote, for example, ATAG? Another problem for
> >> > me is: the more granular evaluations are, the less reliable they
> >> > will be.
> >> >
> >> > Regards
> >> >
> >> > Kerstin
> >> >
> >> >
> >> > From: Peter Korn [mailto:peter.korn@oracle.com]
> >> > Sent: Tuesday, 22 May 2012 23:24
> >> > To: Shadi Abou-Zahra
> >> > Cc: Eval TF
> >> > Subject: Re: evaluating web applications (was Re: Canadian
> >> > Treasury Board accessibility assessment methodology)
> >> >
> >> > Shadi,
> >> >
> >> > I don't believe one can make an effective, useful, meaningful
> >> > conformance claim about many classes of web applications today.
> >> > That class includes things like web mail and many kinds of portal
> >> > applications (particularly where they only employ a single URI).
> >> >
> >> > I do believe it will be possible to evaluate web applications for
> >> > accessibility - similar to evaluating non-web applications for
> >> > accessibility - but I expect we will need to do something that is
> >> > different from the binary "perfection"/"imperfection" of the
> >> > current conformance claim rubric. The Canadian Treasury Board
> >> > example takes a step along that path in shifting from one binary
> >> > "perfection"/"imperfection" statement to a two-tiered, percentage
> >> > collection of 38 binary "perfection"/"imperfection" statements.
> >> > But we need to go further than that.
> >> >
> >> > I think the components of such a successful evaluation will need to:
> >> > • Recognize (as EvalTF is already doing) that only a
> >> > sampling/subset of everything that a user can encounter can be
> >> > effectively evaluated in a finite and reasonable amount of time
> >> > • Provide greater granularity in the evaluation reporting - one
> >> > that is designed to accommodate the reality of imperfect software
> >> > while nonetheless providing useful information to those consuming
> >> > the evaluation report, such that they can make informed decisions
> >> > based on it
> >> > • Incorporate the concepts (as EvalTF is starting to do) of uses
> >> > (or use cases) of the application so that the evaluation is
> >> > meaningful in the context of how the web application will be used
> >> >
> >> > I am eager to get further into these discussions in EvalTF, some
> >> > of which may be logical things to discuss as we review feedback
> >> > from the public draft (including some of the Oracle
> >> > feedback... :-). And as I mentioned, we've already started
> >> > exploring some of this.
> >> >
> >> >
> >> > Peter
> >> >
> >> >
> >> > On 5/22/2012 2:09 PM, Shadi Abou-Zahra wrote:
> >> > Hi Peter,
> >> >
> >> > Does that mean that web applications cannot be evaluated?
> >> >
> >> > Best,
> >> > Shadi
> >> >
> >> >
> >> > On 22.5.2012 20:40, Peter Korn wrote:
> >> >
> >> > Shadi,
> >> >
> >> > As is clear from the Notes & Examples under their definition of
> >> > "Web page" at the bottom of the URL you circulated (below), they
> >> > are looking to assess the full complexity of web applications on
> >> > a Pass/Fail basis. As we've explored in recent EvalTF meetings,
> >> > that is a very challenging thing to do, given how dynamic web
> >> > applications can be (cf. their examples of a "Web mail program"
> >> > and a "customizable portal site"). It is challenging in normal
> >> > software testing to determine whether you have reached every
> >> > possible code path & every possible configuration of the
> >> > structure behind a single URI, let alone answer Pass/Fail for
> >> > each and every WCAG A/AA SC for those.
> >> >
> >> > Regards,
> >> >
> >> > Peter
> >> >
> >> > On 5/22/2012 6:10 AM, Shadi Abou-Zahra wrote:
> >> >
> >> > Dear Group,
> >> >
> >> > Ref: <http://www.tbs-sct.gc.ca/ws-nw/wa-aw/wa-aw-assess-methd-eng.asp>
> >> >
> >> > David MacDonald pointed out the accessibility assessment
> >> > methodology of the Canadian Treasury Board, in particular the
> >> > scoring they use.
> >> >
> >> > Best,
> >> > Shadi
> >> >
> >> > --
> >> > Oracle <http://www.oracle.com>
> >> > Peter Korn | Accessibility Principal
> >> > Phone: +1 650 506 9522
> >> > Oracle Corporate Architecture Group
> >> > 500 Oracle Parkway | Redwood City, CA 94065
> >> > ---------------------------------------------------------------
> >> > Note: @sun.com e-mail addresses will shortly no longer function;
> >> > be sure to use peter.korn@oracle.com to reach me
> >> > ---------------------------------------------------------------
> >> > Green Oracle <http://www.oracle.com/commitment> Oracle is
> >> > committed to developing practices and products that help protect
> >> > the environment
> >> >
> >> >
> >> > --
> >> > Peter Korn | Accessibility Principal
> >> > Phone: +1 650 506 9522
> >> > Oracle Corporate Architecture Group
> >> > 500 Oracle Parkway | Redwood City, CA 94065
> >> > ________________________________________
> >> > Note: @sun.com e-mail addresses will shortly no longer function;
> >> > be sure to use peter.korn@oracle.com to reach me
> >> > ________________________________________
> >> > Oracle is committed to developing practices and products that
> >> > help protect the environment
> >> >
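The two-tiered percentage scoring Peter attributes to the Canadian Treasury Board methodology can be sketched in a few lines. The tier split, check names, and results below are invented for illustration; the actual methodology defines its own 38 checks.

from dataclasses import dataclass

@dataclass
class Check:
    name: str     # what is being checked
    tier: int     # 1 or 2 in this hypothetical two-tier split
    passed: bool  # each individual check itself stays binary

# invented results for a handful of the 38 binary checks
results = [
    Check("images have text alternatives", 1, True),
    Check("page is fully keyboard operable", 1, False),
    Check("headings are marked up as headings", 1, True),
    Check("colour contrast is sufficient", 2, True),
    Check("link purpose is clear from its text", 2, False),
    Check("form fields have labels", 2, True),
]

# the aggregate per tier is a percentage, not a single pass/fail verdict
for tier in (1, 2):
    in_tier = [c for c in results if c.tier == tier]
    pct = 100 * sum(c.passed for c in in_tier) / len(in_tier)
    print(f"Tier {tier}: {pct:.0f}% of {len(in_tier)} checks passed")

Each check remains binary; only the per-tier aggregate becomes a percentage, which is what distinguishes this scheme from a single pass/fail conformance verdict.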
Received on Wednesday, 23 May 2012 08:54:32 UTC