
Re: Requirements draft

From: Kerstin Probiesch <k.probiesch@googlemail.com>
Date: Mon, 12 Sep 2011 15:33:43 +0200
To: "'Detlev Fischer'" <fischer@dias.de>, <public-wai-evaltf@w3.org>
Message-ID: <4e6e0998.4121df0a.3a04.2d7f@mx.google.com>
Hi again Detlev, all,

I don't get the point. Part of a methodology is always managing variance. For exactly that reason we need to follow the three main quality criteria for tests. Reliability comes in different degrees (I'm not sure "levels" is the correct word for what I mean), e.g. high, low, or none, and of course we always have to ask: is it "reliable enough"? That applies to this one criterion alone. I think this is what was meant by "within a given tolerance", which *is* managing variance.

Kerstin

> -----Original Message-----
> From: public-wai-evaltf-request@w3.org
> [mailto:public-wai-evaltf-request@w3.org] On Behalf Of Detlev Fischer
> Sent: Monday, 12 September 2011 14:16
> To: public-wai-evaltf@w3.org
> Subject: Re: Requirements draft
> 
> Hi Kerstin, hi everyone else,
> 
> My point was simply that, empirically, I think replicability of
> something as complex as evaluating a website against WCAG 2.0 will be
> the exception even in the best of circumstances. That does not mean I
> am against trying to define a common basis (a methodology for testing).
> However, I believe our methodology cannot and should not be as
> deterministic as a standard like HTML or CSS. Every process involving a
> large amount of human experience and contextual judgement will produce
> variance in results. I think the methodology should *manage* rather
> than will away this variance, e.g., by coming up with credible ways of
> aggregating / arbitrating / validating human testing results. So I
> believe looking critically at requirements for unique interpretation
> and replicability is the exact opposite of being content with "Tips for
> testing": it actually raises the bar by checking theory against the
> reality of testing complex, real-world sites.
> 
> Detlev
> 
> On 12.09.2011 13:47, Kerstin Probiesch wrote:
> > Hi Detlev, all,
> >
> > I have already commented on most of the suggested Requirements. Just
> > a few words in response to Detlev's comments, and only on two
> > Requirements. Please also see Detlev's other comments in his mail and
> > my other comments (in a few days). Sorry for going about it this way,
> > but I want to comment on two very important points in one paragraph.
> >
> > If we dropped R04, we would fail at least one international quality
> > criterion for tests in general: Reliability. Dropping R03 is critical
> > for the second quality criterion for tests: Objectivity. Without
> > Reliability there is no Validity, which is the third important
> > criterion. If just one criterion fails, the W3C can't claim that the
> > evaluation methodology is standardized. The result of our work will
> > be a *non-standardized* evaluation methodology as a Recommendation
> > coming from the W3C as the main international *standards*
> > organization. I fear the result of our work will then have the
> > character of some "Tips for testing".
> >
> > Kerstin
> >
> >>> R03: Unique interpretation
> >>> Comment (RW): I think this means that it should be unambiguous,
> >>> that is, not open to different interpretations. I am pretty sure
> >>> that the W3C has a standard clause it uses to cover this point when
> >>> building standards etc. Hopefully Shadi can find it <grin>. This
> >>> also implies use of standard terminology, which we should be
> >>> looking at as soon as possible so that terms like “atomic testing”
> >>> do not creep into our procedures without clear, agreed definitions.
> >>
> >> DF: I have spent some time arguing that the testing of many SC is
> >> not a black & white thing (1.3.1 headings, 1.1.1 alt text, etc.),
> >> especially if we aggregate results for all "atomic" (sorry)
> >> instances on a page level and use the page as the unit to be
> >> evaluated. I have not seen much reaction to that from others so far.
> >> I would drop R03 as unrealistic.
> >
> >>> R04: Replicability: different Web accessibility evaluators who
> >>> perform the same tests on the same site should get the same results
> >>> within a given tolerance.
> >>> Comment (RW): The first part is good, but I am not happy with
> >>> introducing “tolerance” at this stage. I think we should be clear
> >>> that we are after consistent, replicable tests. I think we should
> >>> add a separate requirement later for such things as “partial
> >>> compliance” and “tolerance”. See R14 below.
> >>>
> >>> *R04: Replicability: different Web accessibility evaluators who
> >>> perform the same tests on the same site should get the same results.
> >>
> >> DF: I think I know this will never happen UNLESS people use the same
> >> closely defined step-by-step process AND have a common / shared
> >> understanding as to what constitutes a failure or success across a
> >> range of different implementations. Even then, exact replicability
> >> will be the exception. If the method we aim for is to be generic and
> >> there is no element of arbitration between testers and no validation
> >> by a (virtual) community, there is no chance of replicability, in my
> >> opinion.
> >> I would drop R04 as unrealistic.
> >
> >
> 
> 
> --
> ---------------------------------------------------------------
> Detlev Fischer PhD
> DIAS GmbH - Daten, Informationssysteme und Analysen im Sozialen
> Management: Thomas Lilienthal, Michael Zapp
> 
> Phone: +49-40-43 18 75-25
> Mobile: +49-157 7-170 73 84
> Fax: +49-40-43 18 75-19
> E-Mail: fischer@dias.de
> 
> Address: Schulterblatt 36, D-20357 Hamburg
> Amtsgericht Hamburg (district court) HRB 58 167
> Managing directors: Thomas Lilienthal, Michael Zapp
> ---------------------------------------------------------------
Received on Monday, 12 September 2011 13:31:34 GMT

This archive was generated by hypermail 2.3.1 : Friday, 8 March 2013 15:52:11 GMT