Re: Requirements draft from Detlev Fischer on 2011-09-12 (public-wai-evaltf@w3.org from September 2011)

From: Detlev Fischer <fischer@dias.de>
Date: Mon, 12 Sep 2011 14:15:42 +0200
To: public-wai-evaltf@w3.org
Message-ID: <4E6DF7EE.1030109@dias.de>

Hi Kerstin, hi everyone else,

my point was simply that, empirically, I think replicability of something
as complex as evaluating a website against WCAG 2.0 will be the
exception even in the best of circumstances. That does not mean I am
against trying to define a common basis (a methodology for testing).
However, I believe our methodology cannot and should not be as
deterministic as a standard like HTML or CSS. Every process involving a
large amount of human experience and contextual judgement will produce
variance in results. I think the methodology should *manage* this
variance rather than wish it away, e.g., by coming up with credible ways of
aggregating / arbitrating / validating human testing results. So I
believe that looking critically at the requirements for unique interpretation and
replicability is the exact opposite of being content with "Tips for
testing": it actually raises the bar by checking theory against the
reality of testing complex, real-world sites.
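Purely as an illustration of what "managing" variance through aggregation and arbitration could look like (this is not a method proposed anywhere in this thread), here is a minimal Python sketch. It assumes that several testers each give a pass/fail verdict for one success criterion on one page, combines those verdicts by majority vote, and flags the item for arbitration when agreement falls below a threshold. The function name `aggregate`, the verdict values and the 0.75 threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical verdict values; the thread does not define a rating scale.
PASS, FAIL = "pass", "fail"


def aggregate(verdicts, min_agreement=0.75):
    """Combine several testers' verdicts for one item (e.g. one SC on one page).

    Returns (majority_verdict, agreement_share, needs_arbitration).
    min_agreement is an assumed threshold, not a value from the discussion.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    # On a tie, most_common(1) returns the verdict that was seen first.
    verdict, n = counts.most_common(1)[0]
    share = n / len(verdicts)
    # Low agreement means the result should go to arbitration, not be reported as-is.
    return verdict, share, share < min_agreement


# Four testers rating SC 1.1.1 on one page (illustrative data only).
print(aggregate([PASS, PASS, FAIL, PASS]))
```

With three of four testers agreeing, the item is reported as a pass with 75% agreement and no arbitration is needed; a 2:2 split would be flagged for arbitration instead.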

Detlev

On 12.09.2011 13:47, Kerstin Probiesch wrote:
> Hi Detlev, all,
>
> I have already commented on most of the suggested requirements. Just a few words in response to Detlev's comments, and only on two requirements. Please see Detlev's other comments in his mail, and my other comments as well (in a few days). Sorry for doing it this way, but I want to comment on two very important points in one paragraph.
>
> If we dropped R04, we would fail at least one of the international criteria for the quality of tests in general: reliability. Dropping R03 is critical for the second criterion for the quality of tests: objectivity. Without reliability there is no validity, which is the third important criterion. If even one criterion fails, the W3C can't claim that the evaluation methodology is standardized. The result of our work would be a *non-standardized* evaluation methodology published as a Recommendation by the W3C, the main international *standards* organization. I fear the result of our work would then have the character of some "Tips for testing".
>
> Kerstin
>
>>> R03: Unique interpretation
>>> Comment (RW): I think this means that it should be unambiguous, that
>>> is, it is not open to different interpretations. I am pretty sure
>>> that the W3C has a standard clause it uses to cover this point when
>>> building standards etc. Hopefully Shadi can find it <grin>. This also implies
>>> use of standard terminology, which we should be looking at as soon as
>>> possible so that terms like “atomic testing” do not creep into our
>>> procedures without clear, agreed definitions.
>>
>> DF: I have spent some time arguing that the testing of many SC is not a
>> black & white thing (1.3.1 headings, 1.1.1 alt text, etc.), especially
>> if we aggregate results for all "atomic" (sorry) instances on a page level
>> and use the page as unit to be evaluated. I have not seen much reaction
>> to that by others so far.
>> I would drop R03 as unrealistic.
>
>>> R04: Replicability: different Web accessibility evaluators who perform
>>> the same tests on the same site should get the same results within a
>>> given tolerance.
>>> Comment (RW): The first part is good, but I am not happy with
>>> introducing “tolerance” at this stage. I think we should be clear
>>> that we are after consistent, replicable tests. I think we should add a
>>> separate requirement later for such things as “partial compliance”
>>> and “tolerance”. See R14 below.
>>>
>>> *R04: Replicability: different Web accessibility evaluators who perform
>>> the same tests on the same site should get the same results.
>>
>> DF: I think this will never happen UNLESS people use the same
>> closely defined step-by-step process AND have a common / shared
>> understanding as to what constitutes a failure or success across a
>> range of different implementations. Even then, exact replicability will be
>> the exception. If the method we aim for is meant to be generic, and there is no element of
>> arbitration between testers and no validation by a (virtual) community,
>> there is no chance of replicability, in my opinion.
>> I would drop R04 as unrealistic.


-- 
---------------------------------------------------------------
Detlev Fischer PhD
DIAS GmbH - Daten, Informationssysteme und Analysen im Sozialen
Management: Thomas Lilienthal, Michael Zapp

Phone: +49-40-43 18 75-25
Mobile: +49-157 7-170 73 84
Fax: +49-40-43 18 75-19
E-Mail: fischer@dias.de

Address: Schulterblatt 36, D-20357 Hamburg
Amtsgericht Hamburg HRB 58 167
---------------------------------------------------------------
Received on Monday, 12 September 2011 12:16:28 UTC