Re: AW: Requirements draft

Maybe this discussion isn't that productive right now. Let me just  
clarify that in my view, approaching replicability would require being  
pretty prescriptive about what tools to use and how to rate results -  
quite the opposite of "wishy-washy". We actually do that in our test  
and still never get the same results. It has dawned on me that the  
reason for my insistence might be that no one else in this group  
conducts the same test on the same page sample with two different,  
independent testers...

Finally, I am happy with a requirement "R04 Reliability" if it should  
replace R04 Replicability. I agree with Kerstin that our methodology  
should aim to make results "reliable enough" (which I guess means that  
it manages variance at some point). But I'll keep schtum for now and  
let others have their go...
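
The notion of "reliable enough" can be made concrete. One common way to quantify agreement between two independent testers rating the same sample is Cohen's kappa, which corrects raw agreement for chance. The sketch below is illustrative only: the verdicts, the helper name, and the 0.6 threshold are hypothetical, not taken from this thread.

```python
# Illustrative sketch: inter-rater agreement between two independent
# testers who rate the same success criteria as pass/fail. The data and
# the "reliable enough" threshold are made up for illustration.
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters over the same items (any label set)."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    # Observed agreement: fraction of items where both raters concur.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: probability both raters pick the same label at random.
    expected = sum((freq_a[label] / n) * (freq_b[label] / n) for label in freq_a)
    if expected == 1.0:  # degenerate case: both raters always use one label
        return 1.0
    return (observed - expected) / (1 - expected)

# Two testers' verdicts on the same 8 success criteria (hypothetical data).
tester_1 = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
tester_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

kappa = cohens_kappa(tester_1, tester_2)
print(f"kappa = {kappa:.2f}")  # prints kappa = 0.47
```

Under a (hypothetical) rule such as "kappa >= 0.6 counts as reliable enough", these two testers would fall short, which is exactly the kind of variance a methodology would need to manage rather than assume away.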

Detlev


Quoting RichardWarren <richard.warren@userite.com>:

> Hi Both, and all,
> I am very concerned that if we start off with "wishy-washy"  
> requirements we will not be able to deliver a standardised  
> methodology. I believe our task is to create clear and replicable  
> methods. If, by doing so, we set the bar too high, then (and only  
> then) can we explore "tolerance" and "scoping" etc. Remember it is  
> the Guidelines we are evaluating, not the success criteria (SC). No  
> doubt our methods will employ SCs and we will have to work out how  
> to cope with (aggregate etc.) qualitative results.
>
> I agree with Kerstin that *R03 and *R04 are vital and should not be dropped.
>
> Richard
>
> -----Original Message----- From: Kerstin Probiesch
> Sent: Monday, September 12, 2011 2:33 PM
> To: 'Detlev Fischer' ; public-wai-evaltf@w3.org
> Subject: AW: Requirements draft
>
> Hi again Detlev, all,
>
> I don't get the point. Part of a methodology is always managing  
> variance. Especially for that we need to follow the three main  
> quality criteria for tests. Reliability comes in degrees (I'm not  
> sure if this is the correct word for what I mean), e.g. high, low,  
> none, and of course we always deal with the question: is it  
> "reliable enough"? And that is just for this one criterion. I think  
> this is what was meant by "within a given tolerance", which *is*  
> managing variance.
>
> Kerstin
>
>> -----Original Message-----
>> From: public-wai-evaltf-request@w3.org [mailto:public-wai-evaltf-
>> request@w3.org] On behalf of Detlev Fischer
>> Sent: Monday, 12 September 2011 14:16
>> To: public-wai-evaltf@w3.org
>> Subject: Re: Requirements draft
>>
>> Hi Kerstin, hi everyone else,
>>
>> my point was simply that, empirically, I think replicability of
>> something as complex as evaluating a website against WCAG 2.0 will
>> be the exception even in the best of circumstances. That does not
>> mean I am against trying to define a common basis (a methodology for
>> testing). However, I believe our methodology cannot and should not
>> be as deterministic as a standard like HTML or CSS. Every process
>> involving a large amount of human experience and contextual
>> judgement will produce variance in results. I think the methodology
>> should *manage* rather than will away this variance, e.g., by coming
>> up with credible ways of aggregating / arbitrating / validating
>> human testing results. So I believe looking critically at
>> requirements for unique interpretation and replicability is the
>> exact opposite of being content with "Tips for testing" - it
>> actually raises the bar by checking theory against the reality of
>> testing complex, real-world sites.
>>
>> Detlev
>>
>> On 12.09.2011 13:47, Kerstin Probiesch wrote:
>>> Hi Detlev, all,
>>>
>>> I have already commented on most of the suggested requirements.
>>> Just a few words as a comment on Detlev's comments, and just for
>>> two requirements. Please see the other comments from Detlev in his
>>> mail and my other comments also (in a few days). Sorry for going
>>> this way, but I want to comment on two very important points in one
>>> paragraph.
>>>
>>> If we dropped R04 we would fail at minimum one international
>>> criterion for the quality of tests in general: reliability. To drop
>>> R03 is critical for the second quality criterion for tests:
>>> objectivity. Without reliability there is no validity, which is the
>>> third important criterion. If just one criterion fails, the W3C
>>> can't claim the evaluation methodology is standardized. The result
>>> of our work would then be a *non-standardized* evaluation
>>> methodology as a Recommendation coming from the W3C as the main
>>> international *standards* organization. I fear the result of our
>>> work will then have the character of some "Tips for testing".
>>>
>>> Kerstin
>>>
>>>>> R03: Unique interpretation
>>>>> Comment (RW): I think this means that it should be unambiguous,
>>>>> that means it is not open to different interpretations. I am
>>>>> pretty sure that the W3C has a standard clause it uses to cover
>>>>> this point when building standards etc. Hopefully Shadi can find
>>>>> it <Grin>. This also implies use of standard terminology, which
>>>>> we should be looking at as soon as possible so that terms like
>>>>> “atomic testing” do not creep into our procedures without clear /
>>>>> agreed definitions.
>>>>
>>>> DF: I have spent some time arguing that the testing of many SC is
>>>> not a black & white thing (1.3.1 headings, 1.1.1 alt text, etc.),
>>>> especially if we aggregate results for all "atomic" (sorry)
>>>> instances on a page level and use the page as the unit to be
>>>> evaluated. I have not seen much reaction to that by others so far.
>>>> I would drop R03 as unrealistic.
>>>
>>>>> R04: Replicability: different Web accessibility evaluators who
>>>>> perform the same tests on the same site should get the same
>>>>> results within a given tolerance.
>>>>> Comment (RW): The first part is good, but I am not happy with
>>>>> introducing “tolerance” at this stage. I think we should be clear
>>>>> that we are after consistent, replicable tests. I think we should
>>>>> add a separate requirement later for such things as “partial
>>>>> compliance” and “tolerance”. See R14 below.
>>>>>
>>>>> *R04: Replicability: different Web accessibility evaluators who
>>>>> perform the same tests on the same site should get the same
>>>>> results.
>>>>
>>>> DF: I think I know this will never happen UNLESS people use the
>>>> same closely defined step-by-step process AND have a common /
>>>> shared understanding as to what constitutes a failure or success
>>>> across a range of different implementations. Even then, exact
>>>> replicability will be the exception. If the method we aim for is
>>>> to be generic and there is no element of arbitration between
>>>> testers and no validation by a (virtual) community, there is no
>>>> chance of replicability, in my opinion.
>>>> I would drop R04 as unrealistic.
>>>
>>>
>>
>>
>> --
>> ---------------------------------------------------------------
>> Detlev Fischer PhD
>> DIAS GmbH - Daten, Informationssysteme und Analysen im Sozialen
>> Management: Thomas Lilienthal, Michael Zapp
>>
>> Phone: +49-40-43 18 75-25
>> Mobile: +49-157 7-170 73 84
>> Fax: +49-40-43 18 75-19
>> E-Mail: fischer@dias.de
>>
>> Address: Schulterblatt 36, D-20357 Hamburg
>> Amtsgericht Hamburg HRB 58 167
>> Managing Directors: Thomas Lilienthal, Michael Zapp
>> ---------------------------------------------------------------

Received on Monday, 12 September 2011 14:58:25 UTC