
Re: AW: AW: Goodness criteria

From: <detlev.fischer@testkreis.de>
Date: Thu, 22 Mar 2012 17:53:26 +0100 (CET)
To: public-wai-evaltf@w3.org
Message-Id: <20120322165326.6495F1DC2001@dd24924.kasserver.com>
Hi Kerstin, hi list,

I agree entirely that the assessment of accessibility should not depend on personal interpretation, and that our methodology should strive to minimise the effect of subjective differences. I think we really all agree on that.

So reliability, validity and objectivity are fine as goals. The main purpose of my earlier mail was to indicate how these terms might be qualified in the context of the WCAG-EM.

In my view, the critical question is to what extent we can nail down specific requirements, say, for reliability, in WCAG-EM.

You can think of different angles on that:

* Requiring a particular evaluator qualification (quite difficult, if only because of national differences) 
* Requiring a particular level of evaluator experience (but how do we measure that?)
* Defining what the sample must include (mostly done although this might need changes)
* Requiring that a test must be performed independently by more than one tester (this would improve reliability but is costly and, if I am not mistaken, will not be mandated by WCAG-EM)
* Designing some process to resolve differences in evaluator ratings / assessments (that is the BITV-Test approach)
* Setting a threshold for rating differences between independent testers that must not be exceeded before the required level of confidence is reached

etc. etc. This will need many more discussions - I really do believe we will return to these goodness criteria and give more substance to them soon enough...and then all will be good :-)
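One way to make the threshold idea in the list above concrete would be an inter-rater agreement statistic such as Cohen's kappa. A minimal sketch, where the ratings, the pass/fail scale and the 0.6 cut-off are purely illustrative assumptions, not anything WCAG-EM prescribes:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    # Observed agreement: fraction of items both raters labelled identically.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement, from each rater's own label frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail ratings from two independent evaluators
# for the same ten success criterion checks:
rater_1 = ["pass", "pass", "fail", "pass", "fail",
           "pass", "pass", "fail", "pass", "pass"]
rater_2 = ["pass", "fail", "fail", "pass", "fail",
           "pass", "pass", "fail", "pass", "fail"]

kappa = cohens_kappa(rater_1, rater_2)  # 0.6 for this data
# A methodology could then require, say, kappa >= 0.6 before the
# combined result is considered sufficiently reliable.
```

A process built on WCAG-EM could publish a combined result only when such a statistic clears its agreed threshold, and route disagreements into an arbitration step (as in the BITV-Test approach) otherwise.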

Regards,
Detlev

--
testkreis c/o feld.wald.wiese
Borselstraße 3-7 (im Hof), 22765 Hamburg

Mobil +49 (0)1577 170 73 84
Tel +49 (0)40 439 10 68-3
Fax +49 (0)40 439 10 68-5

http://www.testkreis.de
Beratung, Tests und Schulungen für barrierefreie Websites



----- Original Message -----
From: k.probiesch@googlemail.com
To: shadi@w3.org
Date: 22.03.2012 16:05:44
Subject: AW: AW: Goodness criteria


> Hi Shadi,
> 
> I try my best.
> 
> The article I mentioned is about goodness criteria in qualitative studies. (There is a long-running dispute over methods between researchers doing quantitative research and researchers doing qualitative research.) One can certainly debate whether those goodness criteria should have that much relevance in _qualitative_ research.
> 
> But: evaluating websites is not qualitative. Evaluating websites belongs to the quantitative field and in the quantitative field there is no question and no discussion at all about the relevance of reliability, objectivity and validity.
> 
> Sorry, it seems my English is worse today than ever before.
> 
> Best
> 
> Kerstin
> 
>> -----Original Message-----
>> From: Shadi Abou-Zahra [mailto:shadi@w3.org]
>> Sent: Thursday, 22 March 2012 15:02
>> To: Kerstin Probiesch
>> Cc: public-wai-evaltf@w3.org
>> Subject: Re: AW: Goodness criteria
>> 
>> Hi Kerstin,
>> 
>> I must admit that I have difficulty understanding your specific
>> suggestion or request, despite having read it several times.
>> 
>> Would you mind rephrasing your comment more clearly?
>> 
>> Thanks,
>>    Shadi
>> 
>> 
>> On 22.3.2012 14:32, Kerstin Probiesch wrote:
>> > Hi Detlev, all,
>> >
>> > evaluating websites or pages without a standardized methodology is,
>> > for me, nearly worthless. Users and clients have not only an interest
>> > in, but a right to, websites being tested with reliable methods, so
>> > that accessibility means the same in every country and does not
>> > depend on personal interpretation.
>> >
>> > Anyway, the link is very interesting and written from the perspective
>> > of qualitative social research: "When it comes to discussing goodness
>> > or quality criteria of the (qualitative) social sciences…". The
>> > article was published in Forum: Qualitative Social Research and is, I
>> > believe, part of the apologetic literature of qualitative researchers
>> > in the context of the dispute over methods between scientists doing
>> > quantitative research and those doing qualitative research.
>> >
>> > Nothing is "wrong" with that. Qualitative research is about people,
>> > and typical methods are interviews (narrative, problem-centred, ...),
>> > for example in ethnographic field studies, where great researchers
>> > like Malinowski did fundamental work, especially on participant
>> > observation.
>> >
>> > It would be very interesting to see the results of qualitative
>> > interviews with web developers about why some advocate accessibility
>> > and others do not, or how they assess their own knowledge of
>> > accessibility.
>> >
>> > But: evaluating websites is not qualitative social research.
>> >
>> > Kerstin
>> >
>> >> -----Original Message-----
>> >> From: detlev.fischer@testkreis.de
>> >> [mailto:detlev.fischer@testkreis.de]
>> >> Sent: Thursday, 22 March 2012 12:40
>> >> To: public-wai-evaltf@w3.org
>> >> Subject: Goodness criteria
>> >>
>> >> Hi list,
>> >>
>> >> just a few words about Kerstin's request to bring goodness criteria
>> >> into the section 1.1 on scope.
>> >>
>> >> I'm not sure what this inclusion will add for those applying the
>> >> methodology when conducting tests (or defining test procedures for
>> >> others to follow).
>> >>
>> >> Here are my 2 cents on the three terms objectivity, validity,
>> >> reliability:
>> >>
>> >> Objectivity
>> >> Normally this refers to minimising individual (inter-evaluator)
>> >> differences in observation or judgement. While we can objectively
>> >> measure temperature, dimensions, etc. based on normative scales,
>> >> there are several factors that make objectivity little more than an
>> >> ideal that can be approached but never reached in website
>> >> evaluation. Several aspects contribute to that:
>> >>
>> >> 1. Evaluators have different backgrounds and dispositions. One can
>> >> try to minimise these differences by uniform curricula and training,
>> >> and in dialogues aimed at a consensual adjustment of judgements in
>> >> typical cases.
>> >>
>> >> 2. Web content out there is complex and often fails to fit the
>> >> patterns described in documented techniques. There is nothing we can
>> >> do about that :-)
>> >>
>> >> 3. The rating of Success Criteria is often not strictly independent
>> >> of other SC. Instances can fail several SC at the same time, and
>> >> context must be taken into account to judge instances. How that is
>> >> done will often vary across evaluators.
>> >>
>> >> Validity
>> >> The validity of an evaluation is ultimately the degree to which an
>> >> evaluation result reflects the actual degree of accessibility across
>> >> users with disabilities. So there is a strong temporal element here.
>> >> The validity of assessments will depend, for example, on the current
>> >> degree of accessibility support of techniques used to claim
>> >> conformance. As the web changes and relevant accessibility
>> >> techniques change with it, maintaining validity means maintaining
>> >> the timeliness and relevance of the techniques and failures that
>> >> operationalize the general success criteria (or, if a tester wants
>> >> to avoid any reference to documented techniques, maintaining the
>> >> knowledge of what is currently supported and what is not, or not
>> >> yet).
>> >> As WCAG-EM just references techniques maintained outside its scope,
>> >> I wonder whether it is the right place to cover validity.
>> >>
>> >> Reliability
>> >> Reliability seems to depend on several aspects:
>> >>
>> >> 1. the knowledge, diligence and amount of time invested by the
>> >> individual evaluator across all relevant steps
>> >>
>> >> 2. the degree of operationalization: the more prescriptive the test
>> >> procedure, the higher the likelihood of replicability. As WCAG-EM
>> >> will not (for good reasons) go into detail regarding tools or
>> >> particular procedures based on tools, I doubt that WCAG-EM alone can
>> >> safeguard replicability (which might be the job of more prescriptive
>> >> procedures based on it)
>> >>
>> >> 3. the number of testers carrying out the same test (re-test,
>> >> replicate) or the availability of additional quality assurance -
>> >> again something probably to be defined beyond the scope of WCAG-EM
>> >>
>> >> As a last comment, I am not convinced that "goodness criteria are
>> >> defined and internationally agreed in the scientific community"
>> >> means that these are a given that can simply be referenced and taken
>> >> for granted. This may be true for the hard sciences, but an
>> >> evaluation is subject to many 'soft' social and contextual aspects.
>> >> One should aim to keep these in check, but it is impossible to
>> >> eliminate them entirely. Instead, they must be managed. Perhaps this
>> >> article has some useful pointers:
>> >>
>> >> http://www.qualitative-research.net/index.php/fqs/article/view/919/2008
>> >>
>> >> Conclusion
>> >> While I think mentioning the goodness criteria in the section on
>> >> scope probably does no harm, I am not convinced that it will improve
>> >> the way WCAG-EM is used. It could be useful, however, to give
>> >> guidance on how to approach or improve the aims of objectivity,
>> >> validity and reliability in practical terms. Whether such guidance
>> >> can be prescriptive for operational procedures based on WCAG-EM, I
>> >> am not so sure. Let's discuss...
>> >>
>> >> Best regards,
>> >> Detlev
>> >>
>> >> --
>> >> testkreis c/o feld.wald.wiese
>> >> Borselstraße 3-7 (im Hof), 22765 Hamburg
>> >>
>> >> Mobil +49 (0)1577 170 73 84
>> >> Tel +49 (0)40 439 10 68-3
>> >> Fax +49 (0)40 439 10 68-5
>> >>
>> >> http://www.testkreis.de
>> >> Beratung, Tests und Schulungen für barrierefreie Websites
>> >>
>> >>
>> 
>> --
>> Shadi Abou-Zahra - http://www.w3.org/People/shadi/
>> Activity Lead, W3C/WAI International Program Office
>> Evaluation and Repair Tools Working Group (ERT WG)
>> Research and Development Working Group (RDWG)
> 
Received on Thursday, 22 March 2012 16:53:55 GMT
