Goodness criteria from detlev.fischer@testkreis.de on 2012-03-22 (public-wai-evaltf@w3.org from March 2012)

From: <detlev.fischer@testkreis.de>
Date: Thu, 22 Mar 2012 12:39:42 +0100 (CET)
To: public-wai-evaltf@w3.org
Message-Id: <20120322113942.1E5B62366094@dd24924.kasserver.com>

Hi list,

Just a few words about Kerstin's request to bring goodness criteria into section 1.1 on scope.

I'm not sure what this inclusion will add for those applying the methodology when conducting tests (or defining test procedures for others to follow).

Here are my 2 cents on the three terms objectivity, validity, reliability:

Objectivity
Normally this refers to minimising individual (inter-evaluator) differences in observation or judgement.
While we can objectively measure temperature, dimensions, etc. based on normative scales, in website evaluation objectivity is little more than an ideal that can be approached but never reached. Several factors contribute to that:

1. Evaluators have different backgrounds and dispositions. One can try to minimise
these differences through uniform curricula and training, and through dialogues
aimed at a consensual adjustment of judgements in typical cases.

2. Web content out there is complex and often fails to fit the patterns described in
documented techniques. There is nothing we can do about that :-)

3. The rating of Success Criteria is often not strictly independent of other SC.
Instances can fail several SC at the same time, and context must be taken into
account to judge instances. How that is done will often vary across evaluators.

Validity
The validity of an evaluation is ultimately the degree to which an evaluation result reflects the actual degree of accessibility across users with disabilities. So there is a strong temporal element here. The validity of assessments will depend, for example, on the current degree of accessibility support of techniques used to claim conformance. As the web changes and relevant accessibility techniques change with it, maintaining validity means maintaining the timeliness and relevance of the techniques and failures that operationalize the general success criteria (or, if a tester wants to avoid any reference to documented techniques, maintaining the knowledge of what is currently supported and what is not, or not yet).
As WCAG-EM just references techniques maintained outside its scope, I wonder whether it is the right place to cover validity.

Reliability
Reliability seems to depend on several aspects:

1. the knowledge, diligence and amount of time invested by the individual evaluator
across all relevant steps

2. the degree of operationalization: the more prescriptive the test procedure, the
higher the likelihood of replicability. As WCAG-EM will not (for good reasons)
go into detail regarding tools or particular procedures based on tools, I doubt
that WCAG-EM alone can safeguard replicability (which might be the job of more
prescriptive procedures based on it)

3. the number of testers carrying out the same test (re-test, replicate) or the
availability of additional quality assurance - again something probably to be
defined beyond the scope of WCAG-EM

As a last comment, I am not convinced that "goodness criteria are defined and internationally agreed in the scientific community" means that these are a given that can simply be referenced and taken for granted. This may be true for hard sciences, but an evaluation is subject to many 'soft' social and contextual aspects. One should aim to keep these in check, but it is impossible to eliminate them entirely. Instead, they must be managed. Perhaps this article has some useful pointers:

http://www.qualitative-research.net/index.php/fqs/article/view/919/2008

Conclusion
While I think mentioning the goodness criteria in the section on scope probably does no harm, I am not convinced that this will improve the way WCAG-EM is used. It could be useful, however, to give guidance on how to approach the aims of objectivity, validity and reliability in practical terms. I am not so sure whether such guidance can be prescriptive for operational procedures based on WCAG-EM. Let's discuss...

Best regards,
Detlev

--
testkreis c/o feld.wald.wiese
Borselstraße 3-7 (im Hof), 22765 Hamburg

Mobile +49 (0)1577 170 73 84
Tel +49 (0)40 439 10 68-3
Fax +49 (0)40 439 10 68-5

http://www.testkreis.de
Consulting, testing and training for accessible websites

Received on Thursday, 22 March 2012 11:40:10 UTC