Re: some initial questions from the previous thread from Detlev Fischer on 2011-08-24 (public-wai-evaltf@w3.org from August 2011)

From: Detlev Fischer <fischer@dias.de>
Date: Wed, 24 Aug 2011 16:18:12 +0200
To: public-wai-evaltf@w3.org
Message-ID: <4E550824.4070609@dias.de>
On 24.08.2011 15:34, Boland Jr, Frederick E. wrote:
> Some other possible questions: Does an evaluation methodology
> necessarily involve a user carrying out a predefined task
> involving websites?  What exactly are we evaluating against
> (how do any business rules, mission definition/completion
> requirements, etc. influence an evaluation - "context" of
> evaluation)?  Do we need any formalisms or ontologies to
> adequately express any evaluation parameters/context information?
>
> Thanks and best wishes
> Tim Boland

I think for many checks, defining tasks is unnecessary. If you check 
for headings, keyboard access, alt texts and most other things, you just 
investigate all relevant instances on the entire page (including, as a 
complication, dynamically generated or displayed content).

However, some checks of success criteria (SC), for example under Guidelines 
3.2 Predictable and 3.3 Input Assistance, require the definition of tasks / 
processes, and these should be documented so that the checks are reproducible.

Regarding your question "what are we evaluating against", this could be a 
layer separate from the evaluation itself. Governments or businesses may 
require Level AA or just Level A, or combine either with additional 
requirements (e.g., usability, as in the Dutch Drempelvrij scheme, 
http://www.drempelvrij.nl/ ), require corrective action within 
particular time horizons, etc.
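A minimal sketch of keeping "what we evaluate against" as a separate layer, assuming the evaluation only records a pass/fail outcome per SC and that a requirement profile (the names `ScResult`, `RequirementProfile` and `meets_profile` are all hypothetical) is applied to those results afterwards:

```python
from dataclasses import dataclass, field

LEVEL_ORDER = {"A": 1, "AA": 2, "AAA": 3}

@dataclass
class ScResult:
    """Outcome for one success criterion; the evaluation records only this."""
    sc: str
    level: str
    passed: bool

@dataclass
class RequirementProfile:
    """Hypothetical policy layer defined by a government or business."""
    name: str
    required_level: str
    extra_checks: list[str] = field(default_factory=list)

def meets_profile(results, profile, extra_results=None):
    """True if every SC up to the required level passed and every extra check passed."""
    extra_results = extra_results or {}
    wcag_ok = all(
        r.passed
        for r in results
        if LEVEL_ORDER[r.level] <= LEVEL_ORDER[profile.required_level]
    )
    extras_ok = all(extra_results.get(c, False) for c in profile.extra_checks)
    return wcag_ok and extras_ok

# The same evaluation results, judged against two different profiles.
results = [ScResult("1.1.1", "A", True), ScResult("1.4.3", "AA", False)]
level_a = RequirementProfile("Level A only", "A")
level_aa_plus = RequirementProfile("Level AA plus usability", "AA", ["usability review"])

print(meets_profile(results, level_a))        # True
print(meets_profile(results, level_aa_plus))  # False
```

The evaluation data stays the same; only the profile changes, so one test run could be judged against several policies.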

>
> PS - apologies in advance if these questions have already been answered..
>
>
> -----Original Message-----
> From: public-wai-evaltf-request@w3.org [mailto:public-wai-evaltf-request@w3.org] On Behalf Of Shadi Abou-Zahra
> Sent: Monday, August 22, 2011 7:35 AM
> To: Eval TF
> Subject: some initial questions from the previous thread
>
> Dear Eval TF,
>
>   From the recent thread on the construction of WCAG 2.0 Techniques, here
> are some questions to think about:
>
> * Is the "evaluation methodology" expected to be carried out by one
> person or by a group of more than one persons?

If carried out by more than one, the aggregation of separate results may 
not be made part of the methodology.
>
> * What is the expected level of expertise (in accessibility, in web
> technologies etc) of persons carrying out an evaluation?

Hard to say: certainly a working knowledge of HTML/CSS and web design, and 
a good knowledge of a11y issues and WCAG. Expert scripting knowledge 
should, I hope, not be required, although that gets harder these days with 
so much dynamic content being written to pages. Working knowledge of 
screen readers raises the bar a lot, but may be increasingly necessary to 
test things like the success of WAI-ARIA implementations.
>
> * Is the involvement of people with disabilities a necessary part of
> carrying out an evaluation versus an improvement of the quality?

While always beneficial in practical terms, involving users of AT in 
conformance testing creates the problem that many things work 
differently across the many combinations of UA and AT versions, and also 
across custom AT settings. This makes testing very hard to manage, and it 
becomes difficult to draw conclusions that are valid for a broad range of 
implementations in the field. I feel everything that can be tested with a 
manageable and free set of browsers and tools should be tested that way. 
But I realise that more and more things escape such tests and require 
practical tests with AT.
>
> * Are the individual test results binary (ie pass/fail) or a score
> (discrete value, ratio, etc)?

If an individual test result refers to a single instance tested on a page, I 
believe that simply summing all instances per SC, or per individual 
test within the SC, will often lead to distorted results.
Think of 20 teaser images with perfect alt text and one critical linked 
image (say, in the main navigation) without any. At the aggregated level of 
the page, I believe a range is necessary. Whether this is a percentage, 
discrete steps or something else seems secondary.
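A quick worked count illustrates the distortion in the teaser-image example above. The instance list is made up to mirror that example; it is not real test data:

```python
# The teaser-image example: 20 images with good alt text and
# one critical linked image in the main navigation without alt text.
instances = [("teaser", True)] * 20 + [("main navigation link", False)]

# Plain count: every instance contributes equally to the SC 1.1.1 result.
passed = sum(1 for _, ok in instances if ok)
ratio = passed / len(instances)
print(f"{passed}/{len(instances)} instances passed = {ratio:.1%}")
# prints "20/21 instances passed = 95.2%"
```

A 95% pass rate looks nearly perfect, yet the one failing instance may block navigation for screen reader users.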
>
> * How are these test results aggregated into an overall score (plain
> count, weighted count, heuristics, etc)?

I think weighting for criticality is necessary.
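One possible form of weighting for criticality, sketched under the assumption that each tested instance can be assigned a role with a weight. The weights below are illustrative only; they are not defined by WCAG or by any existing procedure:

```python
# Illustrative criticality weights; these numbers are assumptions, not values
# defined by WCAG or by any existing test procedure.
WEIGHTS = {"teaser": 1.0, "main navigation link": 20.0}

# Same instances as in the teaser-image example: 20 passes, 1 critical failure.
instances = [("teaser", True)] * 20 + [("main navigation link", False)]

def weighted_pass_ratio(instances, weights):
    """Share of the total criticality weight carried by passing instances."""
    total = sum(weights[role] for role, _ in instances)
    passed = sum(weights[role] for role, ok in instances if ok)
    return passed / total

print(f"{weighted_pass_ratio(instances, WEIGHTS):.1%}")  # prints "50.0%"
```

With a weight of 20, the single critical failure counts as much as all 20 teasers together. Another option would be to let any critical failure cap the rating regardless of the count.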
>
> * Is it useful to have a "confidence score" for the tests (for example
> depending on the degree of subjectivity or "difficulty")?

I agree with Richard Warren's sentiment that a confidence score would 
get overly complicated. Our current answer to this is arbitration of two 
independent results.
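Arbitration of two independent results could start by separating agreed verdicts from disputed ones, so that only the disputed checkpoints go to an arbitrator. In this minimal sketch, the tester data and checkpoint identifiers are hypothetical:

```python
# Hypothetical ratings from two testers who evaluated the same pages
# independently; keys are checkpoint identifiers, values are their verdicts.
tester_a = {"1.1.1": "pass", "1.3.1": "fail", "2.1.1": "pass"}
tester_b = {"1.1.1": "pass", "1.3.1": "pass", "2.1.1": "pass"}

def split_for_arbitration(a, b):
    """Separate agreed verdicts from checkpoints that need arbitration."""
    agreed, disputed = {}, []
    for cp in sorted(a.keys() | b.keys()):
        if a.get(cp) == b.get(cp):
            agreed[cp] = a[cp]
        else:
            # Different verdicts, or only one tester rated this checkpoint.
            disputed.append(cp)
    return agreed, disputed

agreed, disputed = split_for_arbitration(tester_a, tester_b)
print(agreed)    # {'1.1.1': 'pass', '2.1.1': 'pass'}
print(disputed)  # ['1.3.1']
```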
>
> * Is it useful to have a "confidence score" for the aggregated result
> (depending on how the evaluation is carried out)?

At least it would make sense to flag limitations (e.g., a reduced page 
sample or, in our case, the test being conducted by just one tester).
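Flagging limitations instead of computing a numeric confidence score could look roughly like this. The function name, fields and the two conditions checked are assumptions chosen to match the examples in the text:

```python
from dataclasses import dataclass, field

@dataclass
class AggregatedResult:
    score: float
    limitations: list[str] = field(default_factory=list)

def with_limitations(score, pages_tested, pages_planned, testers):
    """Attach plain-language limitation flags instead of a numeric confidence score."""
    flags = []
    if pages_tested < pages_planned:
        flags.append(f"reduced page sample ({pages_tested} of {pages_planned} pages)")
    if testers < 2:
        flags.append("single tester, no arbitration of independent results")
    return AggregatedResult(score, flags)

result = with_limitations(0.9, pages_tested=5, pages_planned=10, testers=1)
print(result.limitations)
```

The flags travel with the aggregated score, so a reader can see at a glance how far the result can be relied upon.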
>
>
> Feel free to chime in if you have particular thoughts on any of these.
>
> Best,
>     Shadi
>


-- 
---------------------------------------------------------------
Detlev Fischer PhD
DIAS GmbH - Daten, Informationssysteme und Analysen im Sozialen
Management: Thomas Lilienthal, Michael Zapp

Phone: +49-40-43 18 75-25
Mobile: +49-157 7-170 73 84
Fax: +49-40-43 18 75-19
E-Mail: fischer@dias.de

Address: Schulterblatt 36, D-20357 Hamburg
Register court: Amtsgericht Hamburg, HRB 58 167
---------------------------------------------------------------
Received on Wednesday, 24 August 2011 14:18:36 UTC