Comments from Fraunhofer FIT on the document "Evaluating Web
Accessibility with Users" (2005-09-02)
This document contains some initial comments on the existing draft of
Evaluating Web Accessibility with Users. We are aware that this is an
early draft, but we found some key points that should be handled with
care (Henrike Gappa, Gabriele Nordbrock).
In regard to Introduction and Types of User Involvement
"Encouraging ..."
- Promotion of informal tests should be handled with care. First of
all, it is necessary to clearly identify the purpose of the evaluation
and the circumstances under which it will be conducted, e.g., limited
time and money as well as the availability of possible test
participants. Only then can it be decided which evaluation method suits
best. As a support for the reader, different usability methods should be
briefly introduced, so that the differences become clear, also in regard
to the results obtained, e.g., informal tests with one or two test
participants are really only of informative value. Limitations in the
reliability of results from user tests with small test samples should
also be mentioned (see also the section on Findings).
- Furthermore, a paragraph needs to be added that stresses the
importance of a well-thought-out test design. This also applies to
informal tests. The impression that informal tests with small test
samples can be carried out easily without any background knowledge
should be avoided to ensure a useful outcome.
- Part of proper test design is the development of adequate and complete
test materials. This includes the design of appropriate questionnaires
(formulating the right questions, choosing the right answering mode),
collection of all necessary data and, if applicable, creation of a test
scenario with standard tasks etc. It must also be decided in advance how
the data gained will be evaluated (statistics, reports, excerpts of
protocols, etc.) and what criteria for measurement are applied.
Therefore, the effort required for conducting such tests should not be
underestimated if a substantial result is expected. Finally, at least
for successful formal testing, a pilot study with users different from
the participants of the main test is recommended for refining the
procedures and materials of the user study.
In regard to section The User Aspect
"More than screen reader users ..."
- In user studies, too, the widest possible audience should be
included.
- The target user group should be given particular consideration.
In regard to section Understanding [Finding/Results]
"One user not representative of all: ..."
It needs to be assured that test results are of substantial use, so
generally speaking, gaining reliable test results should be the goal.
However, depending on the aim of the study and the method chosen, this
might differ. The characteristics of the test participants and the size
of the test sample are to be determined according to the premises of the
user study. Test sample size is an issue yet unsolved: for usability
studies, a test sample size of 10-12 seems to be a common understanding,
whereas in the field of psychology, 30 test participants is considered
the minimum. In case the "5 users are enough" paradigm of Nielsen is
used, its drawbacks must be considered.
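The drawback can be illustrated with the problem-discovery model
commonly cited together with the "5 users" rule, in which the expected
share of problems found by n participants is 1 - (1 - p)^n. The
following is only a sketch under stated assumptions, not part of the
draft: the function name is ours, p is the assumed probability that a
single participant reveals a given problem, 0.31 is the average value
usually quoted with this rule, and 0.10 is an illustrative lower value.

```python
def share_found(p: float, n: int) -> float:
    """Expected share of problems found by n participants, assuming each
    participant independently reveals a given problem with probability p."""
    return 1.0 - (1.0 - p) ** n


# Compare the sample sizes mentioned above for two assumed values of p.
for p in (0.31, 0.10):
    for n in (5, 12, 30):
        print(f"p={p:.2f}  n={n:2d}  share found={share_found(p, n):.2f}")
```

With p = 0.31, five participants are expected to reveal about 84% of
the problems; with p = 0.10, only about 41%. If a problem affects only
part of a heterogeneous user group, p may well be closer to the lower
value, which is where the "5 users" rule becomes questionable.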
Distinction between usability and accessibility issues: drawing a
distinction between usability and accessibility issues is, from our
point of view, not feasible, because:
- there is much overlap (for instance, WCAG 2.0, GL 2.5 "Help users
avoid mistakes and make it easy to correct them" is a typical usability
issue)
- many users, although not labelled "disabled", have accessibility
problems, e.g., older users who need strong contrast and enlarged font
sizes, people in "handicapping situations", people accessing the Web
with mobile devices, etc.
Clearly indicate what the report asserts:
- The way the results of the user study are reported or communicated,
e.g., to the Web development department, should be determined in
advance. One possibility could be to provide a list of
usability/accessibility problems with a severity rating as applied in
heuristic evaluation (cosmetic problem, minor problem, major problem,
usability catastrophe).
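Such a severity-rated problem list could be kept in a simple structure
like the one sketched below. This is only an illustration: the class
and field names are hypothetical, the numeric values 1 to 4 are an
assumed ordering of the four levels named above, and the example
problems are invented, not findings from any study.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # The four levels named above; the numeric ordering is an assumption.
    COSMETIC = 1
    MINOR = 2
    MAJOR = 3
    CATASTROPHE = 4


@dataclass
class Problem:
    description: str
    severity: Severity
    participants_affected: int


def report(problems):
    """Return one report line per problem, most severe first; ties are
    ordered by the number of participants affected."""
    ordered = sorted(
        problems,
        key=lambda p: (-p.severity, -p.participants_affected),
    )
    return [
        f"[{p.severity.name}] {p.description} "
        f"(participants affected: {p.participants_affected})"
        for p in ordered
    ]


# Invented example entries, for illustration only.
problems = [
    Problem("Low contrast in footer links", Severity.MINOR, 2),
    Problem("Search form cannot be submitted by keyboard",
            Severity.CATASTROPHE, 4),
    Problem("Inconsistent heading capitalisation", Severity.COSMETIC, 1),
]

for line in report(problems):
    print(line)
```

Agreeing on such a format before the study makes it clear to the
recipients of the report which problems should be addressed first.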