Comments on WCAG-EM from Kerstin Probiesch on 2014-03-01 (public-wai-evaltf@w3.org from March 2014)

From: Kerstin Probiesch <k.probiesch@gmail.com>
Date: Sat, 1 Mar 2014 11:29:29 +0100
To: <public-wai-evaltf@w3.org>
Message-ID: <001101cf3539$214d8450$63e88cf0$@gmail.com>

Dear Eval TF members,

First, I want to apologize for the one-day delay. I was very busy over the
last few weeks. I hope the Task Force will nevertheless be so kind as to
consider my comments.

My comments are guided by the fact that WCAG-EM is more than "just" a
paper for further debate and discussion about Accessibility Metrics. Seen
only from the W3C perspective, WCAG-EM is informative and a Working Group
Note. For European countries, WCAG-EM is more than that. WCAG-EM is part of
WAI-Act, a European Commission project, and WCAG-EM is mentioned in the
document "Accessibility requirements for public procurement of ICT
products and services in Europe" on the following pages: 90, 92, 94:

"Where it is not manageable to check every Web page that is provided to the
user, then an appropriate methodology can be used to assess the overall
conformance of the web content. A methodology for evaluating the conformance
of websites to WCAG 2.0 is under development by W3C and is available at:
http://www.w3.org/TR/WCAG-EM/"

Even though it is written that WCAG-EM is "a methodology" and not "the
methodology", it is likely that under European conditions WCAG-EM will be
more than just informative, especially in the context of public procurement.

In general I like the sampling section. Thanks. Thanks also for all the work
the Task Force has done so far. I believe that a methodology like WCAG-EM has
the potential to bring more harmonization to the evaluation of websites.

Now my comments on W3C Working Draft 30 January 2014.

## Website Accessibility Conformance Evaluation Methodology (WCAG-EM) 1.0 - W3C Working Draft 30 January 2014

I appreciate that it is called 1.0 because it indicates that the work is not
finished yet. At the same time, the "1.0" gives the methodology a formal
character. When we look at other Working Group Notes, cases with "1.0" are
rare. Therefore the "1.0" could easily be misunderstood as a Recommendation.

Suggestion: I would prefer something like "Alpha Version".

## Purposes for this Methodology

This version: "This methodology is designed for anyone who wants to follow a
common procedure for evaluating the conformance of websites to WCAG 2.0."

Comment: "common procedure" is not defined in the document, so it is a bit
unclear to me what the term "common" means in this context. Because WCAG-EM
is a new methodology and because there has not been a test run so far, the
term is misleading.

Suggestion: Delete "common" and write: "This methodology is designed for
anyone who is evaluating websites."

## Review Teams (Optional)

This Version: "However, using the combined expertise of review teams
provides broader coverage of the required skills and helps identify
accessibility barriers more effectively. While not required for using this
methodology, the use of review teams is recommended when performing an
evaluation of a website".

At first sight it sounds logical that a team is better than an individual
tester. Let's have a second look: we don't have any studies which are
suitable to either support or refute the thesis that review teams are
_always_ (which is indicated by the sentence) working more effectively.

Whether a review team will find more and is therefore more effective depends
on various factors. I expect that a review team where each member has
long-time experience in evaluating accessibility may find more. And I
believe that a single tester with a lot of experience may find more than a
review team with less experience. A review team also does not act
independently of external factors like budget, time and so on. We don't have
any studies on this issue, so it is a thesis, but the wording suggests to me
that it is a proven fact. The referenced document explains this issue and
gives additional information, but it also provides no proven data about this
and is therefore, sorry, a bit self-referential.

Suggestion: Make clear that it is a thesis.

## Involving Users (Optional)

This Version: "Involving people with disabilities including people with
aging-related impairments helps identify additional accessibility barriers
that are not easily discovered by expert evaluation alone."

When we read the sections "Review Teams" and "Involving Users" together,
one can get the impression that "users" are something strictly different
from "experts". If two evaluators are evaluating a website and one of them
is disabled and the other is not, is this a review team, or an individual
evaluator who has involved a disabled user?

Suggestion: Give more explanation and definitions about expert evaluation,
involving users and review teams.

## Step 1b

This Version: "Note: It is often useful to evaluate beyond the conformance
target ".

First Suggestion: Change "often" to "always". Especially when just level A is
achieved by the website owner, it does not take much time for evaluators to
check SCs like 1.4.3 (Contrast Minimum), 1.4.5 (Images of Text), 2.4.5
(Multiple Ways), 2.4.7 (Focus Visible) and comment on them in the report.
Just an example: when an evaluator checks SCs like "Keyboard" and "No
Keyboard Trap", the evaluator will see whether the focus is visible or not.

There are conceivable cases where the Level AA SCs are met, but one or two
Level A SCs are not met in full. Correcting just those few Level A SCs
could then lead to AA conformance even though only A was targeted. This, I
believe, is intended by note 1 of "Conformance Level": "Note 1: Although
conformance can only be achieved at the stated levels, authors are
encouraged to report (in their claim) any progress toward meeting success
criteria from all levels beyond the achieved level of conformance."

Second suggestion: Of course it would take much more time (and budget) if
only A is achieved by the website owner and an evaluator also checks the
AAA SCs. My suggestion is therefore that the SCs of the next highest level
are always useful to evaluate, especially when only A is achieved. (I hope
this makes sense.)
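
The second suggestion can be sketched in a few lines of Python. This is only
an illustration under assumptions: the function name `levels_to_evaluate` and
the `LEVELS` list are hypothetical helpers and are not part of WCAG-EM.

```python
# Hypothetical helper, not part of WCAG-EM: given the conformance target,
# return the levels whose Success Criteria an evaluator would check if the
# next highest level is always included.
LEVELS = ["A", "AA", "AAA"]  # WCAG 2.0 conformance levels, lowest to highest


def levels_to_evaluate(target: str) -> list[str]:
    """Return all levels up to the target plus the next highest one, if any."""
    index = LEVELS.index(target)  # raises ValueError for an unknown level
    # The slice end is capped by the list length, so "AAA" returns all levels.
    return LEVELS[: index + 2]


print(levels_to_evaluate("A"))    # ['A', 'AA']
print(levels_to_evaluate("AA"))   # ['A', 'AA', 'AAA']
print(levels_to_evaluate("AAA"))  # ['A', 'AA', 'AAA']
```

With a target of A, the AAA SCs stay out of scope, which keeps the additional
time and budget limited.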

## Step 1d

This Version: "W3C/WAI provides a set of publicly documented (non-normative)
Techniques for WCAG 2.0 that help evaluate conformance to WCAG 2.0 Success
Criteria." 

I think there is still a disparity between WCAG-EM and the reference
(Understanding techniques), because the Understanding Document says: "The
tests are only for a technique, they are not tests for conformance to WCAG
success criteria."

After the sentence cited above, WCAG-EM says: "Some evaluators might use
other methods". This sentence implies that "most" evaluators are using the
WCAG Techniques. Regardless of whether this is a fact or not, it could be
understood to mean that if "just some evaluators" might use other methods,
it is better to use the Techniques, because "most" are using them.

Despite all the efforts made on this section in particular, compared to
older versions of WCAG-EM, I feel it is still a bit misleading. To give an
example: when an evaluator is checking a PDF file, relying on the PDF
Techniques alone is very limited and not sufficient.

I also miss a reference to "What would be the negative consequences of
allowing only W3C's published techniques to be used for conformance to WCAG
2.0?" (http://www.w3.org/WAI/WCAG20/wcag2faq.html#techsnot).

Suggestion:  Add a link to the document "What would be the negative
consequences...." and add a negative example for relying just on the
techniques document while evaluating web content. Delete "Some evaluators
might use other methods" and write "You are also free to use other methods".


## Step 1e

Typo: evalaute -> evaluate

## Step 3

This Version: "In cases where it is feasible to evaluate all web pages, this
sampling procedure can be skipped and the selected sample is considered to
be the entire website in the remaining steps of the conformance evaluation
procedure."

"Feasible" is a critical term. Whether something is feasible or not depends
on several variables (time, money, and the number of individual pages are
just some of the aspects that decide whether something is "feasible" or
not). Therefore "feasibility" is also not one of the major quality criteria
for tests in general.

Suggestion: I believe that this paragraph, and especially the term
"feasible", needs further discussion. I am aware that something like
"feasible" is needed, especially when it comes to very large websites. At
the same time I have a problem with this term because it is a specific term
in the area of test theory. Probably it would be enough to give guidance for
operationalisation. I am not comfortable that I don't have a better
suggestion at the moment.

Please also add "(which is highly recommended)" after "evaluate all web
pages".
 
## Step 5d

This version: "While aggregated scores provide a numerical indicator to help
communicate progress over time, there is currently no single widely
recognized metric that reflects the required reliability, accuracy, and
practicality."

Because there is no scoring system which fulfills quality criteria like
reliability (I also miss objectivity and validity), I think that 5d
should not even be an optional step. In general I think that scores are
misleading, and I believe this is also true for the aggregated score
suggested in WCAG-EM. A numerical value of "X points" automatically raises
the question: what is the meaning of "X points"? If everything is met except
captions for videos, the score is very high. One might think that a good
score also means good accessibility, which would not be true for hard of
hearing and deaf people.
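
A small Python sketch may illustrate the captions example. It rests on
assumptions: the results are invented, the selection of Success Criteria is
arbitrary, and the "share of Success Criteria met" formula is a simple
stand-in for a score, not the aggregation described in WCAG-EM.

```python
# Illustrative only: the results below are invented, and the "share of
# Success Criteria met" formula is an assumed simple score, not the
# aggregation described in WCAG-EM.
results = {
    "1.1.1 Non-text Content": True,
    "1.2.2 Captions (Prerecorded)": False,  # the single failure
    "1.3.1 Info and Relationships": True,
    "1.4.3 Contrast (Minimum)": True,
    "2.1.1 Keyboard": True,
    "2.1.2 No Keyboard Trap": True,
    "2.4.5 Multiple Ways": True,
    "2.4.7 Focus Visible": True,
    "3.1.1 Language of Page": True,
    "4.1.2 Name, Role, Value": True,
}


def aggregated_score(outcomes: dict[str, bool]) -> float:
    """Percentage of checked Success Criteria that were met."""
    return 100 * sum(outcomes.values()) / len(outcomes)


def conforms(outcomes: dict[str, bool]) -> bool:
    """Pass/fail view: every checked Success Criterion must be met."""
    return all(outcomes.values())


failed = [name for name, met in results.items() if not met]

print(f"Score: {aggregated_score(results):.0f} points")  # Score: 90 points
print(f"Conforms: {conforms(results)}")                  # Conforms: False
print(f"Not met: {failed}")
```

The score of 90 points looks good, while the pass/fail view and the list of
unmet criteria show that videos remain inaccessible to hard of hearing and
deaf people.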

Especially in the European context mentioned above, a scoring system as part
of WCAG-EM should also be underpinned by proven data.

Suggestion: Drop the whole "Scoring Section" except the sentence:

"While aggregated scores provide a numerical indicator to help communicate
progress over time, there is currently no single widely recognized metric
that reflects the required reliability, accuracy, and practicality."

## Missing Issues

WCAG-EM says nothing about the rating system: pass/fail or "anything" else?
This, I think, is very critical because the results of different rating
systems will not be comparable.

Suggestion: Provide pass/fail as rating system for evaluation.

Kind regards and again thanks a lot for all the work and time.

Kerstin Probiesch



--------------------------------------------------------------------
Kerstin Probiesch - Freelance Consultant
Accessibility, Social Media, Project Management
Kantstraße 10/19 | 35039 Marburg
Tel.: 06421 167002
E-Mail: mail@barrierefreie-informationskultur.de
Web: http://www.barrierefreie-informationskultur.de
XING: http://www.xing.com/profile/Kerstin_Probiesch
Received on Saturday, 1 March 2014 10:29:53 UTC