RE: uses cases for evaluation (and reporting) (was Re: Step 1.b Goal of the Evaluation - Design support evaluation vs. conformance evaluation?) from Velleman, Eric on 2013-07-16 (public-wai-evaltf@w3.org from July 2013)

From: Velleman, Eric <evelleman@bartimeus.nl>
Date: Tue, 16 Jul 2013 13:00:58 +0000
To: Peter Korn <peter.korn@oracle.com>
CC: Detlev Fischer <detlev.fischer@testkreis.de>, Shadi Abou-Zahra <shadi@w3.org>, Eval TF <public-wai-evaltf@w3.org>
Message-ID: <3D063CE533923349B1B52F26312B0A466E49868B@s107ma.bart.local>

Hi Peter, all,

Let me go back a step. Maybe the title of section 1.b is wrong and should not be 'define the goal of the evaluation' but something like 'extra requirements by evaluation commissioner' (not sure if requirements is the right word).

The reason for this is that in my opinion, the goal of WCAG-EM seems obvious. IMHO the only goal of the full conformance evaluation is a … full conformance evaluation. I think we are defining the minimum requirements for such an evaluation. This means that 'goal' is not the issue here.

The experience I have in the Netherlands is that the real issue is what the customer wants. Does he want more pages (on top of the minimum) evaluated, does he want more or less detail than the defined minimum? In that case, our goal is to 'just' define the minimum requirements for WCAG-EM and indicate how and where there are possibilities to extend that if that is the specific wish of the evaluation commissioner.

So instead of branching out the current section 1.b, we narrow it down to scoping additional wishes by evaluation commissioners that would normally not fall within the minimum requirements for WCAG-EM. This will make the methodology more flexible.

In my opinion use cases as we are discussing them now are not goals of the full conformance evaluation.
- Evaluation commissioners can always ask an evaluator to do more (larger sample etc.). We can address that in the methodology by indicating how and where they could do that.
- Evaluation commissioners may ask the evaluator to phone them if the accessibility of the website is horrible. They can then choose to let the evaluator stop or to go on with it.

Hope this makes more sense.

Eric

=========================
Eric Velleman
Technical Director
Stichting Accessibility
Universiteit Twente

Tel: +31 (0)30 - 2398270

Christiaan Krammlaan 2
3571 AX Utrecht

www.accessibility.nl / www.wabcluster.org / www.econformance.eu /
www.game-accessibility.com/ www.eaccessplus.eu

Read our disclaimer: www.accessibility.nl/algemeen/disclaimer
Accessibility is a Member of the W3C
=========================
________________________________
From: Peter Korn [peter.korn@oracle.com]
Sent: Friday 12 July 2013 19:48
To: Velleman, Eric
CC: Detlev Fischer; Shadi Abou-Zahra; Eval TF
Subject: Re: uses cases for evaluation (and reporting) (was Re: Step 1.b Goal of the Evaluation - Design support evaluation vs. conformance evaluation?)

Eric,

Hmmm.... I don't really see how "types of websites" fits into this discussion (and that of the telco). Nor is it necessarily about what stage of overall development a website is in. Rather, I think it has to do with the level of "accessibility development" of a website. Very mature websites - e.g. long existing, undergoing little change - may nonetheless be new to accessibility work. Very mature web applications may belong to recently acquired companies who have a more rigorous approach to accessibility. In such cases, "mature" sites may nonetheless undergo a "(1) Status quo evaluation (or legacy site evaluation)".

Peter

On 7/12/2013 8:02 AM, Velleman, Eric wrote:

Dear all,

In the below discussion the proposal is to change the section Goal of the Evaluation. While working on this, it looks to me like this discussion below and in the previous telco is not so much about "goals" but more about "types" of websites related to the development stage they are in (development, existing, legacy). I think the place for this is not in 1.b Define the Goal of the Evaluation [1] but much more in the section about Particular Types of Websites [2]. We could add them there as a fifth category: "Websites in different stages of development"?

We could also add a new section just after the Particular Types of Websites to cover the proposed types.

As they are important for the required detail of reporting, they would then be referenced in section 1.b Define the Goal of the Evaluation. That section will also require a rewrite to be more clear.

Kindest regards,

Eric

Reference:
[1] http://www.w3.org/WAI/ER/conformance/ED-methodology-20130611#step1b
[2] http://www.w3.org/WAI/ER/conformance/ED-methodology-20130611#specialcases

________________________________________
From: Detlev Fischer [detlev.fischer@testkreis.de]
Sent: Thursday 20 June 2013 9:51
To: Shadi Abou-Zahra
CC: Peter Korn; Eval TF
Subject: Re: uses cases for evaluation (and reporting) (was Re: Step 1.b Goal of the Evaluation - Design support evaluation vs. conformance evaluation?)

Hi everyone,

I am delighted that Peter and Shadi have picked up the use case oriented description of goals.
Here are just a few thoughts on the four different types of evaluation that seem to take shape now, looking at three aspects:

* WCAG conformance angle
* Use of resulting report
* Scope and sampling

(1) Status quo evaluation (or legacy site evaluation)

This is a look at a legacy site which usually has quite a number of a11y issues - a site that will most likely be replaced by a re-design in a different CMS.

A. WCAG conformance angle: It is known that many issues exist and clear that conformance is far off. Absolutely no intention to state or verify a conformance claim.
B. Use of resulting report: The commissioner may need this mainly to decide whether remedial effort of the status quo site is worthwhile, or a complete re-design is the better option. They may also need the (bad) report to prove that the old site is not sufficiently accessible and justify the effort of redesign. In that case, they will need the report less to produce specific suggestions for redesign (because often, the whole design context will change when using a new CMS).
C. Scope and sampling: Sampling can usefully focus on 'worst offender pages', trying to cover all there is in terms of inaccessible content to highlight. The site may be so bad that 'more of the bad stuff' doesn't really add anything even if the sampling is insufficient from a statistical angle.

(2) Development evaluation
This is usually a look at an unfinished re-design by an agency that has (some) knowledge of accessible web design.

A. WCAG conformance angle: The commissioner expects that there will be issues but the site may be approaching a state of (near) conformance - no immediate intention to state or verify a conformance claim (but often preparing a later conformance-oriented evaluation).
B. Use of resulting report: The commissioner intends to produce clear (not necessarily detailed technical) advice on how to improve the a11y of the site being designed.
C. Scope and sampling: Often there are pockets of old content (legacy PDFs, data feeds with insufficient semantic markup, applications that are linked to but not re-designed at the same time), which puts a strong emphasis on setting the scope of the evaluation. As a conformance claim is not the intended result, it stands to reason that setting the scope and limiting the sample can be done in agreement with the commissioner, with the commissioner having the last say. They may want information on particular aspects, believing (rightly or wrongly) that other parts are fine, or can be excluded in whatever claim they want to make later on. The evaluator may warn the commissioner that the scope will be wider in a follow-up conformance evaluation, or that the sample may miss pages with issues, but at the end of the day it is down to whether the commissioner wants to spend more money on finding more issues or limit the evaluation.

(3) Conformance-oriented evaluation
This is the type of evaluation where a site has been carefully designed with a11y in mind and assumes conformance is in reach. It often (in our experience) follows (2) Development evaluation, after issues have been addressed.

A. WCAG conformance angle: Actual WCAG conformance claims may only be stated per page, or a site-wide claim may state confidence levels as indicated by Peter.
B. Use of resulting report: The report mainly backs up the conformance claim. It may simply 'tick the boxes' for SCs in general or SCs per page, or give more details on issues found per page and SC. The commissioner may want to put the report in the public domain and link to it to prove that an evaluation was carried out and was successful (here, we have the issue of how 'success' is defined - are minor violations found on some pages in the sample acceptable? Is there a process by which remaining issues can be corrected and the correction be verified by the evaluator before a conformance claim (with confidence statement) is published?)
C. Scope and sampling: What we have now in WCAG-EM. Quantity of sampling will relate to the confidence level.

(4) Repeat evaluation, Periodic evaluation, Regression evaluation
Carried out on sites that have undergone (3) Conformance-oriented evaluation after a specified period, or at the initiative of the commissioner.

A. WCAG conformance angle: Is a conformance claim (with confidence level) still valid? Is it valid only after repair work has been carried out and has been verified?
B. Use of resulting report: This should probably take the shape of a diff report referencing the initial report resulting from (3) Conformance-oriented evaluation. What is interesting here is where something has changed: editorial content may have degraded, a previously excluded legacy part may have been brought into the fold and made accessible, etc.
C. Scope and sampling: Like (3), but possibly with a reduced sample? Random sampling may be more important here.

On 19 Jun 2013, at 17:40, Shadi Abou-Zahra wrote:

Hi Peter, all,

In short, here is what I understood in conclusion of this exchange:

- Possibly there is a 4th use-case (regression evaluation)
- Further tweaks to "confirmation evaluation" might be necessary
- Probably further lighter edits and tweaks needed throughout too

More detailed comments inline below (all for discussion, of course):

On 14.6.2013 19:05, Peter Korn wrote:

Hi Shadi,

Thank you for moving this discussion forward. More comments in-line below.

...
We discussed these three use cases. Here is an attempted write-up for these
use cases, for discussion:

- Initial Evaluation: Typically carried out when organizations first start out
with accessibility and want to learn about how well their website conforms to
WCAG 2.0 in order to improve it. It is expected that the website will likely
not conform, and the main purpose of the evaluation is to identify the types
of barriers on the website, and possibly to highlight some of the possible
repairs, so that these can be addressed in future developments.

For me, I would rather see this as a "Development Evaluation". Something
undertaken when the site owner (or web application owner) expects things aren't
fully accessible yet, and is interested in understanding the extent to which
work will need to be done. Often (or at least hopefully!) such evaluations will
be undertaken part-way through the development process, before the
site/application is generally available and while there is still significant
time left in the development process to make significant changes (e.g. to
choices of UI component sets, templates, etc.).

Report output would likely be more technical in such a circumstance I think, and
detailed lists of bugs with information on how to reproduce them will be of
significant importance.

Maybe we are talking about two somewhat different use cases here?

I mean an initial evaluation very early on in a (typically redesign) process. There would certainly be a bug list but the focus is more on educating the readers of the report on what *type* of issues there are rather than to list out the individual bugs.

What you are suggesting seems similar to what I describe as "periodic evaluation", but you are redefining that too. More further below...

- Periodic Evaluation: Typically carried out periodically to monitor how well
conformance to WCAG 2.0 was maintained, or progress towards conformance to
WCAG 2.0 was achieved during a given period. The main purpose of such
evaluations is comparability of the results between iterations. In some cases
particular areas of the website may have changed, or the entire website may
have been redesigned between one evaluation and the next, and evaluators will
need to consider these changes during the sampling and reporting stages of the
evaluation.

For me, I see this more as "Regression Evaluation". Something undertaken both
to monitor how accessibility is improving (or regressing), as well as to measure
the results of an improvement program.

Report output may be more in summary form, giving a broad measure of the level
of improvement/regression, and perhaps discussing that by area or type (e.g.
"image tagging has broadly improved, with only ~5% of images missing ALT text
vs. ~20% 6 months ago, within our tested sample of pages").

This might also be used by a development organization, (e.g. when a product goes
through various development stages: alpha, beta, etc.), though I would expect in
those cases they might simply run another "Development Evaluation", since they
will still be focused - at least from the reporting point of view - on the
detailed issues found. Middle/senior management may prefer a summary.

OK, I think this is a new use case that I have not directly considered. It is like the "periodic evaluation" with more summaries. I wonder how this impacts the evaluation process as a whole versus the reporting?

- Confirmation Evaluation: Typically carried out to confirm a claim made by
the vendor or supplier, where a website is assumed to meet particular
conformance targets in relation to WCAG 2.0. The main purpose of such
evaluation is to validate a conformance claim with reasonable confidence, or
to identify potential mismatches between the conformance claim and the
website. Such evaluations are often re-run while the vendor or supplier
addresses confirmed issues. The intervals are typically shorter than for
Periodic Evaluations and are also more focused towards the issues previously
identified.

The title "Confirmation Evaluation" suggests this evaluation is NOT made by
the owner of the site/application, which I think is a mistake. I would hope the
steps an owner might take to evaluate the accessibility of their
site/application are the same as what a customer/user might do (or a consumer
organization). Some may use it to confirm a vendor's claim, but others may use
it to assure themselves that their development organization did what was
expected, or a gov't agency may seek this from a contractor who did work for
them (and then do their own mini-spot-check).

OK, we can discuss the title. But I think also the website owner may want to confirm the claim made by a supplier/vendor.

Also, I am REALLY UNCOMFORTABLE with the word "conformance claim" in your
characterization Shadi. Unless every page of the entire site (and every
possible UI permutation in a web app) has been thoroughly examined, I don't see
how an entity can properly make a "conformance claim" for an entire site/ web
app. I think instead we need a new word/phrase here, and should be talking
about confidence levels around the extent to which all WCAG 2.0 SCs (at
A/AA/AAA) have been met.

Can you be more specific about which parts make you uncomfortable? I specifically tried to clarify the scope in the very first sentence:

[[
a claim made by the vendor or supplier, where a website is assumed to meet particular conformance targets in relation to WCAG 2.0
]]

I think these now reflect both the timing as well as indicate a little bit
more on the typical "depth" of an evaluation. We'll probably also need to
explain that there are many variances of these typical cases depending on the
website, context, etc. It is a spectrum, really.

Fully agree with this!

OK good.

Thanks,
Shadi

Peter

Comments and feedback welcome.

Best,
Shadi

On 6.6.2013 16:10, Detlev Fischer wrote:

Hi,

just some quick input in case you do cover my proposal to modify "Goal of the
Evaluation" today.

I get that #3 In Depth analysis report is close to what I would call "design
support test" (or "development support test") since you usually conduct it
when you *know* that the site will not conform - the aim is to identify all
the issues that need to be addressed before a conformance evaluation has a
chance to be successful.

Since it usually comes first, I find it odd that it is mentioned last, and
that no hint is given that this is usually an evaluation where the aim is
*not* a conformance evaluation (because you already know that there will be a
number of issues that fail SCs).

The one thing lacking in goal #3 is the requirement to cover all SCs across the
sample of pages (with or without detail) and by doing so, providing a
benchmark for the degree of conformance already reached - even if it is
necessarily a crude one.

So there are 2 things that are missing in the three types of goals we have now:

(1) a clear indication (in the name of the report type) that there is one
evaluation that does *not* aim for measuring conformance but happens in
preparation of a final test, with the aim to unearth problems;
(2) the ability in this type of test to provide a metric of success across all
SCs for the pages in the sample that can be compared to a later conformance
evaluation of the same site.

Sorry, I would have loved to participate today but my voice isn't up to it...

Best,
Detlev

On 5 Jun 2013, at 16:34, Velleman, Eric wrote:

Hi Detlev,

I tend to look at the more detailed explanation of the three types of Reports
in Step 5.a [1]:

1. Basic Report
2. Detailed Report
3. In-Depth Analysis Report

For me the difference between #2 and #3 is in the level of detail that is
required in the Report. #2 is more on the page level, and #3 is more on the
website level:

#3 is a way of reporting that does not require you to name every failure on
every page. The evaluator is asked to give a certain number of examples of
the occurrence of the failures on the website (not every page like in the
detailed report). This makes #2 better for statistics and research.

Does this make sense?

Eric

[1] http://www.w3.org/TR/WCAG-EM/#step5
________________________________________
From: Detlev Fischer [detlev.fischer@testkreis.de]
Sent: Thursday 30 May 2013 17:15
To: public-wai-evaltf@w3c.org
Subject: Step 1.b Goal of the Evaluation - Design support evaluation vs.
conformance evaluation?

Hi everyone,
as promised in the telco, here is a thought on the current section "Goal of
the Evaluation".

Currently we have:
1. Basic Report
2. Detailed Report
3. In-Depth Analysis Report

For me, 2 and 3 have always looked a bit similar as there is no clear line
between specifying issues on pages and giving advice as to improvements
(often, you cannot easily specify remedies in detail because as testers
we are often not familiar with the details of the development environment).

In the discussion it struck me that we seemed to have a (largely?) shared
notion that our evaluation work usually falls into one of 2 categories:

1. Design support evaluation: Take an (often unfinished) new design and find
as many issues as you can to help designers address & correct them (often in
preparation for a future conformance evaluation/ conformance claim)
2. Conformance evaluation: Check the finished site to see if it actually
meets the success criteria (this may take the form of laying the grounds for
a conformance claim, or challenging a conformance claim if a site is
evaluated independently, say, by some organisation wanting to put an
offender on the spot).

Most of our work falls into one of these two categories, and you won't be
surprised that we sell design support tests (one tester) as preparation for
final tests (in our case, two independent testers). (And I should mention
that our testing scheme currently does not follow the 100% pass-or-fail
conformance approach.)

There is actually a third use case, which is checking old sites known to
have issues *before* an organisation starts with a re-design - so they see
the scope of problems the re-design will need to address (and also be aware
that there may be areas which they *cannot* easily address and determine how
to deal with those areas).

Sorry again to raise this point somewhat belatedly. Hope this will trigger a
useful discussion.
Best,
Detlev

--
Detlev Fischer
testkreis c/o feld.wald.wiese
Thedestr. 2, 22767 Hamburg

Mobile +49 (0)1577 170 73 84
Tel +49 (0)40 439 10 68-3
Fax +49 (0)40 439 10 68-5

http://www.testkreis.de
Consulting, testing and training for accessible websites

--
Oracle <http://www.oracle.com>
Peter Korn | Accessibility Principal
Phone: +1 650 5069522
500 Oracle Parkway | Redwood City, CA 94064
Green Oracle <http://www.oracle.com/commitment> Oracle is committed to
developing practices and products that help protect the environment

--
Shadi Abou-Zahra - http://www.w3.org/People/shadi/
Activity Lead, W3C/WAI International Program Office
Evaluation and Repair Tools Working Group (ERT WG)
Research and Development Working Group (RDWG)

Received on Tuesday, 16 July 2013 13:01:39 UTC