RE: use cases for evaluation (and reporting) (was Re: Step 1.b Goal of the Evaluation - Design support evaluation vs. conformance evaluation?)

Hi all,

 

I completely agree with Peter about the first use case. It should be
“Development Evaluation”.

 

But on changing "periodic evaluation" to "regression evaluation": although in
most cases the result may indeed be a regression, calling the evaluation that
would give it a negative connotation. And since Peter himself has pointed out
the possibility of an improvement, I think the term "periodic evaluation" is
more neutral. Although maybe we could rename it "maintenance assessment" or
something like that.

 

All the best,

Emmanuelle

 

From: Peter Korn [mailto:peter.korn@oracle.com] 
Sent: Friday, 14 June 2013 19:06
To: Shadi Abou-Zahra
CC: Eval TF; Detlev Fischer
Subject: Re: use cases for evaluation (and reporting) (was Re: Step 1.b Goal
of the Evaluation - Design support evaluation vs. conformance evaluation?)

 

Hi Shadi,

Thank you for moving this discussion forward.  More comments in-line below.




...
We discussed these three use cases. Here is an attempted write-up for these
use cases, for discussion: 


- Initial Evaluation: Typically carried out when organizations first start
out with accessibility and want to learn how well their website conforms to
WCAG 2.0 in order to improve it. It is expected that the website will likely
not conform, and the main purpose of the evaluation is to identify the types
of barriers on the website, and possibly to highlight some potential repairs,
so that these can be addressed in future development. 


For me, I would rather see this as a "Development Evaluation".  Something
undertaken when the site owner (or web application owner) expects things
aren't fully accessible yet, and is interested in understanding the extent of
the work that will need to be done.  Often (or at least hopefully!) such
evaluations will be undertaken part-way through the development process,
before the site/application is generally available and while there is still
enough time left in the development process to make significant changes
(e.g. to choices of UI component sets, templates, etc.).

Report output would likely be more technical in such a circumstance I think,
and detailed lists of bugs with information on how to reproduce them will be
of significant importance.





- Periodic Evaluation: Typically carried out periodically to monitor how
well conformance to WCAG 2.0 was maintained, or progress towards conformance
to WCAG 2.0 was achieved during a given period. The main purpose of such
evaluations is comparability of the results between iterations. In some
cases particular areas of the website may have changed, or the entire
website may have been redesigned between one evaluation and the next, and
evaluators will need to consider these changes during the sampling and
reporting stages of the evaluation. 


For me, I see this more as "Regression Evaluation".  Something undertaken
both to monitor how accessibility is improving (or regressing) and to measure
the results of an improvement program.  

Report output may be more in summary form, giving a broad measure of the
level of improvement/regression, and perhaps discussing that by area or type
(e.g. "image tagging has broadly improved, with only ~5% of images missing
ALT text vs. ~20% 6 months ago, within our tested sample of pages").  
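
As a rough illustration only (nothing here is prescribed by WCAG-EM), a figure
like that could be derived from a locally saved page sample along the
following lines; the folder name and the use of Python are placeholders:

# Illustrative only: estimate the share of <img> elements missing an alt
# attribute across a sample of locally saved pages.
import glob
from html.parser import HTMLParser

class ImgAltCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.total = 0
        self.missing = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.total += 1
            if "alt" not in dict(attrs):
                self.missing += 1

counter = ImgAltCounter()
for path in glob.glob("sample_pages/*.html"):  # hypothetical saved sample
    with open(path, encoding="utf-8") as f:
        counter.feed(f.read())

if counter.total:
    print(f"{counter.missing / counter.total:.1%} of sampled images lack alt text")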

This might also be used by a development organization (e.g. when a product
goes through various development stages: alpha, beta, etc.), though I would
expect in those cases they might simply run another "Development
Evaluation", since they will still be focused - at least from the reporting
point of view - on the detailed issues found.  Middle/senior management may
prefer a summary.





- Confirmation Evaluation: Typically carried out to confirm a claim made by
the vendor or supplier, where a website is assumed to meet particular
conformance targets in relation to WCAG 2.0. The main purpose of such an
evaluation is to validate a conformance claim with reasonable confidence, or
to identify potential mismatches between the conformance claim and the
website. Such evaluations are often re-run while the vendor or supplier
addresses confirmed issues; the intervals are typically shorter than for
Periodic Evaluations, and the re-runs are more focused on the issues
previously identified. 


The title "Confirmation Evaluation" suggests this is evaluation is NOT made
by the owner of the site/application, which I think is a mistake.  I would
hope the same steps an owner might take to evaluate the accessibility of
their site/application is the same as what a customer/user might do (or a
consumer organization).  Some may use it to confirm a vendor's claim, but
others may use it to assure themselves that their development organization
did what was expected, or a gov't agency may seek this from a contractor who
did work for them (and then do their own mini-spot-check).

Also, I am REALLY UNCOMFORTABLE with the term "conformance claim" in your
characterization, Shadi.  Unless every page of the entire site (and every
possible UI permutation in a web app) has been thoroughly examined, I don't
see how an entity can properly make a "conformance claim" for an entire
site/web app.  I think instead we need a new word/phrase here, and should
be talking about confidence levels around the extent to which all WCAG 2.0
SCs (at A/AA/AAA) have been met.  
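
To make the "confidence level" idea slightly more concrete, here is a rough
sketch (purely illustrative - the Wilson interval and all the numbers are
placeholders, not anything from WCAG 2.0 or WCAG-EM) of stating confidence
bounds around a sampled pass rate:

# Illustrative only: a Wilson score interval for the proportion of sampled
# pages on which a given success criterion was met.
from math import sqrt

def wilson_interval(passed, sampled, z=1.96):
    """Approximate 95% confidence interval for the true pass rate."""
    p = passed / sampled
    denom = 1 + z**2 / sampled
    centre = (p + z**2 / (2 * sampled)) / denom
    margin = z * sqrt(p * (1 - p) / sampled + z**2 / (4 * sampled**2)) / denom
    return centre - margin, centre + margin

low, high = wilson_interval(passed=27, sampled=30)  # e.g. SC met on 27 of 30 sampled pages
print(f"Observed pass rate 90%; 95% CI roughly {low:.0%} to {high:.0%}")

A statement such as "SC X was met on 90% of sampled pages, with a 95%
confidence interval of roughly 74-97%" seems closer to what a report can
honestly say than a site-wide conformance claim.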





I think these now reflect the timing as well as indicating a little more
about the typical "depth" of an evaluation. We'll probably also need to
explain that there are many variations of these typical cases depending on
the website, context, etc. It is a spectrum, really. 


Fully agree with this!


Peter





Comments and feedback welcome. 

Best, 
  Shadi 


On 6.6.2013 16:10, Detlev Fischer wrote: 



Hi, 

just some quick input in case you do cover my proposal to modify "Goal of
the Evaluation" today. 

I get that #3, the In-Depth Analysis Report, is close to what I would call a
"design support test" (or "development support test"), since you usually
conduct it when you *know* that the site will not conform - the aim is to
identify all the issues that need to be addressed before a conformance
evaluation has a chance to be successful. 

Since it usually comes first, I find it odd that it is mentioned last, and
that no hint is given that this is usually an evaluation where the aim is
*not* a conformance evaluation (because you already know that there will be
a number of issues that fail SCs). 

The one thing lacking in goal #3 is the requirement to cover all SCs across
the sample of pages (with or without detail) and, by doing so, to provide a
benchmark for the degree of conformance already reached - even if it is
necessarily a crude one. 

So there are 2 things that are missing in the three types of goals we have
now: 

(1) a clear indication (in the name of the report type) that there is one
evaluation that does *not* aim at measuring conformance but happens in
preparation for a final test, with the aim of unearthing problems; 
(2) the ability, in this type of test, to provide a metric of success across
all SCs for the pages in the sample that can be compared to a later
conformance evaluation of the same site (a rough sketch of such a metric
follows below). 
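
Purely as an illustration of such a per-SC metric over the same page sample
(the success criteria and counts here are invented):

# Illustrative only: comparing per-SC pass rates between a preparatory test
# and a later conformance evaluation of the same sample.
first_run = {"1.1.1": (12, 20), "1.4.3": (5, 20), "2.4.4": (18, 20)}   # SC -> (pages passing, pages checked)
second_run = {"1.1.1": (19, 20), "1.4.3": (16, 20), "2.4.4": (20, 20)}

for sc in sorted(first_run):
    before = first_run[sc][0] / first_run[sc][1]
    after = second_run[sc][0] / second_run[sc][1]
    print(f"SC {sc}: {before:.0%} -> {after:.0%} of sampled pages passing")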

Sorry, I would have loved to participate today but my voice isn't up to
it... 

Best, 
Detlev 
On 5 Jun 2013, at 16:34, Velleman, Eric wrote: 




Hi Detlev, 

I tend to look at the more detailed explanation of the three types of Reports
in Step 5.a [1]: 

1. Basic Report 
2. Detailed Report 
3. In-Depth Analysis Report 

For me, the difference between #2 and #3 is in the level of detail that is
required in the Report. #2 is more on the page level, and #3 is more on the
website level: 

#3 is a way of reporting that does not require you to name every failure on
every page. The evaluator is asked to give a certain number of examples of
the occurrence of the failures on the website (not on every page, as in the
detailed report). This makes #2 better for statistics and research. 

Does this make sense? 

Eric 


[1] http://www.w3.org/TR/WCAG-EM/#step5 
________________________________________ 
From: Detlev Fischer [detlev.fischer@testkreis.de] 
Sent: Thursday, 30 May 2013 17:15 
To: public-wai-evaltf@w3c.org 
Subject: Step 1.b Goal of the Evaluation - Design support evaluation vs.
conformance evaluation? 

Hi everyone, 
as promised in the telco, here is a thought on the current section "Goal of
the Evaluation". 

Currently we have: 
1. Basic Report 
2. Detailed Report 
3. In-Depth Analysis Report 

For me, 2 and 3 have always looked a bit similar, as there is no clear line
between specifying issues on pages and giving advice as to improvements
(often, you cannot easily specify remedies in detail because, as testers, we
are often not familiar with the details of the development environment). 

In the discussion it struck me that we seemed to have a (largely?) shared
notion that our evaluation work usually falls into one of 2 categories: 

1. Design support evaluation: Take an (often unfinished) new design and find
as many issues as you can to help designers address & correct them (often in
preparation for a future conformance evaluation / conformance claim). 
2. Conformance evaluation: Check the finished site to see if it actually
meets the success criteria (this may take the form of laying the groundwork
for a conformance claim, or challenging a conformance claim if a site is
evaluated independently, say, by some organisation wanting to put an
offender on the spot). 

Most of our work falls into one of these two categories, and you won't be
surprised that we sell design support tests (one tester) as preparation for
final tests (in our case, two independent testers). (And I should mention
that our testing scheme currently does not follow the 100% pass-or-fail
conformance approach.) 

There is actually a third use case, which is checking old sites known to
have issues *before* an organisation begins a re-design - so they see the
scope of problems the re-design will need to address (and are also aware
that there may be areas which they *cannot* easily address, and can determine
how to deal with those areas). 

Sorry again to raise this point somewhat belatedly. Hope this will trigger a
useful discussion. 
Best, 
Detlev 


-- 
Detlev Fischer 
testkreis c/o feld.wald.wiese 
Thedestr. 2, 22767 Hamburg 

Mobil +49 (0)1577 170 73 84 
Tel +49 (0)40 439 10 68-3 
Fax +49 (0)40 439 10 68-5 

http://www.testkreis.de 
Consulting, testing and training for accessible websites 






 

 

 

-- 
Oracle <http://www.oracle.com>
Peter Korn | Accessibility Principal
Phone: +1 650 5069522
500 Oracle Parkway | Redwood City, CA 94064
Green Oracle <http://www.oracle.com/commitment>
Oracle is committed to developing practices and products that help protect
the environment 

Received on Friday, 14 June 2013 18:42:09 UTC