Re: uses cases for evaluation (and reporting) (was Re: Step 1.b Goal of the Evaluation - Design support evaluation vs. conformance evaluation?)

Hi Shadi,

Thank you for moving this discussion forward.  More comments in-line below.

> ...
> We discussed these three use cases. Here is an attempted write-up for 
> these use cases, for discussion:
>
>
> - Initial Evaluation: Typically carried out when organizations first 
> start out with accessibility and want to learn about how well their 
> website conforms to WCAG 2.0 in order to improve it. It is expected 
> that the website will likely not conform, and the main purpose of the 
> evaluation is to identify the types of barriers on the website, and 
> possibly to highlight some of the possible repairs, so that these can 
> be addressed in future developments.

For me, I would rather see this as a "Development Evaluation": something 
undertaken when the site owner (or web application owner) expects things 
aren't fully accessible yet and wants to understand how much work remains 
to be done.  Often (or at least hopefully!) such evaluations will be 
undertaken part-way through the development process, before the 
site/application is generally available and while there is still enough 
time left to make significant changes (e.g. to the choice of UI component 
sets, templates, etc.).

Report output would likely be more technical in such a circumstance, I 
think, and detailed lists of bugs, with information on how to reproduce 
them, will be especially important.

>
> - Periodic Evaluation: Typically carried out periodically to monitor 
> how well conformance to WCAG 2.0 was maintained, or progress towards 
> conformance to WCAG 2.0 was achieved during a given period. The main 
> purpose of such evaluations is comparability of the results between 
> iterations. In some cases particular areas of the website may have 
> changed, or the entire website may have been redesigned between one 
> evaluation and the next, and evaluators will need to consider these 
> changes during the sampling and reporting stages of the evaluation.

For me, I see this more as a "Regression Evaluation": something 
undertaken both to monitor whether accessibility is improving (or 
regressing) and to measure the results of an improvement program.

Report output may be more in summary form, giving a broad measure of the 
level of improvement/regression, and perhaps discussing that by area or 
type (e.g. "image tagging has broadly improved, with only ~5% of images 
missing ALT text vs. ~20% 6 months ago, within our tested sample of 
pages").

This might also be used by a development organization (e.g. when a 
product goes through various development stages: alpha, beta, etc.), 
though I would expect in those cases they might simply run another 
"Development Evaluation", since they will still be focused - at least 
from the reporting point of view - on the detailed issues found.  
Middle/senior management may prefer a summary.


> - Confirmation Evaluation: Typically carried out to confirm a claim 
> made by the vendor or supplier, where a website is assumed to meet 
> particular conformance targets in relation to WCAG 2.0. The main 
> purpose of such evaluation is to validate a conformance claim with 
> reasonable confidence, or to identify potential mismatches between the 
> conformance claim and the website. Such evaluations are often re-run 
> while the vendor or supplier addresses confirmed issues. The intervals 
> are typically shorter than for Periodic Evaluations and are also more 
> focused towards the issues previously identified.

The title "Confirmation Evaluation" suggests this is evaluation is NOT 
made by the owner of the site/application, which I think is a mistake.  
I would hope the same steps an owner might take to evaluate the 
accessibility of their site/application is the same as what a 
customer/user might do (or a consumer organization).  Some may use it to 
confirm a vendor's claim, but others may use it to assure themselves 
that their development organization did what was expected, or a gov't 
agency may seek this from a contractor who did work for them (and then 
do their own mini-spot-check).

Also, I am REALLY UNCOMFORTABLE with the phrase "conformance claim" in 
your characterization, Shadi.  Unless every page of the entire site (and 
every possible UI permutation in a web app) has been thoroughly 
examined, I don't see how an entity can properly make a "conformance 
claim" for an entire site/web app.  I think instead we need a new 
word/phrase here, and should be talking about confidence levels around 
the extent to which all WCAG 2.0 SCs (at A/AA/AAA) have been met.
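
To make "confidence level" a little more concrete, here is a small sketch 
of my own (not a proposal for the methodology; the numbers are invented): 
if we check N sampled pages and find that k of them satisfy a given SC, 
we can report an estimated pass rate with an interval around it, rather 
than a flat claim about the whole site.

    # Python 3, standard library only
    import math

    def wilson_interval(passes: int, sample_size: int, z: float = 1.96):
        """95% Wilson score interval for the true pass rate, given the sample."""
        if sample_size == 0:
            return (0.0, 0.0)
        p = passes / sample_size
        denom = 1 + z**2 / sample_size
        centre = (p + z**2 / (2 * sample_size)) / denom
        margin = z * math.sqrt(p * (1 - p) / sample_size
                               + z**2 / (4 * sample_size**2)) / denom
        return (max(0.0, centre - margin), min(1.0, centre + margin))

    # Invented numbers: 37 of 40 sampled pages satisfy SC 1.1.1.
    low, high = wilson_interval(37, 40)
    print(f"Estimated pass rate for SC 1.1.1: {37 / 40:.0%} "
          f"(95% interval {low:.0%} to {high:.0%})")

The wider that interval is, the less we can honestly say about the whole 
site/web app, which to me is a far more defensible framing than an 
unqualified "conformance claim".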


> I think these now reflect both the timing as well as indicate a little 
> bit more on the typical "depth" of an evaluation. We'll probably also 
> need to explain that there are many variances of these typical cases 
> depending on the website, context, etc. It is a spectrum, really.

Fully agree with this!


Peter

>
> Comments and feedback welcome.
>
> Best,
>   Shadi
>
>
> On 6.6.2013 16:10, Detlev Fischer wrote:
>> Hi,
>>
>> just some quick input in case you do cover my proposal to modify 
>> "Goal of the Evaluation" today.
>>
>> I get that #3 In-Depth Analysis Report is close to what I would call 
>> "design support test" (or "development support test") since you 
>> usually conduct it when you *know* that the site will not conform - 
>> the aim is to identify all the issues that need to be addressed 
>> before a conformance evaluation has a chance to be successful.
>>
>> Since it usually comes first, I find it odd that it is mentioned 
>> last, and that no hint is given that this is usually an evaluation 
>> where the aim is *not* a conformance evaluation (because you already 
>> know that there will be a number of issues that fail SCs).
>>
>> The one thing lacking in goal #3 is the requirement to cover all SCs 
>> across the sample of pages (with or without detail) and, by doing so, 
>> providing a benchmark for the degree of conformance already reached - 
>> even if it is necessarily a crude one.
>>
>> So there are 2 things that are missing in the three types of goals we 
>> have now:
>>
>> (1) a clear indication (in the name of the report type) that there is 
>> one evaluation that does *not* aim for measuring conformance but 
>> happens in preparation for a final test, with the aim to unearth problems;
>> (2) the ability in this type of test to provide a metric of success 
>> across all SCs for the pages in the sample that can be compared to a 
>> later conformance evaluation of the same site.
>>
>> Sorry, I would have loved to participate today but my voice isn't up 
>> to it...
>>
>> Best,
>> Detlev
>> On 5 Jun 2013, at 16:34, Velleman, Eric wrote:
>>
>>> Hi Detlev,
>>>
>>> I tend to look at the more detailed explanation of the three types of 
>>> Reports in Step 5.a [1]:
>>>
>>> 1. Basic Report
>>> 2. Detailed Report
>>> 3. In-Depth Analysis Report
>>>
>>> For me the difference between #2 and #3 is in the level of detail 
>>> that is required in the Report. #2 is more on the page level, and #3 
>>> is more on the website level:
>>>
>>> #3 is a way of reporting that does not require you to name every 
>>> failure on every page. The evaluator is asked to give a certain 
>>> number of examples of the occurrence of the failures on the website 
>>> (not on every page, as in the Detailed Report). This makes #2 better 
>>> for statistics and research.
>>>
>>> Does this make sense?
>>>
>>> Eric
>>>
>>>
>>> [1] http://www.w3.org/TR/WCAG-EM/#step5
>>> ________________________________________
>>> From: Detlev Fischer [detlev.fischer@testkreis.de]
>>> Sent: Thursday, 30 May 2013 17:15
>>> To: public-wai-evaltf@w3c.org
>>> Subject: Step 1.b Goal of the Evaluation - Design support 
>>> evaluation vs. conformance evaluation?
>>>
>>> Hi everyone,
>>> as promised in the telco, here is a thought on the current section 
>>> "Goal of the Evaluation".
>>>
>>> Currently we have:
>>> 1. Basic Report
>>> 2. Detailed Report
>>> 3. In-Depth Analysis Report
>>>
>>> For me, 2 and 3 have always looked a bit similar as there is no 
>>> clear line between specifying issues on pages and giving advice as 
>>> to improvements (often, you cannot easily specify remedies in 
>>> detail because as testers we are often not familiar with the details 
>>> of the development environment).
>>>
>>> In the discussion it struck me that we seemed to have a (largely?) 
>>> shared notion that our evaluation work usually falls into one of 2 
>>> categories:
>>>
>>> 1. Design support evaluation: Take an (often unfinished) new design 
>>> and find as many issues as you can to help designers address & 
>>> correct them (often in preparation for a future conformance 
>>> evaluation/ conformance claim)
>>> 2. Conformance evaluation: Check the finished site to see if it 
>>> actually meets the success criteria (this may take the form of 
>>> laying the groundwork for a conformance claim, or challenging a 
>>> conformance claim if a site is evaluated independently, say, by some 
>>> organisation wanting to put an offender on the spot).
>>>
>>> Most of our work falls into one of these two categories, and you 
>>> won't be surprised that we sell design support tests (one tester) as 
>>> preparation for final tests (in our case, two independent testers). 
>>> (And I should mention that our testing scheme currently does not 
>>> follow the 100% pass-or-fail conformance approach.)
>>>
>>> There is actually a third use case, which is checking old sites 
>>> known to have issues *before* an organisation starts with a 
>>> re-design - so they see the scope of problems the re-design will 
>>> need to address (and also be aware that there may be areas which 
>>> they *cannot* easily address and determine how to deal with those 
>>> areas).
>>>
>>> Sorry again to raise this point somewhat belatedly. Hope this will 
>>> trigger a useful discussion.
>>> Best,
>>> Detlev
>>>
>>>
>>> -- 
>>> Detlev Fischer
>>> testkreis c/o feld.wald.wiese
>>> Thedestr. 2, 22767 Hamburg
>>>
>>> Mobil +49 (0)1577 170 73 84
>>> Tel +49 (0)40 439 10 68-3
>>> Fax +49 (0)40 439 10 68-5
>>>
>>> http://www.testkreis.de
>>> Consulting, testing and training for accessible websites
>>>
>>>
>>>
>>>
>>>
>>
>

-- 
Oracle <http://www.oracle.com>
Peter Korn | Accessibility Principal
Phone: +1 650 5069522
500 Oracle Parkway | Redwood City, CA 94064
Green Oracle <http://www.oracle.com/commitment> Oracle is committed to 
developing practices and products that help protect the environment

Received on Friday, 14 June 2013 17:06:19 UTC