Re: use cases for evaluation (and reporting) (was Re: Step 1.b Goal of the Evaluation - Design support evaluation vs. conformance evaluation?)

Hi Peter, all,

In short, here is what I took away from this exchange:

- Possibly there is a fourth use case (regression evaluation)
- Further tweaks to "confirmation evaluation" might be necessary
- Probably further light edits and tweaks are needed throughout too


More detailed comments inline below (all for discussion, of course):


On 14.6.2013 19:05, Peter Korn wrote:
> Hi Shadi,
>
> Thank you for moving this discussion forward.  More comments in-line below.
>
>> ...
>> We discussed these three use cases. Here is an attempted write-up for these
>> use cases, for discussion:
>>
>>
>> - Initial Evaluation: Typically carried out when organizations first start out
>> with accessibility and want to learn how well their website conforms to
>> WCAG 2.0 in order to improve it. It is expected that the website will likely
>> not conform, and the main purpose of the evaluation is to identify the types
>> of barriers on the website, and possibly to highlight some potential repairs,
>> so that these can be addressed in future development.
>
> For me, I would rather see this as a "Development Evaluation". Something
> undertaken when the site owner (or web application owner) expects things aren't
> fully accessible yet, and is interested in understanding the extent to which
> work will need to be done.  Often (or at least hopefully!) such evaluations will
> be undertaken part-way through the development process, before the
> site/application is generally available and while there is still significant
> time left in the development process to make significant changes (e.g. to
> choices of UI component sets, templates, etc.).
>
> Report output would likely be more technical in such a circumstance, I think,
> and detailed lists of bugs with information on how to reproduce them will be
> of significant importance.

Maybe we are talking about two somewhat different use cases here?

I mean an initial evaluation very early on in a (typically redesign) 
process. There would certainly be a bug list, but the focus is more on 
educating the readers of the report about what *type* of issues there 
are rather than on listing the individual bugs.

What you are suggesting seems similar to what I describe as "periodic 
evaluation", but you are redefining that too. More on that further below...


>> - Periodic Evaluation: Typically carried out at regular intervals to monitor
>> how well conformance to WCAG 2.0 was maintained, or how much progress towards
>> conformance to WCAG 2.0 was achieved, during a given period. The main purpose
>> of such
>> evaluations is comparability of the results between iterations. In some cases
>> particular areas of the website may have changed, or the entire website may
>> have been redesigned between one evaluation and the next, and evaluators will
>> need to consider these changes during the sampling and reporting stages of the
>> evaluation.
>
> For me, I see this more as "Regression Evaluation".  Something undertaken both
> to monitor how accessibility is improving (or regressing) and to measure the
> results of an improvement program.
>
> Report output may be more in summary form, giving a broad measure of the level
> of improvement/regression, and perhaps discussing that by area or type (e.g.
> "image tagging has broadly improved, with only ~5% of images missing ALT text
> vs. ~20% 6 months ago, within our tested sample of pages").
>
> This might also be used by a development organization (e.g. when a product goes
> through various development stages: alpha, beta, etc.), though I would expect in
> those cases they might simply run another "Development Evaluation", since they
> will still be focused - at least from the reporting point of view - on the
> detailed issues found.  Middle/senior management may prefer a summary.

OK, I think this is a new use case that I have not directly considered. 
It is like the "periodic evaluation" but with more summarized reporting. 
I wonder how this impacts the evaluation process as a whole versus just 
the reporting?


>> - Confirmation Evaluation: Typically carried out to confirm a claim made by
>> the vendor or supplier, where a website is assumed to meet particular
>> conformance targets in relation to WCAG 2.0. The main purpose of such an
>> evaluation is to validate a conformance claim with reasonable confidence, or
>> to identify potential mismatches between the conformance claim and the
>> website. Such evaluations are often re-run while the vendor or supplier
>> addresses confirmed issues. The intervals are typically shorter than for
>> Periodic Evaluations, and the evaluations are also more focused on the issues
>> previously identified.
>
> The title "Confirmation Evaluation" suggests this is evaluation is NOT made by
> the owner of the site/application, which I think is a mistake.  I would hope the
> same steps an owner might take to evaluate the accessibility of their
> site/application is the same as what a customer/user might do (or a consumer
> organization).  Some may use it to confirm a vendor's claim, but others may use
> it to assure themselves that their development organization did what was
> expected, or a gov't agency may seek this from a contractor who did work for
> them (and then do their own mini-spot-check).

OK, we can discuss the title. But I think the website owner may also 
want to confirm a claim made by a supplier/vendor.


> Also, I am REALLY UNCOMFORTABLE with the term "conformance claim" in your
> characterization, Shadi.  Unless every page of the entire site (and every
> possible UI permutation in a web app) has been thoroughly examined, I don't see
> how an entity can properly make a "conformance claim" for an entire site/web
> app.  I think instead we need a new word/phrase here, and should be talking
> about confidence levels around the extent to which all WCAG 2.0 SCs (at
> A/AA/AAA) have been met.

Can you be more specific about which parts make you uncomfortable? I 
specifically tried to clarify the scope in the very first sentence:

[[
a claim made by the vendor or supplier, where a website is assumed to 
meet particular conformance targets in relation to WCAG 2.0
]]


>> I think these now reflect the timing as well as indicating a little bit more
>> about the typical "depth" of an evaluation. We'll probably also need to
>> explain that there are many variations of these typical cases depending on
>> the website, context, etc. It is a spectrum, really.
>
> Fully agree with this!

OK good.


Thanks,
   Shadi


> Peter
>
>>
>> Comments and feedback welcome.
>>
>> Best,
>>   Shadi
>>
>>
>> On 6.6.2013 16:10, Detlev Fischer wrote:
>>> Hi,
>>>
>>> just some quick input in case you do cover my proposal to modify "Goal of the
>>> Evaluation" today.
>>>
>>> I get that #3 In-Depth Analysis Report is close to what I would call a
>>> "design support test" (or "development support test") since you usually
>>> conduct it when you *know* that the site will not conform - the aim is to
>>> identify all the issues that need to be addressed before a conformance
>>> evaluation has a chance to be successful.
>>>
>>> Since it usually comes first, I find it odd that it is mentioned last, and
>>> that no hint is given that this is usually an evaluation where the aim is
>>> *not* a conformance evaluation (because you already know that there will be a
>>> number of issues that fail SCs).
>>>
>>> The one thing lacking in goal #3 is the requirement to cover all SCs across
>>> the sample of pages (with or without detail) and, by doing so, to provide a
>>> benchmark for the degree of conformance already reached - even if it is
>>> necessarily a crude one.
>>>
>>> So there are two things missing in the three types of goals we have now:
>>>
>>> (1) a clear indication (in the name of the report type) that there is one
>>> evaluation that does *not* aim at measuring conformance but happens in
>>> preparation for a final test, with the aim of unearthing problems;
>>> (2) the ability of this type of test to provide a metric of success across
>>> all SCs for the pages in the sample that can be compared to a later
>>> conformance evaluation of the same site.
>>>
>>> Sorry, I would have loved to participate today but my voice isn't up to it...
>>>
>>> Best,
>>> Detlev
>>> On 5 Jun 2013, at 16:34, Velleman, Eric wrote:
>>>
>>>> Hi Detlev,
>>>>
>>>> I tend to look at the more detailed explanation of the three types of
>>>> Reports in Step 5.a [1]:
>>>>
>>>> 1. Basic Report
>>>> 2. Detailed Report
>>>> 3. In-Depth Analysis Report
>>>>
>>>> For me the difference between #2 and #3 is in the level of detail that is
>>>> required in the Report. #2 is more on the page level, and #3 is more on the
>>>> website level:
>>>>
>>>> #3 is a way of reporting that does not require you to name every failure on
>>>> every page. The evaluator is asked to give a certain number of examples of
>>>> how the failures occur on the website (not on every page, as in the
>>>> detailed report). This makes #2 better for statistics and research.
>>>>
>>>> Does this make sense?
>>>>
>>>> Eric
>>>>
>>>>
>>>> [1] http://www.w3.org/TR/WCAG-EM/#step5
>>>> ________________________________________
>>>> From: Detlev Fischer [detlev.fischer@testkreis.de]
>>>> Sent: Thursday, 30 May 2013 17:15
>>>> To: public-wai-evaltf@w3c.org
>>>> Subject: Step 1.b Goal of the Evaluation - Design support evaluation vs.
>>>> conformance evaluation?
>>>>
>>>> Hi everyone,
>>>> as promised in the telco, here is a thought on the current section "Goal of
>>>> the Evaluation".
>>>>
>>>> Currently we have:
>>>> 1. Basic Report
>>>> 2. Detailed Report
>>>> 3. In-Depth Analysis Report
>>>>
>>>> For me, 2 and 3 have always looked a bit similar, as there is no clear line
>>>> between specifying issues on pages and giving advice on improvements
>>>> (often, you cannot easily specify remedies in detail because as testers
>>>> we are often not familiar with the details of the development environment).
>>>>
>>>> In the discussion it struck me that we seemed to have a (largely?) shared
>>>> notion that our evaluation work usually falls into one of two categories:
>>>>
>>>> 1. Design support evaluation: Take an (often unfinished) new design and find
>>>> as many issues as you can to help designers address & correct them (often in
>>>> preparation for a future conformance evaluation / conformance claim)
>>>> 2. Conformance evaluation: Check the finished site to see if it actually
>>>> meets the success criteria (this may take the form of laying the groundwork
>>>> for a conformance claim, or challenging a conformance claim if a site is
>>>> evaluated independently, say, by some organisation wanting to put an
>>>> offender on the spot).
>>>>
>>>> Most of our work falls into one of these two categories, and you won't be
>>>> surprised that we sell design support tests (one tester) as preparation for
>>>> final tests (in our case, two independent testers). (And I should mention
>>>> that our testing scheme currently does not follow the 100% pass-or-fail
>>>> conformance approach.)
>>>>
>>>> There is actually a third use case, which is checking old sites known to
>>>> have issues *before* an organisation starts a re-design - so they see the
>>>> scope of problems the re-design will need to address (and are also aware
>>>> that there may be areas which they *cannot* easily address, and can
>>>> determine how to deal with those areas).
>>>>
>>>> Sorry again to raise this point somewhat belatedly. Hope this will trigger a
>>>> useful discussion.
>>>> Best,
>>>> Detlev
>>>>
>>>>
>>>> --
>>>> Detlev Fischer
>>>> testkreis c/o feld.wald.wiese
>>>> Thedestr. 2, 22767 Hamburg
>>>>
>>>> Mobil +49 (0)1577 170 73 84
>>>> Tel +49 (0)40 439 10 68-3
>>>> Fax +49 (0)40 439 10 68-5
>>>>
>>>> http://www.testkreis.de
>>>> Consulting, testing and training for accessible websites
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
> --
> Oracle <http://www.oracle.com>
> Peter Korn | Accessibility Principal
> Phone: +1 650 5069522
> 500 Oracle Parkway | Redwood City, CA 94064
> Green Oracle <http://www.oracle.com/commitment>
> Oracle is committed to developing practices and products that help protect
> the environment
>

-- 
Shadi Abou-Zahra - http://www.w3.org/People/shadi/
Activity Lead, W3C/WAI International Program Office
Evaluation and Repair Tools Working Group (ERT WG)
Research and Development Working Group (RDWG)

Received on Wednesday, 19 June 2013 15:41:02 UTC