Re: metadata for evaluation methodology? from Detlev Fischer on 2012-01-12 (public-wai-evaltf@w3.org from January 2012)

From: Detlev Fischer <fischer@dias.de>
Date: Thu, 12 Jan 2012 10:25:36 +0100
To: public-wai-evaltf@w3.org
Message-ID: <4F0EA710.2050008@dias.de>
Hi Tim,

I think this is an excellent idea and it helps us structure our effort.

I have quickly gone through the list of Test Metadata you have supplied 
and tried to map it onto the metadata we might want to record. I think it 
is already quite a good fit.

Here are my thoughts:

----
2.1 Identifier
----
An identifier could be developed with a generic first part (WCAG, 
version of the techniques doc, unique ID of the evaluating organization, 
which does not exist yet but is a possibility, maybe more) and an 
evaluator-specified second part including a unique identifier of the 
test assigned by that evaluating organization. Such hybrid identifiers 
have worked quite well in the ATA numbering system in the airline 
industry, but better schemes may exist.
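
To make the two-part idea concrete, here is a minimal sketch in Python. 
Everything in it is an assumption made for illustration: the class name, 
the field names, the "-" and "/" separators and the sample values are 
hypothetical, and no registry of organization IDs exists today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationId:
    """Hypothetical hybrid identifier: generic part + evaluator part."""
    wcag_version: str        # e.g. "2.0"
    techniques_version: str  # label of the techniques doc; must not contain "-"
    organisation_id: str     # hypothetical; no such registry exists yet
    local_id: str            # assigned freely by the evaluating organization

    def __str__(self) -> str:
        generic = (f"WCAG{self.wcag_version}"
                   f"-T{self.techniques_version}"
                   f"-{self.organisation_id}")
        return f"{generic}/{self.local_id}"

    @classmethod
    def parse(cls, text: str) -> "EvaluationId":
        # The first "/" separates the generic part from the local part.
        generic, local_id = text.split("/", 1)
        wcag, techniques, organisation = generic.split("-", 2)
        return cls(wcag[len("WCAG"):], techniques[1:], organisation, local_id)


eid = EvaluationId("2.0", "20120103", "ORG042", "2012-0017")
print(eid)
```

The generic part stays machine-readable across organizations, while the 
part after the "/" is left entirely to the evaluator.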

----
2.2 Title
----
Name of the site, possibly identifying the scope if several scopes are 
tested differently.

----
2.3 Purpose
----
Seems more an aspect of the methodology itself than of any individual 
test/evaluation.

----
2.4 Description
----
This may contain a description of the context of the evaluation, 
especially regarding sections excluded (e.g., “web 2.0” sections with 
content submitted by users where full conformance to SC 1.3.1 may not be 
safeguarded systemically) or legacy content excluded, and plans (if 
applicable) for making such legacy content accessible in the future.

----
2.5 Status
----
This seems to be what I have described in section 2.10 Version further 
below – or it could be the current (latest) state of the series of 
states of one test.

----
2.6 SpecRef
----
Seems to map nicely onto the WCAG conformance level to be set per scope 
tested.

----
2.7 Preconditions
----
A precondition may be a rule to exclude sites or parts of sites from 
evaluation on the grounds of lacking accessibility support for the 
context of use. Example: WAI-ARIA and use of modern UA/AT can be 
safeguarded in an intranet context but such support is debatable if we 
think of a public site.
Another precondition may be a requirement for freezing the development 
of the site until the termination of the test (or testing a copy while 
development continues).

----
2.8 Inputs
----
Not sure what this might contain. (Provisional) URL of site/pages at the 
time of testing? A rationale of site owner for excluding parts of the 
site from scope?

----
2.9 ExpectedResults
----
For a binary version of the test, the expected result is that all SC on 
the chosen level of conformance are met. SC-based “tolerance metrics” 
such as “percentage of incidental errors permitted” would be documented. 
We need to determine whether the baseline for such metrics is some 
average over the sample or each individual page tested (more likely).
A graded version may sit behind the conformance oriented binary result 
and indicate the level of conformance, e.g., in percent. This may simply 
be informal (or informational?) but could prove very useful. I 
personally do not believe a graded result can be based on a mere 
quantitative summing up of fail and pass instances per SC and page. At 
least it would need to recognize critical fails (e.g. image-based 
control without alt) overruling the quantitative count.
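
As a rough illustration of how a binary result with a tolerance and a 
graded result with critical fails could sit side by side, here is a 
sketch in Python. All names are hypothetical, the per-page baseline is 
only one of the two options mentioned above, and letting a critical fail 
force the grade to 0 is just one possible way of "overruling" the count, 
not an agreed rule.

```python
from dataclasses import dataclass


@dataclass
class ScResult:
    sc: str                 # success criterion, e.g. "1.1.1"
    passes: int             # checked instances that pass
    fails: int              # checked instances that fail
    critical: bool = False  # evaluator judged at least one fail as critical


def page_conforms(results, tolerance=0.0):
    """Binary result for one page: every SC must stay within the tolerance."""
    for r in results:
        if r.critical:
            return False  # a critical fail is never tolerated
        total = r.passes + r.fails
        if total and r.fails / total > tolerance:
            return False
    return True


def page_grade(results):
    """Graded result in percent; a critical fail overrules the count."""
    if any(r.critical for r in results):
        return 0.0  # assumption: hard override, other schemes are possible
    total = sum(r.passes + r.fails for r in results)
    if total == 0:
        return 100.0
    return 100.0 * sum(r.passes for r in results) / total


page = [ScResult("1.1.1", passes=19, fails=1), ScResult("1.4.3", passes=10, fails=0)]
print(page_conforms(page), page_conforms(page, tolerance=0.1), page_grade(page))
```

With tolerance 0 the page fails the binary test because of a single 
incidental error; with a documented tolerance of 10 % per SC it passes, 
while the grade still shows that it is not perfect.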

----
2.10 Version
----
This seems very useful. If the test is carried out in a DB-based 
application, it is quite easy to save the different states. Just to give 
an example, in the case of our test (BITV-Test) we now record

* initial evaluator-selected page sample
* page sample signed off after quality check by the selected
   quality assurer who may ask evaluator for changes or additions
   before signing it off
* individual test instances after completion by evaluator
   (one or two)
* consensual test result after the arbitration phase involving
   both evaluators (only in tandem tests)
* final result after check by quality assurer and any changes
   requested before signing off the final test

A re-test would not qualify as a version; it would be a new test. 2.13 
Grouping might link different tests of the same site carried out at 
different points in time.
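
The sequence of saved states listed above could be modelled roughly like 
this in Python. This is only a sketch: the state names and the 
next_state helper are hypothetical and do not describe how the 
BITV-Test application is actually implemented.

```python
from enum import IntEnum


class EvalState(IntEnum):
    SAMPLE_SELECTED = 1        # initial evaluator-selected page sample
    SAMPLE_SIGNED_OFF = 2      # sample approved by the quality assurer
    INDIVIDUAL_TESTS_DONE = 3  # one or two evaluators have finished
    CONSENSUS_REACHED = 4      # after arbitration, tandem tests only
    FINAL_SIGNED_OFF = 5       # final result approved by the quality assurer


def next_state(current: EvalState, tandem: bool) -> EvalState:
    """Return the state that follows `current`; single tests skip arbitration."""
    if current is EvalState.FINAL_SIGNED_OFF:
        raise ValueError("the final state has no successor; a re-test is a new test")
    following = EvalState(current + 1)
    if following is EvalState.CONSENSUS_REACHED and not tandem:
        following = EvalState.FINAL_SIGNED_OFF
    return following


state = EvalState.SAMPLE_SELECTED
while state is not EvalState.FINAL_SIGNED_OFF:
    state = next_state(state, tandem=False)
    print(state.name)
```

Each state would be stored as one version of the same test, and a 
re-test starts again at the first state under a new identifier.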

----
2.11 Contributor
----
For tests, this might have the components “site owner” (who is not 
necessarily the commissioner/customer of the test), “agency/agencies” 
(it is common to have several agencies contribute to a site design), and 
“evaluator/organisation” (cooperating evaluators may belong to more than 
one organisation).
If it applies to the evaluation results themselves, separating out 
contributors can get tricky when several evaluators and a quality 
assurer work on a test and possibly modify results (ratings and 
comments). But this may not be a real problem as long as there is 
consensual agreement on changes before they are finalised.

----
2.12 Rights
----
I doubt we will see copyright issues with evaluator reports, but if this 
is used I guess the evaluator would hold the rights, and several 
evaluators would hold common rights (if such a construct exists)?

----
2.13 Grouping
----
Could be useful for differentiating segments of a site if they conform 
to different levels of WCAG. Another possible use is to group pages in 
the sample that apply the full set of SC and (if this is the approach 
taken) those that apply a partial set (e.g. only SC relevant for data 
tables). Which SC would be applicable and which others can be safely 
ignored needs to be worked out. A subset for tables would, apart from 
1.3.1, certainly need to include contrast, reading order when 
linearized, scalability etc., but certainly not error identification, 
consistent navigation, multiple ways, bypass blocks, etc.

----
2.14 SeeAlso
----
Might be used for references to earlier tests or pre-tests of the same site.
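
Pulling the mapping above together, an evaluation record might look 
roughly like the following Python sketch. The field names simply follow 
the Test Metadata headings; the types, defaults and sample values are 
assumptions, and Purpose is left out because it seems to belong to the 
methodology rather than to an individual evaluation.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EvaluationMetadata:
    identifier: str                                    # 2.1
    title: str                                         # 2.2 site name and scope
    description: str = ""                              # 2.4 context, exclusions
    status: str = ""                                   # 2.5 latest state
    spec_ref: str = ""                                 # 2.6 level, e.g. "AA"
    preconditions: list = field(default_factory=list)  # 2.7
    inputs: list = field(default_factory=list)         # 2.8 e.g. URLs tested
    expected_results: str = ""                         # 2.9 incl. tolerances
    version: str = ""                                  # 2.10 saved state
    contributors: dict = field(default_factory=dict)   # 2.11 role -> names
    rights: str = ""                                   # 2.12
    grouping: list = field(default_factory=list)       # 2.13
    see_also: list = field(default_factory=list)       # 2.14 earlier tests


# Hypothetical sample values, for illustration only.
record = EvaluationMetadata(
    identifier="example-0001",
    title="Example site, public area",
    spec_ref="AA",
    contributors={"evaluator/organisation": ["Example Org"]},
)
print(json.dumps(asdict(record), indent=2))
```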


On 11.01.2012 19:58, Boland Jr, Frederick E. wrote:
> For example, date of evaluation, who is evaluating, what they’re
> evaluating (scope of evaluation, number of pages, etc.), sampling
> method used, scoring method/error analysis used, version of WCAG2.0 used
> in evaluation, additional resources/data pertinent to evaluation, etc.
> – that kind of thing – a little bit like:
>
> http://www.w3.org/TR/test-metadata/
>
> for tests, but in our case for evaluations. Some of the stuff in the
> methodology document, but in metadata form – coming to agreement on what
> should be in a good evaluation report besides the obvious (conformance
> to WCAG2.0) –
>
> sorry I wasn’t clear..
>
> Best, Tim
>
> *From:*Michael S Elledge [mailto:elledge@msu.edu]
> *Sent:* Wednesday, January 11, 2012 1:03 PM
> *To:* Boland Jr, Frederick E.
> *Cc:* Eval TF
> *Subject:* Re: metadata for evaluation methodology?
>
> Hi Tim--
>
> Can you expand a bit on this? I don't think I understand. It seems to me
> that the required components of an evaluation would be the WCAG 2.0
> success criteria, but I'll bet that isn't what you meant.
>
> Mike
>
> On 1/11/2012 8:38 AM, Boland Jr, Frederick E. wrote:
>
> I’m worried that we don’t seem to be compiling anywhere a list of
> (required/optional) (terms/components) for an evaluation. We’re having a
> lot of good discussion and document development, but I couldn’t find
> captured from that discussion anywhere some preliminary metadata for
> evaluations, like was done for Test Metadata for old W3C Quality
> Assurance Working Group and for old test samples development task force.
> Maybe it’s too early yet, but at least we should be brainstorming about
> such a list, at least for the template appendix?
>
> Just a thought..
>
> Thanks and best wishes
>
> Tim Boland NIST
>
> PS – if such a list exists, then my apologies..
>
Received on Thursday, 12 January 2012 09:26:16 UTC