Re: Requirements draft from Detlev Fischer on 2011-09-12 (public-wai-evaltf@w3.org from September 2011)

From: Detlev Fischer <fischer@dias.de>
Date: Mon, 12 Sep 2011 10:20:36 +0200
To: public-wai-evaltf@w3.org
Message-ID: <4E6DC0D4.8060408@dias.de>
On 12.09.2011 01:11, RichardWarren wrote:
> Hi,
> Following on from discussing Eric’s target audience perhaps we should
> start on his suggested Requirements. I attach my comments below for
> starters.
>
> * Requirements:
> R01: Technical conformance to existing Web Accessibility Initiative
> (WAI) Recommendations and Techniques documents.
> Comment (RW) : I do not think we need the word technical. We should
> stick with WCAG as agreed when we discussed *A01. The recommendations
> and techniques are not relevant here as our priority is the Guidelines.
> It is possible for someone to comply with a particular guideline without
> using any of the recommended techniques. What we are after is
> methodology. I therefore suggest a suitable alternative as follows:
>
> *R01 Define methods for evaluating compliance with the accessibility
> guidelines (WCAG)

DF: Richard's take on R01 is very general, but maybe it needs to be at
this stage. I wonder if further reference points will be needed beyond
WCAG. Shall we spell out WCAG in full?
>
>
> R02: Tool and browser independent
> Comment (RW) : The principle is good but sometimes it may be necessary
> to use a particular tool such as a text-only browser. So I would prefer :
>
> *R02: Where possible the evaluation process should be tool and browser
> independent.

DF: The practical value of the methodology might be that it provides 
step-by-step descriptions of *what to do* in testing *with what tools*, 
referring to free and commonly available browsers, toolbars, add-ons like 
Firebug and software like Color Contrast Analyzer or aChecker.

If the method is very generic, it may end up being little more than the 
text of WCAG Success Criteria in another wrapping. Well, I realise that 
referring to particular tools may not be workable given the likely wish 
of many to continue to use the tools and routines they have used over a 
long period. In any case, the less well defined the actual process of 
testing, the slimmer the chance that any two evaluators will arrive at 
the same result for any particular page and SC.

A mapping of different accepted processes to test for a particular SC 
may be an option but I guess that would be time-consuming and be out of 
date quickly. Could be interesting for people trying to build their own 
practical approach on top of a generic method, however.
>
>
> R03: Unique interpretation
> Comment (RW) : I think this means that it should be unambiguous, that
> means it is not open to different interpretations. I am pretty sure that
> the W3C has a standard clause it uses to cover this point when building
> standards etc. Hopefully Shadi can find it <Grin> . This also implies
> use of standard terminology which we should be looking at as soon as
> possible so that terms like “atomic testing” do not creep into our
> procedures without clear /agreed definitions.

DF: I have spent some time arguing that the testing of many SC is not a 
black & white thing (1.3.1 headings, 1.1.1 alt text, etc), especially if 
we aggregate results for all "atomic" (sorry) instances on a page level 
and use the page as unit to be evaluated. I have not seen much reaction 
to that by others so far.
I would drop R03 as unrealistic.
>
>
> R04: Replicability: different Web accessibility evaluators who perform
> the same tests on the same site should get the same results within a
> given tolerance.
> Comment (RW) : The first part is good, but I am not happy with
> introducing “tolerance” at this stage. I think we should be clear that
> we are after consistent, replicable tests. I think we should add
> separate requirement later for such things as “partial compliance” and
> “tolerance”. See R14 below.
>
> *R04: Replicability: different Web accessibility evaluators who perform
> the same tests on the same site should get the same results.

DF: I think this will never happen UNLESS people use the same 
closely defined step-by-step process AND have a common / shared 
understanding as to what constitutes a failure or success across a range 
of different implementations. Even then, exact replicability will be the 
exception.
If the method we aim for is to be generic, and there is no element of 
arbitration between testers and no validation by a (virtual) community, 
there is no chance of replicability, in my opinion.
I would drop R04 as unrealistic.
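
To make the replicability question concrete, here is a minimal sketch of 
how agreement between two evaluators on one page could be measured. All 
of it is an assumption for illustration: the function name, the simple 
pass/fail verdict per SC, the sample verdicts and the tolerance value 
are hypothetical and not part of any agreed methodology.

```python
def agreement_rate(results_a, results_b):
    """Share of commonly tested Success Criteria on which two evaluators agree."""
    shared = set(results_a) & set(results_b)
    if not shared:
        raise ValueError("no Success Criteria tested by both evaluators")
    matches = sum(1 for sc in shared if results_a[sc] == results_b[sc])
    return matches / len(shared)


# Hypothetical verdicts for the same page (True = pass, False = fail).
evaluator_1 = {"1.1.1": True, "1.3.1": False, "1.3.2": True, "4.1.2": True}
evaluator_2 = {"1.1.1": True, "1.3.1": True, "1.3.2": True, "4.1.2": True}

rate = agreement_rate(evaluator_1, evaluator_2)
tolerance = 0.8  # hypothetical threshold, not taken from the draft
print(f"agreement: {rate:.2f}, within tolerance: {rate >= tolerance}")
```

A single disagreement on 1.3.1 already drops the rate below the assumed 
threshold, which shows how strongly any such figure depends on a shared 
understanding of what counts as a failure.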

>
> R05: Translatable
> Comment (RW) : As in translatable into different languages – Yes - agree
>
>
> R06: The methodology points to the existing tests in the techniques
> documents and does not reproduce them.
> Comment (RW) : yes – but I would like it a bit clearer that it is WCAG
> techniques. I would also like the option to introduce a new technique if
> it becomes available. So I suggest
>
> *R06 Where possible the methodology should point to existing tests and
> techniques in the WCAG documentation.

DF: Referring to the tests in WCAG techniques may have little practical 
value as there will be a lot of redundancy across test descriptions that 
would be unnecessary in a testing procedure aggregated on the level of 
SC (or aspect of SC, e.g. in SC 1.3.1). Tests in WCAG Failures may be a 
better starting point since Failures have no disclaimer that failing the 
test in a technique may mean nothing since another technique might have 
been used.

Not sure how R06 should be restated...
>
>
> R07: Support for both manual and automated evaluation.
> Comment (RW) : Not all Guidelines can be tested automatically and it is
> not viable to test some others manually. This needs to be clearer that
> the most appropriate methods will be used, whether manual or automatic.
> Where both options are available they must deliver the same result.
>
> *R07: Use the most appropriate manual or automatic evaluation. Where
> either could be used then both must deliver the same result.

DF: The issue is that automatic checks are most useful in conjunction 
with and prior to a manual validation. If, for example, code validation 
fails because authors used WAI-ARIA roles in HTML 4, we may want to 
apply an exception even to the best case for automatic testing.
But I am sure many will not agree with my re-write ;-)  :

R07 Automatic tools should be used where applicable but their results 
should be validated by human testers.
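
The re-written R07 could be sketched as follows: each automatic finding 
stays open until a human tester has attached a verdict. This is only an 
illustration under assumptions; the field names, the identifiers and the 
verdict labels are hypothetical and do not come from any existing tool.

```python
def validate(findings, verdicts):
    """Attach a human verdict to each automatic finding; unreviewed ones stay open."""
    reviewed = []
    for finding in findings:
        verdict = verdicts.get(finding["id"], "needs review")
        reviewed.append({**finding, "final": verdict})
    return reviewed


# Hypothetical output of an automatic checker.
findings = [
    {"id": "f1", "check": "code validation", "auto": "fail",
     "detail": "WAI-ARIA role attribute used in HTML 4"},
    {"id": "f2", "check": "image without alt attribute", "auto": "fail",
     "detail": "img in page header"},
]

# Hypothetical human decisions: the validation error is accepted as an exception.
verdicts = {"f1": "accepted exception"}

for item in validate(findings, verdicts):
    print(item["id"], item["auto"], "->", item["final"])
```

The automatic result is kept alongside the human verdict, so the report 
still shows where the two differ.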
>
>
> R08: Users include (see target audience)
> Comment (RW) : Whilst user testing is essential for confirming
> accessibility it is not needed/essential for checking compliance with
> WCAG. If we feel that user testing is needed then we must specify what
> users, what skill level, what tasks etc..so that evaluators all use the
> same type of user and get the same type of result. I would prefer not to
> include users here as a requirement.

DF: In general I agree with Richard. Some complex dynamic interactions 
get very difficult to test without screen readers, however. So we might 
recommend (rather than mandate) testing with AT (whether by users or 
expert evaluators familiar with AT) for those aspects that can be 
difficult to test with tools like aChecker or Firebug, especially 1.3.2 
Meaningful Sequence and 4.1.2 Name, Role, Value.
>
>
> R09: Support for different contexts (i.e. self-assessment, third-party
> evaluation of small or larger websites).
> Comment (RW) : Agreed.

DF: Fine
>
>
> R10: Includes recommendations for sampling web pages and for expressing
> the scope of a conformance claim
> Comment (RW) : I agree. This is probably going to be the most difficult
> issue, but it is essential if our methodology is going to be useable in
> the real world as illustrated by discussions already taking place.
> Should it include tolerance metrics (R14)?

DF: The real bummer is how to define states on top of pages, and where 
to draw the line. For documentation and reporting and independent 
validation purposes, we want URLs to point back to the page tested. For 
states, we have to explain in addition how to get to them...
I think we would need to agree on an aggregation method of instance 
tests on the page level (or process level) that can reflect *criticality 
of the instance*. Or we may decide to pass the buck to individual 
implementations of the methodology - but it is clear then that any hint 
of replicability goes out of the window. Maybe:

R10: Include recommendations for sampling web pages *and states of 
pages*, and for expressing the scope of a conformance claim
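
One possible shape for an aggregation that reflects criticality is a 
weighted pass share per page or page state. The sketch below is an 
assumption for illustration only: the 1 to 3 criticality scale, the 
class and function names and the sample instances are hypothetical, and 
no such scheme has been agreed.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    sc: str           # Success Criterion number, e.g. "1.1.1"
    passed: bool
    criticality: int  # hypothetical scale: 1 = minor, 3 = blocks a task


def page_score(instances):
    """Criticality-weighted share of passed instances for one page or state."""
    total = sum(i.criticality for i in instances)
    if total == 0:
        raise ValueError("no weighted instances to aggregate")
    passed = sum(i.criticality for i in instances if i.passed)
    return passed / total


# Hypothetical instances found in one state of a contact form page.
contact_form_state = [
    InstanceResult("1.1.1", passed=False, criticality=1),  # minor image lacks alt text
    InstanceResult("1.1.1", passed=True, criticality=3),   # image submit button has alt text
    InstanceResult("1.3.1", passed=True, criticality=2),   # headings are marked up
]

score = page_score(contact_form_state)
print(f"weighted pass share: {score:.2f}")
```

With these weights, a failure on a minor instance lowers the score far 
less than a failure on the submit button would.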
>
>
> R11: Describes critical path analyses,
> Comment (RW) : I assume this is the CPA of the evaluation process (ie
> define website, test this, test that, write report etc.). In which case
> agreed

DF: If memory serves we agreed that Critical Path could be a term for a 
complete process (e.g., log in, send a mail via the contact form, 
conduct a search, buy a product)? I still prefer process or interaction 
sequence to avoid confusion with the process management term.
If the requirement in R11 is only to describe CP, it raises the question of 
how we are going to test the CP. Is a CP a series of pages that is part 
of the page sample evaluated? Or are CPs processes that are defined as 
additional entities besides (or instead of) the page sample?

So R11 as it stands does not seem meaningful to me.
>
>
> R12: Covers computer assisted content selection and manual content
> selection
> Comment (RW) : I do not know what this means – can Eric explain ?

DF: I have the same question.

>
>
> R13: Includes integration and aggregation of the evaluation results and
> related conformance statements.
> Comment (RW) : I think this means “write a nice report” in which case I
> agree.

DF: I think it means a lot more - see my comment above to R10. The way 
it is stated now, R10 skirts the issue of whether we aggregate instance 
test results on a page level or, maybe, on the level of processes. But 
perhaps it is better to keep it general for now.

>
>
> R14: Includes tolerance metrics.
> Comment (RW) : Agreed – but maybe combine with R10

DF: Yes I think this is necessary, but needs better definition at some 
point. I'm not sure how the assessment of criticality of an 
accessibility issue captured in some SC test will surface in tolerance 
metrics, but I guess it can be done.
>
>
> R15: The Methodology includes recommendations for harmonized
> (machine-readable) reporting.
> Comment (RW) : I am not sure that methodologies recommend things. Do you
> mean
>
> *R15: Reports must be machine readable.

DF: If this means that the reports generated should be well-formed HTML 
documents, this is fine. If 'harmonized' means filtering out comments 
since they are not just TRUE or FALSE, I disagree (this can, of course, 
be useful on the level of benchmarking, but it would cripple the meaning 
of a test report for human consumption).
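
A report can be machine-readable and still carry comments. The sketch 
below uses JSON purely as an illustration; the field names, the URL and 
the sample comments are hypothetical and do not follow any agreed 
reporting format.

```python
import json

# Hypothetical report for one page state; comments are kept next to each outcome.
report = {
    "page": "http://example.org/contact",
    "state": "after submitting the empty form",
    "results": [
        {"sc": "1.1.1", "outcome": "pass",
         "comment": "Logo has alt text, but it repeats the adjacent heading."},
        {"sc": "1.3.1", "outcome": "fail",
         "comment": "Form labels are visually present but not associated with fields."},
    ],
}

serialized = json.dumps(report, indent=2)  # the machine-readable form
restored = json.loads(serialized)          # nothing is lost on the way back

# A benchmarking tool may reduce the report to bare outcomes when it needs to.
outcomes_only = [(r["sc"], r["outcome"]) for r in restored["results"]]
print(outcomes_only)
```

Filtering for benchmarking then happens when the report is read, while 
the stored report keeps the comments for human readers.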
>
>
> Best wishes
> Richard (RW)
>
> -----Original Message----- From: Velleman, Eric
> Sent: Wednesday, August 31, 2011 12:56 PM
> To: public-wai-evaltf@w3.org
> Subject: Appendix to the agenda: Requirements draft
>
> Dear Eval TF,
>
> In our call, we will discuss further on the questions that are on the
> list. Please also react online. As a result of our last call, below you
> find a first draft of the possible requirements for the methodology. We
> will discuss this further tomorrow in our call:
>
> First Draft Section on Requirements
>
> * Objectives:
> The main objective is an internationally harmonized methodology for
> evaluating the conformance of websites to WCAG 2.0. This methodology
> will support different contexts, such as for self-assessment or
> third-party evaluation of small or larger websites.
> It intends to cover recommendations for sampling web pages and for
> expressing the scope of a conformance claim, critical path analyses,
> computer assisted content selection, manual content selection, the
> evaluation of web pages, integration and aggregation of the evaluation
> results and conformance statements. The methodology will also address
> tolerance metrics.
> The Methodology also includes recommendations for harmonized
> (machine-readable) reporting.
>
> This work is part of other related W3C/WAI activities around evaluation
> and testing.
> More on the EvalTF page.
>
> * Target Audience:
> A01: All organizations evaluating one or more websites
> A02: Web accessibility benchmarking organizations
> A03: Web content producers wishing to evaluate their content
> A04: Developers of Evaluation and Repair Tools
> A05: Policy makers and Web site owners wishing to evaluate websites
>
> The person(s) using the Methodology should be knowledgeable of the
> Guidelines and people with disabilities.
>
> * Requirements:
> R01: Technical conformance to existing Web Accessibility Initiative
> (WAI) Recommendations and Techniques documents.
> R02: Tool and browser independent
> R03: Unique interpretation
> R04: Replicability: different Web accessibility evaluators who perform
> the same tests on the same site should get the same results within a
> given tolerance.
> R05: Translatable
> R06: The methodology points to the existing tests in the techniques
> documents and does not reproduce them.
> R07: Support for both manual and automated evaluation.
> R08: Users include (see target audience)
> R09: Support for different contexts (i.e. self-assessment, third-party
> evaluation of small or larger websites).
> R10: Includes recommendations for sampling web pages and for expressing
> the scope of a conformance claim
> R11: Describes critical path analyses,
> R12: Covers computer assisted content selection and manual content
> selection
> R13: Includes integration and aggregation of the evaluation results and
> related conformance statements.
> R14: Includes tolerance metrics.
> R15: The Methodology includes recommendations for harmonized
> (machine-readable) reporting.
>
> The methodology describes the expected level of expertise for persons
> carrying out the evaluation and the possibility to conduct evaluations
> in teams using roles. There is also a description of the necessity to
> involve people with disabilities.
>
>
>


-- 
---------------------------------------------------------------
Detlev Fischer PhD
DIAS GmbH - Daten, Informationssysteme und Analysen im Sozialen
Management: Thomas Lilienthal, Michael Zapp

Phone: +49-40-43 18 75-25
Mobile: +49-157 7-170 73 84
Fax: +49-40-43 18 75-19
E-Mail: fischer@dias.de

Address: Schulterblatt 36, D-20357 Hamburg
Hamburg District Court (Amtsgericht Hamburg), HRB 58 167
Managing directors: Thomas Lilienthal, Michael Zapp
---------------------------------------------------------------
Received on Monday, 12 September 2011 08:21:13 UTC