AW: Alternative concise requirements from Kerstin Probiesch on 2011-10-06 (public-wai-evaltf@w3.org from October 2011)

From: Kerstin Probiesch <k.probiesch@googlemail.com>
Date: Thu, 6 Oct 2011 08:42:35 +0200
To: "'Eval TF'" <public-wai-evaltf@w3.org>
Message-ID: <4e8d4dbc.8758df0a.7c98.ffff9901@mx.google.com>

Hello Eval TF,

I'm very concerned that we will come up with a methodology that does not
follow internationally agreed quality criteria for testing procedures. I
don't want to repeat what I've written in other mails. Just this: if we
abandon reliability, objectivity and validity, in my opinion nothing is
gained, and in the worst case we will just have "some tips for testing".

As I see it: objectivity is not included and validity is gone.

Anyway: I agree that we should reduce the number of requirements. Please
find my comments after the requirements.

> -----Original Message-----
> From: public-wai-evaltf-request@w3.org [mailto:public-wai-evaltf-
> request@w3.org] On behalf of Denis Boudreau
> Sent: Thursday, 6 October 2011 06:31
> To: Eval TF
> Subject: Re: Alternative concise requirements
> 
> Hello EvalTF,
> 
> I'm all in favor of reducing the number of requirements, if we can
> include everything in just 10.
> 
> My comments follow.
> 
> 
> On 2011-10-04, at 1:05 AM, Shadi Abou-Zahra wrote:
> 
> 
> >> RQ 01 : Define methods for evaluating WCAG 2.0 conformance
> >> The Methodology provides methods to measure conformance with WCAG
> 2.0. that can be used by the target audience (see section 2 above) for
> evaluating small or large websites, sections of websites or web-based
> applications.
> >
> > Minor: "for evaluating small or large websites, sections of websites
> and web-based applications" (changed "or" to "and").
> 
> Minor: remove the dot (.) after the 2.0

I agree.


> 
> 
> >> RQ 02 – Unambiguous Interpretation
> >> The methodology is written in clear language, understandable to the
> target audience and capable of translation to other languages.
> >
> > I think the title "Unambiguous Interpretation" does not match the
> description. Maybe something like "Clear, understandable, and
> translatable language" instead?
> 
> +1

I agree with clear, understandable and translatable.


> 
> 
> >> RQ 03 – Reliable
> >> Different Web accessibility evaluators using the same methods on the
> same website(s) should get the same results. Evaluation process and
> results are documented to support independent verification.
> >
> > Maybe "equivalent results" rather than "*same* results"?
> 
> Equivalent results or similar results.

If we abandon the internationally agreed definitions of reliability, this
quality criterion is gone. Moreover, if we drop *same* (which we will
define in the methodology), we could easily violate our RQ 02, because the
text will be less translatable. The term "equivalent" sounds good at first,
but I fear that "equivalent" is a kind of trap.
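
As an illustration only, the gap between "same" and "equivalent" results
can be made concrete with an agreement measure. The sketch below assumes
two evaluators who each give a pass/fail verdict per Success Criterion and
uses Cohen's kappa, a common chance-corrected agreement statistic. The
function name, the sample verdicts and the choice of kappa are assumptions
made for this example; none of them comes from the requirements.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two evaluators' verdicts."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Share of items on which both evaluators gave the same verdict.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each evaluator's label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both evaluators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts for the same five Success Criteria on the same page.
evaluator_1 = ["pass", "fail", "pass", "pass", "fail"]
evaluator_2 = ["pass", "fail", "pass", "fail", "fail"]

identical = evaluator_1 == evaluator_2           # the strict "same results" reading
kappa = cohens_kappa(evaluator_1, evaluator_2)   # a graded "how similar" reading
```

Under the strict reading the two result sets either match or they do not.
Any graded reading needs an agreed threshold on a value like `kappa`, and
that threshold is exactly what "equivalent" leaves undefined.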


> 
> >> RQ 04 - Tool and browser independent
> >> The use and application of the Methodology is vendor-neutral and
> platform-independent. It is not restricted to solely manual or
> automated testing but allows for either or a combination of approaches.
> >
> > I think we need to clarify "vendor-neutral" and "platform-
> independent". I also think that the Methodology as a whole will have to
> rely on a combined manual and automated approach. My suggestion is:
> >
> > [[
> > The use and application of the Methodology is independent of any
> particular evaluation tools, browsers, and assistive technology. It
> requires combined use of manual and automated testing approaches to
> carry out a full evaluation according to the Methodology.
> > ]]
> 
> +1

I agree.

> 
> 
> >> RQ 05 -  QA framework specification guidelines
> >> The Methodology will conform to the Quality Assurance framework
> specification guidelines as set in: http://www.w3.org/TR/qaframe-spec/.
> 
> +1

I agree.

 
> >> RQ 06 - Machine-readable reporting
> >> The Methodology includes recommendations for harmonized (machine-
> readable) reporting. It provides a format for delivering machine-
> readable reports using Evaluation and Report Language (EARL) in
> addition to using the standard template as at
> http://www.w3.org/WAI/eval/template.html
> >
> > I think that the focus on human-readable reporting is more important
> than on machine-readable ones. Here is my suggestion:
> >
> > [[
> > RQ 06 - Reporting
> > The Methodology includes recommendations for reporting evaluation
> findings. It will be based on the
> [href=http://www.w3.org/WAI/eval/template.html standard template] and
> supplemented with machine-readable
> [href=http://www.w3.org/WAI/intro/earl reports using Evaluation and
> Report Language (EARL)].
> > ]]
> 
> Sounds very good.

I agree that human-readable reporting is more important than
machine-readable.

> 
> >> RQ 07 -  Use of existing WCAG 2.0 techniques
> >> Wherever possible the Methodology will employ existing testing
> procedures in the WCAG 2.0 Techniques documents rather than replicate
> them.
> 
> +1

I'm not sure about this one. Perhaps I only have doubts about the title of
the RQ. Anyone following the WCAG 2.0 public comments very often reads the
following response from the Working Group: "Use of any technique is
optional. They are just ways of doing it if you want to use them."

I would suggest: "RQ07 - Use of existing WCAG 2.0 testing procedures"

 
> >> RQ 08 -  Recommendations for scope and sampling
> >> It includes recommendations for methods of sampling web pages in
> large websites and how to ensure that complete processes (such as for a
> shopping site where all the pages that are part of the steps in an
> ordering process) are included.  Such selections would be reflected in
> any conformance claim.
> >
> > Minor: I stumbled over "large" -- is a website with say 50 or 100
> pages considered large? It would still need sampling to evaluate...
> 
> I don't think the sampling of pages need to be different if the site is
> large or small. Obviously, we'd take less pages on a smaller site, but
> in each case, the methodology should encourage to look for significant
> and representative pages.

I also stumbled over "large", because I don't see a chance for an
international agreement about what "large" might be.

As for "sampling web pages", I'm still unconvinced that test results and
reports based on X tested pages will be valid against Conformance
Requirement 1 if the report makes a claim for the whole website. I think
there must be a testing procedure that includes pages *and* atomic tests
for those SCs that have not yet been found to be violated after testing
"some" pages.
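
A minimal sketch of what RQ 08 asks for could look like the following:
every page of a complete process is always included, and the rest of the
sample is drawn at random with a fixed seed so that a second evaluator can
reproduce the selection. The function name, the page paths and the seed
are hypothetical. The sketch does not settle whether such a sample can
support a claim for the whole website.

```python
import random


def select_sample(all_pages, process_pages, extra, seed=0):
    """Return every process page plus `extra` randomly chosen other pages."""
    others = sorted(set(all_pages) - set(process_pages))
    rng = random.Random(seed)  # fixed seed so the selection can be repeated
    picked = rng.sample(others, min(extra, len(others)))
    return sorted(set(process_pages) | set(picked))


# Hypothetical site: twenty content pages plus a three-step ordering process.
site = [f"/page{i}" for i in range(1, 21)] + ["/cart", "/checkout", "/confirm"]
ordering_process = ["/cart", "/checkout", "/confirm"]

sample = select_sample(site, ordering_process, extra=5)
```

Recording the seed and the list of process pages in the report would let
the selection itself be verified independently.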

> >> RQ 09 -  Includes tolerance metrics
> >> It includes calculation methods for determining nearness of
> conformance.  Depending on the amount of tolerance, a failure could
> fall within a certain tolerance level meaning that the page or website
> might be considered conformant even though there is a failure. Such
> tolerances would be reflected in any conformance claim.
> 
> While I agree conceptually, I still have to understand how we could come
> around to doing this without falling into subjectivity.

I agree with RQ 09. I think we will define this later, and we will certainly
have interesting discussions about it.

I think that our requirements are not set in stone (Moses 2.0), are they? I
understand the point about "falling into subjectivity" very well; it is
always my own point when I stress the quality criteria for evaluation
methods. But I fear that in a year we will still be sitting here discussing
the RQs without even having tried to define the tolerance metrics.
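
For illustration, one very simple shape a tolerance metric could take is
the share of failed checks compared against an agreed threshold. This is
only a sketch under assumptions: the function names, the check identifiers
and the 10% threshold are hypothetical, and RQ 09 does not prescribe any
particular calculation.

```python
def nearness_of_conformance(results):
    """Share of checks that passed, between 0.0 and 1.0."""
    if not results:
        raise ValueError("no check results given")
    return sum(1 for passed in results.values() if passed) / len(results)


def within_tolerance(results, tolerance):
    """True if the share of failed checks does not exceed `tolerance`."""
    failures = sum(1 for passed in results.values() if not passed)
    return failures / len(results) <= tolerance


# Hypothetical outcome of ten checks on one page: nine pass, one fails.
checks = {f"check-{i}": True for i in range(1, 10)}
checks["check-10"] = False

score = nearness_of_conformance(checks)
lenient = within_tolerance(checks, tolerance=0.10)  # one failure in ten allowed
strict = within_tolerance(checks, tolerance=0.0)    # no failure allowed
```

The arithmetic itself is objective. The subjective part is choosing the
threshold and deciding whether all failures should weigh the same.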
 
> >> RQ 10 - Support documentation
> >> The document will give a short description of the knowledge
> necessary for using the Methodology for evaluations.
> 
> Short description and provide links to said support documentation?

I agree with the RQ and think that it should be a description rather than
links, unless we want to check the links again and again.

As written above, I am missing objectivity and validity as RQs. Without
them, and with reliability softened as well, we are opening the door to
subjectivity, which we want to avoid.

One other point: I see jokes arising about the "Ten Commandments of the
EvalTF" and the eleventh one: "Don't get caught" (when you are violating
those RQs in your own testing procedures). 12 RQs are much better ;-)

Best

Kerstin
> 
> /Denis
Received on Thursday, 6 October 2011 06:49:04 UTC