Re: Requirements draft from Kerstin Probiesch on 2011-09-13 (public-wai-evaltf@w3.org from September 2011)

From: Kerstin Probiesch <k.probiesch@googlemail.com>
Date: Tue, 13 Sep 2011 08:44:17 +0200
To: "'Denis Boudreau'" <dboudreau@accessibiliteweb.com>, "'Eval TF'" <public-wai-evaltf@w3.org>
Message-ID: <4e6efb10.4521df0a.3e66.134b@mx.google.com>
Hello Denis, all,

> -----Original Message-----
> From: public-wai-evaltf-request@w3.org [mailto:public-wai-evaltf-request@w3.org]
> On Behalf Of Denis Boudreau
> Sent: Monday, 12 September 2011 21:49
> To: Eval TF
> Subject: Re: Requirements draft
> 
> Good morning everyone,
> 
> Here's my take on the whole thing.
> 
> 
> On 2011-09-12, at 4:42 AM, Kerstin Probiesch wrote:
> 
> >> * Requirements:
> >>> R01: Technical conformance to existing Web Accessibility Initiative
> (WAI) Recommendations and Techniques documents.
> >
> >> Comment (RW) :  I do not think we need the word technical. We should
> stick with WCAG as agreed when we discussed *A01.  The recommendations
> and techniques are not relevant here as our priority is the Guidelines.
> It is possible for someone to comply with a particular guideline
> without using any of the recommended techniques. What we are after is
> methodology.  I therefore suggest a suitable alternative as follows:
> >> *R01 Define methods for evaluating compliance with the accessibility
> guidelines (WCAG)
> >
> > Comment (KP): As I understood R01 it stresses the formal level. If
> the formulation would be "R01: Technical conformance to existing Web
> Accessibility Initiative (WAI) Recommendations and Techniques" I would
> agree. Because we have in the WCAG sub-documents like "understanding",
> "glossary" and so on. For that "documents" for me is ok. Because of
> other WAI documents like e.g. ATAG I would agree with
> > As long as the formal level of the documents itself and not the
> techniques which are in the documents is meant.
> 
> Comment (DB): I believe we need to stay on a macro level, as we're
> talking general methodology here. We'll have plenty of time to delve
> right in eventually. Right now, our main focus should be compliance
> with WCAG as a whole, not with each and every technique that may or
> may not exist at the time of this writing. Just to build on
> Richard's proposal, I would therefore suggest:
> 
> *R01 Defining methods for evaluating WCAG 2.0 compliance
> 
> WCAG will most likely already have been defined in this document, so there's
> no need to repeat it each and every time.
> 

Comment (KP): Would someone (Shadi, Eric) please give a short statement
on whether we are talking about form or content?

> 
> >>> R02: Tool and browser independent
> >
> >> Comment (RW) : The principle is good but sometimes it may be
> necessary to use a particular tool such as a text-only browser. So I
> would prefer :
> >> *R02: Where possible the evaluation process should be tool and
> browser independent.
> >
> > Comment (KP): I partly agree with "possible". When we use "possible"
> we should then describe/define what "possible" exactly means.
> 
> Comment (DB): Right. That works for me too. But I'd rather keep them
> short. So I'd vote for:
> 
> *R02: Tool and browser independent (where possible)
> 

Comment (KP): I'm OK with that, provided we find a definition of "possible".
 
> >>> R03: Unique interpretation
> >
> >> Comment (RW) : I think this means that it should be unambiguous,
> that means it  is not open to different interpretations. I am pretty
> sure that the W3C has a standard clause it uses to cover this point
> when building standards etc. Hopefully Shadi can find it <Grin> . This
> also implies use of standard terminology which we should be looking at
> as soon as possible so that terms like “atomic testing” do not creep
> into our procedures without clear /agreed definitions.
> >
> > Comment (KP): Using standard terminology is an important point also
> for me. And I suggest that we should also regard the standard
> terminology used in testing theory. The advantage would be that we are
> using established terms which will help to avoid misunderstandings.
> 
> Comment (DB): Using standard terminology is of utmost importance to me
> as well. However, I personally do not believe in a single
> interpretation for any success criterion. And I certainly do not believe
> in the possibility of everyone sharing the same interpretation. What we
> should focus on is achieved results (compliance), not how people
> actually got there (technique used). There are multiple ways to
> interpret the guidelines and our methodology should reflect this.
> Instead of striving for "unique interpretation" I would much rather go
> for "agreed interpretation", even if this means actually building a
> document where we would document what those divergent interpretations
> mean. So, I suggest going with:
> 
> *R03: Agreed interpretations

Comment (KP): I understand Denis' arguments. The more I think about
this, the less either "unique interpretation" or "agreed interpretation"
seems to work. I would like to suggest "Objective", for the following reason:
objectivity is one of the criteria for the quality of tests and includes
execution objectivity, analysis objectivity and interpretation objectivity.
If we reach 100% in some cases, fine; if not, we can discuss the "tolerance".
I would suggest:

*R03: Objectivity


> >>> R04: Replicability: different Web accessibility evaluators who
> perform the same tests on the same site should get the same results
> within a given tolerance.
> >
> >> Comment (RW) : The first part is good, but I am not happy with
> introducing “tolerance” at this stage. I think we should be clear that
> we are after consistent, replicable tests. I think we should add
> separate requirement later for such things as “partial compliance” and
> “tolerance”. See R14 below.
> >>
> >> *R04: Replicability: different Web accessibility evaluators who
> perform the same tests on the same site should get the same results.
> >
> > Comment (KP): I strongly agree with Richard, except for the term
> "Replicability", and would suggest:
> >
> > R04: Reliability: different Web accessibility evaluators who perform
> the same tests on the same site should get the same results.
> 
> Comment (DB): As long as we take into consideration that there can be
> different ways/tools to run those tests, then yes, reliability and
> replicability are important. Getting to different results usually means
> evaluators do not interpret the rules the same way. This does not always
> mean that one is wrong and the other is right. So again, to keep those
> short, I would simply go with

> *R04: Reliable and replicable

Comment (KP): I'm happy with that, as long as it will not include a decision
for or against any specific evaluation methodology which one of the
participants in this TF uses.
 
> The explanation that follows could then reflect the idea that different
> evaluators performing the same tests on the same site should get the
> same results.
> 
> 
> 
> >>> R05: Translatable
> >
> >> Comment (RW) : As in translatable into different languages – Yes -
> agree
> >
> > Comment (KP): I agree and I see especially translatable in the
> context of using standard terminology which would be helpful for
> translating.
> 
> Comment (DB): +1.
> 
> *R05: Translatable
> 
> 
> 
> >>> R06: The methodology points to the existing tests in the techniques
> documents and does not reproduce them.
> >
> > Comment (KP): I agree.
> >
> >> Comment (RW) : yes – but I would like it a bit clearer that it is
> WCAG techniques.  I would also like the option to introduce a new
> technique if it becomes available. So I suggest
> >> *R06 Where possible the methodology should point to existing tests
> and techniques in the WCAG documentation.
> 
> Comments (DB): I agree with the general idea here as well, but it needs
> to be shorter. We can always reflect the intention in the description
> that follows.
> 
> *R06 Pointing to existing tests and techniques (where possible).
> 
> 
> 
> >>> R07: Support for both manual and automated evaluation.
> >
> >> Comment (RW) :  Not all Guidelines can be tested automatically and
> it is not viable to test some others manually. This needs to be clearer
> that the most appropriate methods will be used, whether manual or
> automatic. Where both options are available they must deliver the same
> result.
> >>
> >> *R07:  Use the most appropriate manual or automatic evaluation.
> Where either could be used then both must deliver the same result.
> >
> > Comment (KP): I see "support" as just support and the important point
> "deliver the same result" in the context of R04 "Replicability" or as I
> suggest "Reliability".
> 
> Comments (DB): I agree with the general idea here as well, but again,
> it needs to be shorter. We can always reflect the importance of using
> the most appropriate approach in the document itself.
> 
> *R07: Reliable evaluation support (manual or automated).

Comment (KP): I'm still not sure about what "support" means.

 
> >>> R08: Users include (see target audience)
> >
> >> Comment (RW) : Whilst user testing is essential  for confirming
> accessibility it is not needed/essential for checking compliance with
> WCAG. If we feel that user testing is needed then we must specify what
> users, what skill level, what tasks etc., so that evaluators all use the
> same type of user and get the same type of result. I would prefer not
> to include users here as a requirement.
> >
> > Comment (KP): A tricky R. - especially in the context of the above
> mentioned "It is possible for someone to comply with a particular
> guideline without using any of the recommended techniques." The
> question would be: How can a tester find out if an SC is met when the
> recommended techniques are not used? Wouldn't that mean that a tester
> needs deep knowledge in using for example Screenreaders as well as
> Magnifiers and ... We discussed this also in another mail thread. I
> prefer to include users here, but we have to describe which users, in
> line with Richard's considerations in the paragraph above.
> 
> Comments (DB): Testing with "real users" should be encouraged, but in no
> way should it be made mandatory. The only requirement should be to run
> tests with a skilled screen reader user, following a specific
> evaluation methodology. All the better if this evaluator happens to be
> a real user. So:
> 
> *R08: Users include (see target audience)
> 
> 
> 
> >>> R09: Support for different contexts (i.e. self-assessment, third-
> party evaluation of small or larger websites).
> >
> >> Comment (RW) :  Agreed.
> > Comment (KP): Agree
> 
> Comments (DB): +1.
> 
> 
> 
> >>> R10: Includes recommendations for sampling web pages and for
> expressing the scope of a conformance claim
> >
> >> Comment (RW) : I agree. This is probably going to be the most
> difficult issue, but it is essential if our methodology is going to be
> useable in the real world as illustrated by discussions already taking
> place. Should it include tolerance metrics (R14)?
> >
> > Comment (KP): I also think it’s the most difficult issue. Because of
> the ongoing discussion about different approaches I want to abstain for
> the moment.
> 
> Comments (DB): While I seem to be a little more optimistic than you
> two, it is an important issue. I wish that we can draw from everybody's
> experience and come up with something new and improved, compared to our
> respective approaches.
> 
> *R10: Web pages sampling recommendations.

Comment (KP): I still want to abstain before making a decision. But I think
we need the "scope of conformance claim". I find this a very important
issue for the validity of the evaluation methodology.

> We can always reflect the importance of expressing the scope of
> conformance claim in the document itself.
> 
> 
> 
> >>> R11: Describes critical path analyses,
> >> Comment (RW) :  I assume this is the CPA of the evaluation process
> (ie define website, test this, test that, write report etc.). In which
> case agreed
> 
> > Comment (KP): I'm not sure what is meant by this R. Because of that
> no vote from me now.
> 
> Comments (DB): Agreed as well. This is something we never officially
> did ourselves at AccessibilitéWeb, but it does look like a great idea.
> 
> *R11: Describes critical path analyses.
> 
> 
> 
> >>> R12: Covers computer assisted content selection and manual content
> selection
> >
> >> Comment (RW) : I do not know what this means – can Eric explain ?
> > Comment (KP): I also don't have an exact idea of what this R. could
> mean.
> 
> Comments (DB): Isn't this directly related to page sampling
> determination and critical paths analyses? I get the manual content
> selection part, but I can't understand how this could be computer
> generated in any way... right now, I don't see why this couldn't just
> be a part of R11.
> 
> 
> 
> >>> R13: Includes integration and aggregation of the evaluation results
> and related conformance statements.
> >
> >> Comment (RW) : I think this means “write a nice report” in which
> case I agree.
> > Comment (KP): I agree.
> 
> Comments (DB): Lol, reports are crucial indeed and every report should
> be technically biased, with a good executive summary for the
> faint-hearted. But this R is definitely too complicated as is.
> 
> *R13: Evaluation reports and related conformance statements.
> 
> 
> 
> >>> R14: Includes tolerance metrics.
> >
> >> Comment (RW) : Agreed – but maybe combine with R10
> > Comment (KP): The tolerance metrics will depend on the testing
> procedure itself. Because of that, I'm happy with it as is and
> suggest not combining it with any other R.
> 
> Comments (DB): I can see why it could be integrated with R10, but don't
> really mind if it's not. I think the wording is appropriate.
> 
> *R14: Includes tolerance metrics.
> 
> 
> 
> 
> >>> R15: The Methodology includes recommendations for harmonized
> (machine-readable) reporting.
> >
> >> Comment (RW) : I am not sure that methodologies recommend things. Do
> you mean
> >>
> >> *R15: Reports must be machine readable.
> >
> > Comment (KP): As I understood R15 this means e.g. structures in
> documents but also recommendations for the content structure. If so, I
> agree with R15.
> 
> Comments (DB): Shouldn't this be a part of R13 as well?
> 
> *R15: Recommendations for harmonized (machine-readable) reporting

Comment (KP): I agree.

Best

Kerstin


> 
> Best regards,
> 
> /Denis
Received on Tuesday, 13 September 2011 06:41:51 UTC