RE: Requirements draft from Vivienne CONWAY on 2011-09-13 (public-wai-evaltf@w3.org from September 2011)

From: Vivienne CONWAY <v.conway@ecu.edu.au>
Date: Tue, 13 Sep 2011 15:16:16 +0800
To: Kerstin Probiesch <k.probiesch@googlemail.com>, "'Denis Boudreau'" <dboudreau@accessibiliteweb.com>, "'Eval TF'" <public-wai-evaltf@w3.org>
Message-ID: <8AFA77741B11DB47B24131F1E38227A98CAAEDA2CA@XCHG-MS1.ads.ecu.edu.au>
Hi all,

I've added my comments to the dialogue.


>
> Here's my take on the whole thing.
>
>
> On 2011-09-12, at 4:42 AM, Kerstin Probiesch wrote:
>
> >> * Requirements:
> >>> R01: Technical conformance to existing Web Accessibility Initiative
> (WAI) Recommendations and Techniques documents.
> >
> >> Comment (RW) :  I do not think we need the word technical. We should
> stick with WCAG as agreed when we discussed *A01.  The recommendations
> and techniques are not relevant here as our priority is the Guidelines.
> It is possible for someone to comply with a particular guideline
> without using any of the recommended techniques. What we are after is
> methodology.  I therefore suggest a suitable alternative as follows:
> >> *R01 Define methods for evaluating compliance with the accessibility
> guidelines (WCAG)
> >
> > Comment (KP): As I understood R01 it stresses the formal level. If
> the formulation would be "R01: Technical conformance to existing Web
> Accessibility Initiative (WAI) Recommendations and Techniques" I would
> agree, because WCAG has sub-documents such as "understanding",
> "glossary" and so on. For that reason "documents" is OK for me. Because of
> other WAI documents such as ATAG I would agree with
> > As long as the formal level of the documents itself and not the
> techniques which are in the documents is meant.
>
> Comment (DB): I believe we need to stay on a macro level, as we're
> talking general methodology here. We'll have plenty of time to delve
> right in eventually. Right now, our main focus should be compliance
> with the WCAG as a whole, not with each and every technique that may or
> may not exist at the time of this writing. Just to build up on
> Richard's proposal, I would therefore suggest:
>
> *R01 Defining methods for evaluating WCAG 2.0 compliance
>
> WCAG will most likely already have been defined in this document, so there's
> no need to repeat it each and every time.
>Comments (VC): agree

Comment (KP): Could someone (Shadi, Eric) please give a short statement on
whether we are speaking about form or content?

>
> >>> R02: Tool and browser independent
> >
> >> Comment (RW) : The principle is good but sometimes it may be
> necessary to use a particular tool such as a text-only browser. So I
> would prefer :
> >> *R02: Where possible the evaluation process should be tool and
> browser independent.
> >
> > Comment (KP): I partly agree with "possible". When we use "possible"
> we should then describe/define what "possible" exactly means.
>
> Comment (DB): Right. That works for me too. But I'd rather keep them
> short. So I'd vote for:
>Comments (VC): Agree also
> *R02: Tool and browser independent (where possible)

>

Comment (KP): I'm OK with that, provided we find a definition of "possible".

> >>> R03: Unique interpretation
> >
> >> Comment (RW) : I think this means that it should be unambiguous,
> that is, it is not open to different interpretations. I am pretty
> sure that the W3C has a standard clause it uses to cover this point
> when building standards etc. Hopefully Shadi can find it <Grin> . This
> also implies use of standard terminology which we should be looking at
> as soon as possible so that terms like “atomic testing” do not creep
> into our procedures without clear /agreed definitions.
> >
> > Comment (KP): Using standard terminology is an important point also
> for me. And I suggest that we should also regard the standard
> terminology used in testing theory. The advantage would be that we are
> using established terms which will help to avoid misunderstandings.
>
> Comment (DB): Using standard terminology is of utmost importance to me
> as well. However, I personally do not believe in a single
> interpretation for any success criteria. And I certainly do not believe
> in the possibility of everyone sharing the same interpretation. What we
> should focus on is achieved results (compliance), not how people
> actually got there (technique used). There are multiple ways to
> interpret the guidelines and our methodology should reflect this.
> Instead of striving for "unique interpretation" I would much rather go
> for "agreed interpretation", even if this means actually building a
> document where we would document what those divergent interpretations
> mean. So, I suggest going with:
>
> *R03: Agreed interpretations

Comment (KP): I understand Denis' arguments. The more I think about
this, the more I feel that neither "unique interpretation" nor "agreed interpretation"
works very well. I would like to suggest "Objective", for the following reason:
objectivity is one of the criteria for the quality of tests and includes execution
objectivity, analysis objectivity and interpretation objectivity. If we can achieve
100% in some cases, fine; if not, we can discuss the "tolerance".
I would suggest:

(VC) I'm still contemplating this one. I find both arguments plausible.
I'm okay with 'objectivity' but think it needs more explanation i.e. who defines
how objective it is?

*R03: Objectivity


> >>> R04: Replicability: different Web accessibility evaluators who
> perform the same tests on the same site should get the same results
> within a given tolerance.
> >
> >> Comment (RW) : The first part is good, but I am not happy with
> introducing “tolerance” at this stage. I think we should be clear that
> we are after consistent, replicable tests. I think we should add
> separate requirement later for such things as “partial compliance” and
> “tolerance”. See R14 below.
> >>
> >> *R04: Replicability: different Web accessibility evaluators who
> perform the same tests on the same site should get the same results.
> >
> > Comment (KP): I strongly agree with Richard, except for "Replicability",
> and would suggest:
> >
> > R04: Reliability: different Web accessibility evaluators who perform
> the same tests on the same site should get the same results.
>
> Comment (DB): As long as we take into consideration that there can be
> different ways/tools to run those tests, then yes, reliability and
> replicability are important. Getting to different results usually means
> evaluators do not interpret the rules the same way. This does not always
> mean that one is wrong and the other is right. So again, to keep those
> short, I would simply go with

> *R04: Reliable and replicable

Comment (KP): I'm happy with that, as long as it will not include a decision
for or against any specific evaluation methodology which one of the
participants in this TF uses.

> The explanation that follows could then reflect the idea that different
> evaluators performing the same tests on the same site should get the
> same results.
>
>(VC) +1 for Kerstin's, i.e. reliable and replicable, on the understanding that different
evaluators using the same methods on the same sites should obtain the same results.

>
> >>> R05: Translatable
> >
> >> Comment (RW) : As in translatable into different languages – Yes -
> agree
> >
> > Comment (KP): I agree, and I see "translatable" especially in the
> context of using standard terminology, which would be helpful for
> translating.
>
> Comment (DB): +1.
>Comments(VC): +1
> *R05: Translatable

>
>
> >>> R06: The methodology points to the existing tests in the techniques
> documents and does not reproduce them.
> >
> > Comment (KP): I agree.
> >
> >> Comment (RW) : yes – but I would like it a bit clearer that it is
> WCAG techniques.  I would also like the option to introduce a new
> technique if it becomes available. So I suggest
> >> *R06 Where possible the methodology should point to existing tests
> and techniques in the WCAG documentation.
>
> Comments (DB): I agree with the general idea here as well, but it needs
> to be shorter. We can always reflect the intention in the description
> that follows.
Comments (VC): +1
>
> *R06 Pointing to existing tests and techniques (where possible).
>

>
> >>> R07: Support for both manual and automated evaluation.
> >
> >> Comment (RW) :  Not all Guidelines can be tested automatically and
> it is not viable to test some others manually. This needs to be clearer
> that the most appropriate methods will be used, whether manual or
> automatic. Where both options are available they must deliver the same
> result.
> >>
> >> *R07:  Use the most appropriate manual or automatic evaluation.
> Where either could be used then both must deliver the same result.
> >
> > Comment (KP): I see "support" as just support and the important point
> "deliver the same result" in the context of R04 "Replicability" or as I
> suggest "Reliability".
>
> Comments (DB): I agree with the general idea here as well, but again,
> it needs to be shorter. We can always reflect the importance of using
> the most appropriate approach in the document itself.
>
> *R07: Reliable evaluation support (manual or automated).

Comment (KP): I'm still not sure about what "support" means.

(VC) I like: *R07: Use the most appropriate manual or automatic evaluation.
> Where either could be used then both must deliver the same result.

> >>> R08: Users include (see target audience)
> >
> >> Comment (RW) : Whilst user testing is essential  for confirming
> accessibility it is not needed/essential for checking compliance with
> WCAG. If we feel that user testing is needed then we must specify what
> users, what skill level, what tasks etc..so that evaluators all use the
> same type of user and get the same type of result. I would prefer not
> to include users here as a requirement.
> >
> > Comment (KP): A tricky R. - especially in the context of the above
> mentioned "It is possible for someone to comply with a particular
> guideline without using any of the recommended techniques." The
> question would be: How a tester can find out if an SC is met when the
> recommended techniques are not used? Wouldn't that mean that a tester
> needs deep knowledge in using for example Screenreaders as well as
> Magnifiers and ... We also discussed this in another mail thread. I
> prefer to include users here, but we have to describe which users,
> according to Richard's considerations in the above paragraph.
>
> Comments (DB): Testing with "real users" should be encouraged, but in no
> way should it be made mandatory. The only requirement should be to run
> tests with a skilled screen reader user, following a specific
> evaluation methodology. All the better if this evaluator happens to be
> a real user. So:
>
> *R08: Users include (see target audience)
>
>(VC) +1. However, I'm not sure about making screen reader testing
mandatory. Not everyone is proficient with screen readers, and in my experience
using one poorly often gives a very unreliable result.
>
> >>> R09: Support for different contexts (i.e. self-assessment, third-
> party evaluation of small or larger websites).
> >
> >> Comment (RW) :  Agreed.
> > Comment (KP): Agree
>
> Comments (DB): +1.
Comments (VC): +1
>
>
>
> >>> R10: Includes recommendations for sampling web pages and for
> expressing the scope of a conformance claim
> >
> >> Comment (RW) : I agree. This is probably going to be the most
> difficult issue, but it is essential if our methodology is going to be
> usable in the real world, as illustrated by discussions already taking
> place. Should it include tolerance metrics (R14)?
> >
> > Comment (KP): I also think it’s the most difficult issue. Because of
> the ongoing discussion about different approaches I want to abstain for
> the moment.
>
> Comments (DB): While I seem to be a little more optimistic than you
> two, it is an important issue. I hope that we can draw from everybody's
> experience and come up with something new and improved, compared to our
> respective approaches.

Comment (VC): This is going to be difficult.  However I think we will be able
to come up with something workable when we all discuss how we handle this.
I think we'll all learn some new techniques here.
>
> *R10: Web pages sampling recommendations.

Comment (KP): I still want to abstain before making a decision. But I think
we need the "scope of conformance claim". I find this a very important
issue for the validity of the evaluation methodology.

> We can always reflect the importance of expressing the scope of
> conformance claim in the document itself.
>
>
>
> >>> R11: Describes critical path analyses,
> >> Comment (RW) :  I assume this is the CPA of the evaluation process
> (i.e. define website, test this, test that, write report etc.), in which
> case, agreed.
>
> > Comment (KP): I'm not sure what is meant by this R. Because of that
> no vote from me now.
>
> Comments (DB): Agreed as well. This is something we never officially
> did ourselves at AccessibilitéWeb, but it does look like a great idea.
>
> *R11: Describes critical path analyses.
Comments (VC) : not sure on this one
>
>
>
> >>> R12: Covers computer assisted content selection and manual content
> selection
> >
> >> Comment (RW) : I do not know what this means – can Eric explain ?
> > Comment (KP): I also don't have an exact idea of what this R. could
> mean.
>
> Comments (DB): Isn't this directly related to page sampling
> determination and critical paths analyses? I get the manual content
> selection part, but I can't understand how this could be computer
> generated in any way... right now, I don't see why this couldn't just
> be a part of R11.

Comments (VC) this does sound like R11, I think we need some
clarification here.
>
>
>
> >>> R13: Includes integration and aggregation of the evaluation results
> and related conformance statements.
> >
> >> Comment (RW) : I think this means “write a nice report” in which
> case I agree.
> > Comment (KP): I agree.
>
> Comments (DB): Lol, reports are crucial indeed and every report should
> be technically-biased, with a good executive summary for the faint-
> hearted. But this R is definitely too complicated as is.
>
> *R13: Evaluation reports and related conformance statements.
>Comments (VC): agree
>
>
> >>> R14: Includes tolerance metrics.
> >
> >> Comment (RW) : Agreed – but maybe combine with R10
> > Comment (KP): The tolerance metrics will depend on the testing
> procedure itself. Because of that, I'm happy with this R as it stands and
> suggest not combining it with any other R.
>
> Comments (DB): I can see why it could be integrated with R10, but don't
> really mind if it's not. I think the wording is appropriate.

Comments (VC) agree
>
> *R14: Includes tolerance metrics.
>
>
>
>
> >>> R15: The Methodology includes recommendations for harmonized
> (machine-readable) reporting.
> >
> >> Comment (RW) : I am not sure that methodologies recommend things. Do
> you mean
> >>
> >> *R15: Reports must be machine readable.
> >
> > Comment (KP): As I understood R15 this means e.g. structures in
> documents but also recommendations for the content structure. If so, I
> agree with R15.
>
> Comments (DB): Shouldn't this be a part of R13 as well?
>
> *R15: Recommendations for harmonized (machine-readable) reporting

Comment (KP): I agree.
Comments (VC): isn't this what EARL is all about?  I'm not sure on
this one.

Best

Kerstin


>
> Best regards,
>
> /Denis



Received on Tuesday, 13 September 2011 07:17:10 UTC