Re: EvalTF discussion 5.5 and actual evaluation from Kerstin Probiesch on 2012-01-16 (public-wai-evaltf@w3.org from January 2012)

From: Kerstin Probiesch <k.probiesch@googlemail.com>
Date: Mon, 16 Jan 2012 09:11:58 +0100
To: "'Alistair Garrison'" <alistair.j.garrison@gmail.com>, "'Eval TF'" <public-wai-evaltf@w3.org>
Message-ID: <4f13dbc9.0f0a0e0a.7a54.ffffc4d0@mx.google.com>

Hi Alistair, all,

I think we should be very careful with any testing procedures which rely on
techniques. Techniques are mainly for developers/authors. In the Techniques
Document we find:

"Test procedures are provided in techniques to help verify that the
technique has been properly implemented."

And:

"In particular, test procedures for individual techniques should not be
taken as test procedures for the WCAG 2.0 success criteria overall."

Best

Kerstin



> -----Original Message-----
> From: Alistair Garrison [mailto:alistair.j.garrison@gmail.com]
> Sent: Saturday, 14 January 2012 14:45
> To: Eval TF
> Subject: Re: EvalTF discussion 5.5 and actual evaluation
> 
> Dear All,
> 
> To my mind there are no massively different ways to evaluate the WCAG
> 2.0 guidelines - seemingly, intentionally so.  We also don't need to
> take one of the WCAG 2.0 checkpoints and determine a way to assess it -
> as this has already been done for us.
> 
> From WCAG 2.0 it seems reasonably clear that you (in some way)
> determine which techniques are applicable to the content in the pages
> you want to assess, then you simply follow the Test Procedures
> prescribed in each of the applicable techniques. It does not matter if
> you do this one by one, per theme, per technology etc... that is surely
> up to whatever you think is best at the time.
> 
> Again, I'm a little concerned that we might be wandering towards
> recreating test procedures for individual techniques, when as mentioned
> that part has already been done by the WCAG 2.0 techniques working
> group. Isn't it the higher level question of how to approach the
> evaluation of a website (or conformance claim), and capture results, in
> a systematic way that we need to be answering?
> 
> For example, an approach such as...
> 
> 1) Clearly define what you want to test - the WCAG 2.0 Conformance
> Claim (or in its absence our website scoping method)...
> 2) Determine which techniques are applicable - by looking through these
> pages and finding relevant content, marking techniques non-applicable
> if no applicable content can be found.
> 3) Running all relevant test procedures (defined in applicable
> techniques) against all applicable content (found in 2).
> 4) Finally recording pass, fail or non-applicable for each relevant
> technique, and then determining from this all passed, failed and non-
> applicable checkpoints / guidelines.  Noting that there are several
> techniques available for doing certain things.  (Note: this is another
> reason why we might use the Conformance claim as techniques which have
> been used will hopefully be recorded, rather than us having to assess
> all techniques for a certain thing, until one is passed).
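
The four steps above can be sketched in code. This is only a minimal illustration under stated assumptions, not a WCAG 2.0 procedure: the technique IDs, criterion names, page structure, and the toy checks are all hypothetical, and the rule that a criterion passes as soon as one of its applicable techniques passes is just one reading of the "until one is passed" remark.

```python
from collections import defaultdict


def evaluate_techniques(pages, techniques):
    """Steps 2-4: find applicable content, run each test, record an outcome."""
    results = {}
    for tech_id, tech in techniques.items():
        applicable = [p for p in pages if tech["applies"](p)]
        if not applicable:
            results[tech_id] = "not applicable"
        elif all(tech["test"](p) for p in applicable):
            results[tech_id] = "pass"
        else:
            results[tech_id] = "fail"
    return results


def summarize_by_criterion(results, techniques):
    """Step 4: roll technique outcomes up to the criterion they belong to."""
    grouped = defaultdict(list)
    for tech_id, outcome in results.items():
        grouped[techniques[tech_id]["criterion"]].append(outcome)
    summary = {}
    for criterion, outcomes in grouped.items():
        # Assumption: one passing technique is enough for the criterion.
        if "pass" in outcomes:
            summary[criterion] = "pass"
        elif "fail" in outcomes:
            summary[criterion] = "fail"
        else:
            summary[criterion] = "not applicable"
    return summary


# Step 1: the scope to test (hypothetical pages).
pages = [
    {"url": "/home", "images": [{"alt": "Logo"}], "videos": []},
    {"url": "/contact", "images": [{"alt": ""}], "videos": []},
]

# Hypothetical techniques; the checks are toy stand-ins for real test procedures.
techniques = {
    "T-img-alt": {
        "criterion": "C-text-alternatives",
        "applies": lambda p: bool(p["images"]),
        "test": lambda p: all(img["alt"] for img in p["images"]),
    },
    "T-captions": {
        "criterion": "C-captions",
        "applies": lambda p: bool(p["videos"]),
        "test": lambda p: all(v.get("captions") for v in p["videos"]),
    },
}

results = evaluate_techniques(pages, techniques)
summary = summarize_by_criterion(results, techniques)
```

Recording the outcome per technique first, and deriving the criterion result afterwards, keeps the raw findings available even if the roll-up rule is later changed.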
> 
> Just my thoughts...
> 
> Very best regards
> 
> Alistair
> 
> On 14 Jan 2012, at 05:38, Vivienne CONWAY wrote:
> 
> > Hi Richard and all TF
> > While I understand the need to look at the procedures from an overall
> perspective first, I agree with Richard that it may be time to try out
> a few ideas for practical implementation.  It may be a good idea for us
> all to take one of the WCAG 2.0 checkpoints and determine a way to
> assess it.  However, I remember that someone (I think it might have been
> Detlev) proposed that we do this and it was decided that we wouldn't be
> dealing with each point individually.  Or did I misunderstand?
> >
> >
> > Regards
> >
> > Vivienne L. Conway, B.IT(Hons)
> > PhD Candidate & Sessional Lecturer, Edith Cowan University, Perth,
> W.A.
> > Director, Web Key IT Pty Ltd.
> > v.conway@ecu.edu.au<mailto:v.conway@ecu.edu.au>
> > v.conway@webkeyit.com<mailto:v.conway@webkeyit.com>
> > Mob: 0415 383 673
> >
> > This email is confidential and intended only for the use of the
> individual or entity named above. If you are not the intended
> recipient, you are notified that any dissemination, distribution or
> copying of this email is strictly prohibited. If you have received this
> email in error, please notify me immediately by return email or
> telephone and destroy the original message.
> >
> > ________________________________
> > From: RichardWarren [richard.warren@userite.com]
> > Sent: Saturday, 14 January 2012 10:32 AM
> > To: Eval TF
> > Subject: Re: EvalTF discussion 5.5 and actual evaluation
> >
> > Dear TF,
> >
> > I cannot help thinking that we would save a lot of time and
> discussion if we concentrated on procedures for evaluation (5.3) where
> we are going to try “to propose different ways to evaluate the
> guidelines: one by one, per theme, per technology, etc”.  As we do
> that we will come across the various technologies (5.2) and possibly
> come up with a few acceptable ways of dealing with “occasional errors”
> etc. if and when relevant to a particular guideline. This approach may
> be more efficient than trying to define systemic and incidental errors
> in a non-specific guideline context.
> >
> > I wonder if now is the time to get to the core of our task and start
> working on actual procedures where we can discuss levels of compliance
> and any effect in a more narrow, targeted environment.
> >
> > Regards
> > Richard
> >
> >
> > From: Elle<mailto:nethermind@gmail.com>
> > Sent: Friday, January 13, 2012 11:35 PM
> > To: Vivienne CONWAY<mailto:v.conway@ecu.edu.au>
> > Cc: Alistair Garrison<mailto:alistair.j.garrison@gmail.com> ; Shadi
> Abou-Zahra<mailto:shadi@w3.org> ; Eval TF<mailto:public-wai-evaltf@w3.org> ;
> Eric Velleman<mailto:evelleman@bartimeus.nl>
> > Subject: Re: EvalTF discussion 5.5
> >
> > TF:
> >
> > I have been reading the email discussions with avid interest and very
> little ability to add anything valuable yet.  My point of view seems to
> be very different from most in the group, as my job is to meet and
> maintain this conformance at a large organization. I'm learning quite a
> bit from all of you.
> >
> > I've been following this particular topic with a keen interest in
> seeing what a "margin of error" would be defined as, in part because
> our company is about to launch into a major site consolidation and I'm
> curious about how to scale our current testing process.  Until now,
> we've actually been testing every page we can with both automated scans
> and manual audits.
> >
> > From a purely layman's point of view, the only way I have confidence
> when testing medium to large volume websites (greater than 500 pages)
> is by doing the following:
> >
> > 1. automated scans of every single page
> > 2. manual accessibility testing modeled after the user acceptance
> test cases to test the critical user paths as defined by the business
> > 3. manual accessibility testing of each page type and/or widget or
> component (templates, in other words)
> >
> > So, I felt the need to chime in on "margin of error," because it
> worries me when we start quantifying a percentage of error. I see this
> from the corporate side.  Putting a percentage on this may actually
> undermine the overall success of accessibility specialists working
> inside of a large organization.  We may find ourselves with more
> technical compliance and less overall usability for disabled users. As
> for me, I need to be able to point to an evaluation technique that
> encompasses more than a codified measurement in my assessment of a
> website's conformance.  Ideally, the  really needs to account for user
> experience.  It's one of the fail safes in the current 508 Compliance
> requirements that I've taken shelter in, actually, as outdated as they
> are - functional performance criteria.
> >
> > I really appreciate the work everyone in this group is doing, as I
> will likely be a direct recipient of the outcome as I put these
> concepts into action over the course of their creation.  Consider me
> the intern who will try to see if these dogs will hunt. :)
> >
> >
> > Much appreciated,
> > Elle
> >
> >
> > On Thu, Jan 12, 2012 at 8:10 PM, Vivienne CONWAY
> <v.conway@ecu.edu.au<mailto:v.conway@ecu.edu.au>> wrote:
> > Hi Alistair and TF
> > You have raised an interesting point here.  I'm thinking I like your
> idea better than the 'margin of error' concept.  It removes the
> obstacle of trying to decide what constitutes an 'incidental' or
> 'systemic' error.  I think it's obvious that most of the time a website
> with systemic errors would not pass, unless the error was system-wide and
> didn't pose any serious problem, e.g. a colour contrast that's 0.1 off the
> 4.5:1 rule.  I think I like the statement idea coupled with a
> comprehensive scope statement of what was tested.
> >
> >
> > Regards
> >
> > Vivienne L. Conway, B.IT(Hons)
> > PhD Candidate & Sessional Lecturer, Edith Cowan University, Perth,
> W.A.
> > Director, Web Key IT Pty Ltd.
> > v.conway@ecu.edu.au<mailto:v.conway@ecu.edu.au>
> > v.conway@webkeyit.com<mailto:v.conway@webkeyit.com>
> > Mob: 0415 383 673
> >
> > ________________________________________
> > From: Alistair Garrison
> [alistair.j.garrison@gmail.com<mailto:alistair.j.garrison@gmail.com>]
> > Sent: Thursday, 12 January 2012 6:41 PM
> > To: Shadi Abou-Zahra; Eval TF; Eric Velleman
> > Subject: Re: EvalTF discussion 5.5
> >
> > Hi,
> >
> > The issue of "margin of error" relates to the size of the website and
> the number of pages actually being assessed.  I'm not so keen on the
> "5% incidental error" idea.
> >
> > If you assess 1 page from a 1 page website there should be no margin
> of error.
> > If you assess 10 pages from a 10 page website there should be no
> margin of error.
> > If you assess 10 pages from a 100 page website you will have
> certainty for 10 pages and uncertainty for 90.
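
The arithmetic behind these three cases can be written out as a small helper. This is a sketch only: the function name and the returned fields are hypothetical, and it reports plain coverage (how many pages were and were not assessed) rather than any statistical margin of error.

```python
def coverage(assessed_pages, total_pages):
    """Report how much of a website an evaluation actually looked at."""
    if total_pages <= 0 or not 0 <= assessed_pages <= total_pages:
        raise ValueError("need 0 <= assessed_pages <= total_pages and total_pages > 0")
    return {
        "assessed": assessed_pages,
        "unassessed": total_pages - assessed_pages,
        "assessed_share": assessed_pages / total_pages,
    }


# The three cases from the message: full coverage twice, then 10 of 100 pages.
one_of_one = coverage(1, 1)
ten_of_ten = coverage(10, 10)
ten_of_hundred = coverage(10, 100)
```

Only the last case leaves pages unassessed (90 of them), which is where the uncertainty described above comes from.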
> >
> > Instead of exploring the statistical complexities involved in trying
> to accurately define how uncertain we are (which could take a great
> deal of precious time) - could we not just introduce a simple
> disclaimer e.g.
> >
> > "The evaluator has tried their hardest to minimise the margin for
> error by actively looking for all content relevant to each technique
> being assessed which might have caused a fail."
> >
> > Food for thought...
> >
> > Alistair
> >
> > On 12 Jan 2012, at 10:04, Shadi Abou-Zahra wrote:
> >
> >> Hi Martijn, All,
> >>
> >> Good points but it sounds like we are speaking more of impact of
> errors rather than of the incidental vs systemic aspects of them.
> Intuitively one could say that an error that causes a barrier to
> completing a task on the web page needs to be weighted more
> significantly than an error that does not have the same impact, but it
> will be difficult to define what a "task" is. Maybe listing specific
> situations as you did is the way to go but I think we should not mix
> the two aspects together.
> >>
> >> Best,
> >> Shadi
> >>
> >>
> >> On 12.1.2012 09:41, Martijn Houtepen wrote:
> >>> Hi Eric, TF
> >>>
> >>> I would like to make a small expansion to your list, as follows:
> >>>
> >>> Errors can be incidental unless:
> >>>
> >>> a) it is a navigation element
> >>> b) the alt-attribute is necessary for the understanding of the
> information / interaction / essential to a key scenario or complete
> path
> >>> c) other impact related thoughts?
> >>> d) there is an alternative
> >>>
> >>> So an unlabeled (but required) field in a form (part of some key
> scenario) will be a systemic error.
> >>>
> >>> Martijn
> >>>
> >>> -----Original Message-----
> >>> From: Velleman, Eric
> [mailto:evelleman@bartimeus.nl<mailto:evelleman@bartimeus.nl>]
> >>> Sent: Wednesday, 11 January 2012 15:01
> >>> To: Boland Jr, Frederick E.
> >>> CC: Eval TF
> >>> Subject: RE: EvalTF discussion 5.5
> >>>
> >>> Hi Frederick,
> >>>
> >>> Yes agree, but I think we can have both discussions at the same
> time. So:
> >>> 1. How do we define an error margin to cover non-structural
> errors?
> >>> 2. How can an evaluator determine the impact of an error?
> >>>
> >>> I could imagine we make a distinction between structural and
> incidental errors. One failed alt-attribute out of 100 correct ones
> would be incidental... unless (and here comes the impact):
> >>>  a) it is a navigation element
> >>>  b) the alt-attribute is necessary for the understanding of the
> information / interaction
> >>>  c) other impact related thoughts?
> >>>  d) there is an alternative
> >>>
> >>> We could set the acceptance rate for incidental errors. Example:
> the site would be totally conformant, but with a statement that for
> alt-attributes, there are 5% incidental fails.
> >>> This also directly relates to conformance in WCAG 2.0, specifically
> conformance requirement 5, Non-Interference.
> >>>
> >>> Eric
> >>>
> >>>
> >>>
> >>> ________________________________________
> >>> From: Boland Jr, Frederick E.
> [frederick.boland@nist.gov<mailto:frederick.boland@nist.gov>]
> >>> Sent: Wednesday, 11 January 2012 14:32
> >>> To: Velleman, Eric
> >>> CC: Eval TF
> >>> Subject: RE: EvalTF discussion 5.5
> >>>
> >>> As a preamble to this discussion, I think we need to define more
> precisely ("measure"?) what an "impact" would be (for example, impact
> on whom/what, and what specifically are the consequences of said
> impact).
> >>>
> >>> Thanks Tim
> >>>
> >>> -----Original Message-----
> >>> From: Velleman, Eric
> [mailto:evelleman@bartimeus.nl<mailto:evelleman@bartimeus.nl>]
> >>> Sent: Wednesday, January 11, 2012 4:15 AM
> >>> To: public-wai-evaltf@w3.org<mailto:public-wai-evaltf@w3.org>
> >>> Subject: EvalTF discussion 5.5
> >>>
> >>> Dear all,
> >>>
> >>> I would very much like to discuss section 5.5 about Error Margin.
> >>>
> >>> If one out of 1 million images on a website fails the alt-attribute
> check, this could mean that the complete website scores a fail even if
> the "impact" would be very low. How do we define an error margin to cover
> these non-structural errors that have a low impact? This is already
> partly covered inside WCAG 2.0. But input and discussion would be
> great.
> >>>
> >>> Please share your thoughts.
> >>> Kindest regards,
> >>>
> >>> Eric
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >> --
> >> Shadi Abou-Zahra - http://www.w3.org/People/shadi/
> >> Activity Lead, W3C/WAI International Program Office
> >> Evaluation and Repair Tools Working Group (ERT WG)
> >> Research and Development Working Group (RDWG)
> >>
> >
> > This e-mail is confidential. If you are not the intended recipient
> you must not disclose or use the information contained within. If you
> have received it in error please return it to the sender via reply e-
> mail and delete any record of it from your system. The information
> contained within is not the opinion of Edith Cowan University in
> general and the University accepts no liability for the accuracy of the
> information provided.
> >
> > CRICOS IPC 00279B
> >
> >
> >
> >
> > --
> > If you want to build a ship, don't drum up the people to gather wood,
> divide the work, and give orders. Instead, teach them to yearn for the
> vast and endless sea.
> > - Antoine De Saint-Exupéry, The Little Prince
> >
> >
Received on Monday, 16 January 2012 08:12:58 UTC