AW: AW: EvalTF discussion 5.5 and actual evaluation from Kerstin Probiesch on 2012-01-16 (public-wai-evaltf@w3.org from January 2012)

From: Kerstin Probiesch <k.probiesch@googlemail.com>
Date: Mon, 16 Jan 2012 11:57:09 +0100
To: "'Alistair Garrison'" <alistair.j.garrison@gmail.com>, "'Eval TF'" <public-wai-evaltf@w3.org>
Message-ID: <4f140262.812f0e0a.1bc3.13f0@mx.google.com>

Hi Alistair, all,

I have been thinking hard about what the descriptions in the Intro of the
Techniques Document could mean for our evaluation methodology, and also for
testing procedures in general whenever an evaluator is testing. That is why,
in my other mail, I also raised the question of whether web developers
really belong to the target audience of our methodology.

Our methodology will not describe testing procedures, because the work is
already done. Nevertheless we have to say something about it. In the Intro
we find:

"Note that all techniques are informative." (Glossary: "for information
purposes and not required for conformance. Note: Content required for
conformance is referred to as 'normative.'")

So if all techniques are informative and we, as a TF of the W3C, recommend
testing (sufficient) techniques, this could lead to a situation in which
informative techniques change their character to quasi-normative. This
is certainly no problem when a client asks: are the listed sufficient
techniques implemented correctly? But is it also no problem when we are
asking and speaking about conformance?

If a testing procedure tests just the sufficient techniques, who will then
develop innovative new techniques? They would not pass, would they? Just
some prophecies of doom, instead of my two cents ;-)

Very Best

Kerstin


From: Alistair Garrison [mailto:alistair.j.garrison@gmail.com]
Sent: Monday, 16 January 2012 10:38
To: Kerstin Probiesch; Eval TF
Subject: Re: AW: EvalTF discussion 5.5 and actual evaluation

Hi Kerstin, Eval TF,

We have no choice but to use the testing procedures defined in individual
techniques, when those techniques have been implemented.

Referencing http://www.w3.org/TR/WCAG-TECHS/intro.html...

In the section "Sufficient and Advisory Techniques" you will note that
sufficient techniques exist for each Success Criterion that, if implemented
correctly, are 'sufficient' to meet that Success Criterion.

It is true that we cannot say that using just these sufficient techniques is
the only way to meet success criteria, as a developer can show (how???) that
they have met a success criterion by using their own techniques. This also
presents a problem for testing, as the tester (or someone) would then have to
assume responsibility for saying these new techniques are sufficient.

And, as I mentioned in my previous email, there are some occasions where two
or more techniques are provided for doing the same thing.  In this case, if
we were to test both (as both are relevant), one might fail and one might
pass. If we only looked at the passes and fails, we would make the wrong
judgment for the success criterion, as only the implemented technique should
have been assessed.

This is presumably why the Testing Techniques section of this document
contains the sentence "In particular, test procedures for individual
techniques should not be taken as test procedures for the WCAG 2.0 success
criteria overall." - although, it might have been amended to "test
procedures for all relevant individual techniques" for clarity.  
Especially, as the paragraph runs on:

"While the test procedure for a given technique may produce a fail result,
because the technique was not used, the success criterion may be met via
another technique. It is even possible that the success criterion is met via
a technique that is not documented in this collection, so failing test
procedures for all documented sufficient techniques may not mean that the
success criterion is not met."

However, if you concentrate solely on testing the sufficient techniques that
have actually been implemented (which, due to reduced overheads, I assume
will be used by the majority of people), the summation of their pass / fail /
non-applicable results will, by design, be 'sufficient' to state whether a
success criterion is passed / failed / non-applicable.

You have most certainly raised several interesting questions...  Should we
concern ourselves with assessing only implemented sufficient techniques, or
should we take on the wider responsibility of verifying / assessing people's
custom techniques?  If we only concentrate on implemented sufficient
techniques - will the W3C provide a mechanism for people to submit custom
techniques, for verification, which they believe also fulfil success
criteria?

Very best regards 

Alistair 

On 16 Jan 2012, at 09:11, Kerstin Probiesch wrote:


Hi Alistair, all,

I think we should be very careful with any testing procedures which rely on
techniques. Techniques are mainly for developers/authors. In the Techniques
Document we find:

"Test procedures are provided in techniques to help verify that the
technique has been properly implemented."

And:

"In particular, test procedures for individual techniques should not be
taken as test procedures for the WCAG 2.0 success criteria overall."

Best

Kerstin




-----Original Message-----
From: Alistair Garrison [mailto:alistair.j.garrison@gmail.com]
Sent: Saturday, 14 January 2012 14:45
To: Eval TF
Subject: Re: EvalTF discussion 5.5 and actual evaluation

Dear All,

To my mind there are no massively different ways to evaluate the WCAG
2.0 guidelines - seemingly, intentionally so.  We also don't need to
take one of the WCAG 2.0 checkpoints and determine a way to assess it -
as this has already been done for us.

From WCAG 2.0 it seems reasonably clear that you (in some way)
determine which techniques are applicable to the content in the pages
you want to assess, then you simply follow the Test Procedures
prescribed in each of the applicable techniques. It does not matter if
you do this one by one, per theme, per technology etc... that is surely
up to whatever you think is best at the time.

Again, I'm a little concerned that we might be wandering towards
recreating test procedures for individual techniques, when as mentioned
that part has already been done by the WCAG 2.0 techniques working
group. Isn't it the higher level question of how to approach the
evaluation of a website (or conformance claim), and capture results, in
a systematic way that we need to be answering?

For example, an approach such as...

1) Clearly define what you want to test - the WCAG 2.0 Conformance
Claim (or in its absence our website scoping method)...
2) Determine which techniques are applicable - by looking through these
pages and finding relevant content, marking techniques non-applicable
if no applicable content can be found.
3) Run all relevant test procedures (defined in applicable
techniques) against all applicable content (found in 2).
4) Finally, record pass, fail or non-applicable for each relevant
technique, and then determine from this all passed, failed and
non-applicable checkpoints / guidelines, noting that there are several
techniques available for doing certain things.  (Note: this is another
reason why we might use the Conformance claim, as the techniques which
have been used will hopefully be recorded, rather than us having to
assess all techniques for a certain thing until one is passed).
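
The steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the names Outcome,
evaluate_criterion, claimed_techniques and run_test_procedure are
hypothetical, and the rule that one failed implemented technique fails the
criterion is an assumption for the sketch, not something WCAG 2.0 or this
thread defines.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_APPLICABLE = "non-applicable"


def evaluate_criterion(claimed_techniques, run_test_procedure):
    """Return (overall outcome, per-technique outcomes) for one criterion.

    claimed_techniques: ids of the techniques actually implemented
        (hypothetical input, e.g. taken from the conformance claim).
    run_test_procedure: callable mapping a technique id to an Outcome
        (hypothetical stand-in for following that technique's test
        procedure against the applicable content).
    """
    # Step 3: run the test procedure of each implemented technique.
    results = {t: run_test_procedure(t) for t in claimed_techniques}

    # Step 4: derive the criterion outcome from the recorded results.
    applicable = [r for r in results.values()
                  if r is not Outcome.NOT_APPLICABLE]
    if not applicable:
        overall = Outcome.NOT_APPLICABLE
    elif Outcome.FAIL in applicable:
        # Assumption: a failed implemented technique fails the criterion.
        overall = Outcome.FAIL
    else:
        overall = Outcome.PASS
    return overall, results
```

Because only the techniques named in the claim are passed in, an
alternative technique that was never implemented cannot contribute a
misleading fail.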

Just my thoughts...

Very best regards

Alistair

On 14 Jan 2012, at 05:38, Vivienne CONWAY wrote:

Hi Richard and all TF
While I understand the need to look at the procedures from an overall
perspective first, I agree with Richard that it may be time to try out
a few ideas for practical implementation.  It may be a good idea for us
all to take one of the WCAG 2.0 checkpoints and determine a way to
assess it.  However, I remember that someone (I think it might have been
Detlev) proposed that we do this, and it was decided that we wouldn't be
dealing with each point individually.  Or did I misunderstand?


Regards

Vivienne L. Conway, B.IT(Hons)
PhD Candidate & Sessional Lecturer, Edith Cowan University, Perth,
W.A.
Director, Web Key IT Pty Ltd.
v.conway@ecu.edu.au
v.conway@webkeyit.com
Mob: 0415 383 673

This email is confidential and intended only for the use of the
individual or entity named above. If you are not the intended
recipient, you are notified that any dissemination, distribution or
copying of this email is strictly prohibited. If you have received this
email in error, please notify me immediately by return email or
telephone and destroy the original message.

________________________________
From: RichardWarren [richard.warren@userite.com]
Sent: Saturday, 14 January 2012 10:32 AM
To: Eval TF
Subject: Re: EvalTF discussion 5.5 and actual evaluation

Dear TF,

I cannot help thinking that we would save a lot of time and
discussion if we concentrated on procedures for evaluation (5.3), where
we are going to try "to propose different ways to evaluate the
guidelines: one by one, per theme, per technology, etc".  As we do
that we will come across the various technologies (5.2) and possibly
come up with a few acceptable ways of dealing with "occasional errors"
etc., if and when relevant to a particular guideline. This approach may
be more efficient than trying to define systemic and incidental errors
in a non-specific guideline context.

I wonder if now is the time to get to the core of our task and start
working on actual procedures where we can discuss levels of compliance
and any effect in a more narrow, targeted environment.

Regards
Richard


From: Elle <nethermind@gmail.com>
Sent: Friday, January 13, 2012 11:35 PM
To: Vivienne CONWAY <v.conway@ecu.edu.au>
Cc: Alistair Garrison <alistair.j.garrison@gmail.com>; Shadi
Abou-Zahra <shadi@w3.org>; Eval TF <public-wai-evaltf@w3.org>;
Eric Velleman <evelleman@bartimeus.nl>
Subject: Re: EvalTF discussion 5.5

TF:

I have been reading the email discussions with avid interest and very
little ability to add anything valuable yet.  My point of view seems to
be very different from most in the group, as my job is to meet and
maintain this conformance at a large organization. I'm learning quite a
bit from all of you.

I've been following this particular topic with a keen interest in
seeing what a "margin of error" would be defined as, in part because
our company is about to launch into a major site consolidation and I'm
curious about how to scale our current testing process.  Until now,
we've actually been testing every page we can with both automated scans
and manual audits.

From a purely layman's point of view, the only way I have confidence
when testing medium to large volume websites (greater than 500 pages)
is by doing the following:

1. automated scans of every single page
2. manual accessibility testing modeled after the user acceptance
test cases to test the critical user paths as defined by the business
3. manual accessibility testing of each page type and/or widget or
component (templates, in other words)

So, I felt the need to chime in on "margin of error," because it
worries me when we start quantifying a percentage of error. I see this
from the corporate side.  Putting a percentage on this may actually
undermine the overall success of accessibility specialists working
inside of a large organization.  We may find ourselves with more
technical compliance and less overall usability for disabled users. As
for me, I need to be able to point to an evaluation technique that
encompasses more than a codified measurement in my assessment of a
website's conformance.  Ideally, the  really needs to account for user
experience.  It's one of the fail safes in the current 508 Compliance
requirements that I've taken shelter in, actually, as outdated as they
are - functional performance criteria.

I really appreciate the work everyone in this group is doing, as I
will likely be a direct recipient of the outcome as I put these
concepts into action over the course of their creation.  Consider me
the intern who will try to see if these dogs will hunt. :)


Much appreciated,
Elle


On Thu, Jan 12, 2012 at 8:10 PM, Vivienne CONWAY
<v.conway@ecu.edu.au> wrote:
Hi Alistair and TF
You have raised an interesting point here.  I'm thinking I like your
idea better than the 'margin of error' concept.  It removes the
obstacle of trying to decide what constitutes an 'incidental' or
'systemic' error.  I think it's obvious that most of the time a website
with systemic errors would not pass, unless the error was system-wide but
didn't pose any serious problem, e.g. a colour contrast that's 0.1 off
the 4.5:1 rule.  I think I like the statement idea coupled with a
comprehensive scope statement of what was tested.
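
To make the "0.1 off the 4.5:1 rule" example concrete, here is a minimal
sketch of the contrast ratio calculation as defined in WCAG 2.0 (relative
luminance of sRGB colours). The function names and the shortfall helper
are illustrative assumptions, not part of any W3C tool.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB colour given as (r, g, b) in 0..255."""
    def linearise(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearise(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio (L1 + 0.05) / (L2 + 0.05), lighter colour on top."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def shortfall(foreground, background, required=4.5):
    """How far a colour pair falls short of the required ratio (0 if met)."""
    return max(0.0, required - contrast_ratio(foreground, background))
```

A pair with a small positive shortfall is the kind of system-wide but
low-impact failure described above; whether that should still count as a
fail is exactly the open question.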


Regards

Vivienne L. Conway, B.IT(Hons)
PhD Candidate & Sessional Lecturer, Edith Cowan University, Perth,
W.A.
Director, Web Key IT Pty Ltd.
v.conway@ecu.edu.au
v.conway@webkeyit.com
Mob: 0415 383 673

________________________________________
From: Alistair Garrison [alistair.j.garrison@gmail.com]
Sent: Thursday, 12 January 2012 6:41 PM
To: Shadi Abou-Zahra; Eval TF; Eric Velleman
Subject: Re: EvalTF discussion 5.5

Hi,

The issue of "margin of error" relates to the size of the website and
the number of pages actually being assessed.  I'm not so keen on the
"5% incidental error" idea.

If you assess 1 page from a 1 page website there should be no margin
of error.
If you assess 10 pages from a 10 page website there should be no
margin of error.
If you assess 10 pages from a 100 page website you will have
certainty for 10 pages and uncertainty for 90.
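
The three cases above amount to simple arithmetic on the sample. A minimal
sketch, assuming the hypothetical helper name sample_coverage and making no
statistical claim about the unassessed pages:

```python
def sample_coverage(pages_assessed, total_pages):
    """Return (fraction of pages assessed, number of pages not assessed)."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    if not 0 <= pages_assessed <= total_pages:
        raise ValueError("pages_assessed must be between 0 and total_pages")
    return pages_assessed / total_pages, total_pages - pages_assessed


# The three cases from the message: (assessed, total)
for assessed, total in [(1, 1), (10, 10), (10, 100)]:
    fraction, unassessed = sample_coverage(assessed, total)
    print(f"{assessed}/{total}: {fraction:.0%} assessed, "
          f"{unassessed} pages uncertain")
```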

Instead of exploring the statistical complexities involved in trying
to accurately define how uncertain we are (which could take a great
deal of precious time) - could we not just introduce a simple
disclaimer e.g.

"The evaluator has tried their hardest to minimise the margin for
error by actively looking for all content relevant to each technique
being assessed which might have caused a fail."

Food for thought...

Alistair

On 12 Jan 2012, at 10:04, Shadi Abou-Zahra wrote:

Hi Martijn, All,

Good points but it sounds like we are speaking more of impact of
errors rather than of the incidental vs systemic aspects of them.
Intuitively one could say that an error that causes a barrier to
completing a task on the web page needs to be weighted more
significantly than an error that does not have the same impact, but it
will be difficult to define what a "task" is. Maybe listing specific
situations as you did is the way to go but I think we should not mix
the two aspects together.

Best,
Shadi


On 12.1.2012 09:41, Martijn Houtepen wrote:
Hi Eric, TF

I would like to make a small expansion to your list, as follows:

Errors can be incidental unless:

a) it is a navigation element
b) the alt-attribute is necessary for the understanding of the
information / interaction / essential to a key scenario or complete
path
c) other impact related thoughts?
d) there is an alternative

So an unlabeled (but required) field in a form (part of some key
scenario) will be a systemic error.

Martijn

-----Original Message-----
From: Velleman, Eric [mailto:evelleman@bartimeus.nl]
Sent: Wednesday, 11 January 2012 15:01
To: Boland Jr, Frederick E.
CC: Eval TF
Subject: RE: EvalTF discussion 5.5

Hi Frederick,

Yes, I agree, but I think we can have both discussions at the same
time. So:
1. How do we define an error margin to cover non-structural
errors?
2. How can an evaluator determine the impact of an error?

I could imagine we make a distinction between structural and
incidental errors. The 1 failed alt-attribute out of 100 correct ones
would be incidental... unless (and here comes the impact):
a) it is a navigation element
b) the alt-attribute is necessary for the understanding of the
information / interaction
c) other impact related thoughts?
d) there is an alternative

We could set the acceptance rate for incidental errors. Example:
the site would be totally conformant, but with a statement that, for
alt-attributes, there are 5% incidental fails.
This also directly relates to conformance in WCAG 2.0, specifically
section 5 Non-interference.
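
A rough sketch of how such an acceptance rate could be checked. It covers
only conditions a) and b) from the list above; c) and d) are left out
because the thread does not define them. The names AltFailure,
is_incidental and within_acceptance_rate are hypothetical, and the 5%
default is just the example figure from this message.

```python
from dataclasses import dataclass


@dataclass
class AltFailure:
    """One failed alt-attribute, with the impact flags a) and b)."""
    in_navigation: bool
    needed_for_understanding: bool


def is_incidental(failure):
    """A failure is incidental unless condition a) or b) applies."""
    return not (failure.in_navigation or failure.needed_for_understanding)


def within_acceptance_rate(failures, total_checked, max_rate=0.05):
    """True if every failure is incidental and the fail rate is acceptable."""
    if total_checked <= 0:
        raise ValueError("total_checked must be positive")
    if any(not is_incidental(f) for f in failures):
        return False  # a single high-impact failure is never acceptable
    return len(failures) / total_checked <= max_rate
```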

Eric



________________________________________
From: Boland Jr, Frederick E. [frederick.boland@nist.gov]
Sent: Wednesday, 11 January 2012 14:32
To: Velleman, Eric
CC: Eval TF
Subject: RE: EvalTF discussion 5.5

As a preamble to this discussion, I think we need to define more
precisely ("measure"?) what an "impact" would be (for example, impact
to whom/what and what specifically are the consequences of said
impact)?

Thanks Tim

-----Original Message-----
From: Velleman, Eric [mailto:evelleman@bartimeus.nl]
Sent: Wednesday, January 11, 2012 4:15 AM
To: public-wai-evaltf@w3.org
Subject: EvalTF discussion 5.5

Dear all,

I would very much like to discuss section 5.5 about Error Margin.

If one out of 1 million images on a website fails the alt-attribute,
this could mean that the complete website scores a fail even if the
"impact" would be very low. How do we define an error margin to cover
these non-structural errors that have a low impact? This is already
partly covered inside WCAG 2.0. But input and discussion would be
great.

Please share your thoughts.
Kindest regards,

Eric







--
Shadi Abou-Zahra - http://www.w3.org/People/shadi/
Activity Lead, W3C/WAI International Program Office
Evaluation and Repair Tools Working Group (ERT WG)
Research and Development Working Group (RDWG)


This e-mail is confidential. If you are not the intended recipient
you must not disclose or use the information contained within. If you
have received it in error please return it to the sender via reply e-
mail and delete any record of it from your system. The information
contained within is not the opinion of Edith Cowan University in
general and the University accepts no liability for the accuracy of the
information provided.

CRICOS IPC 00279B




--
If you want to build a ship, don't drum up the people to gather wood,
divide the work, and give orders. Instead, teach them to yearn for the
vast and endless sea.
- Antoine De Saint-Exupéry, The Little Prince


Received on Monday, 16 January 2012 10:57:42 UTC