Re: EvalTF discussion 5.5 and actual evaluation

Dear Alistair,

You are quite right that we are not in the business of designing techniques; 
however, according to section 5.3 we are in the business of designing 
methodologies:

5.3 This subclause provides a step by step description of the evaluation of 
the website (sample). This does not include going into the guidelines, 
success criteria etc from WCAG 2.0 or related techniques. It could be 
possible to propose different ways to evaluate the guidelines: one by one, 
per theme, per technology, etc. The Methodology will not prescribe one of 
those ways as necessary.

My feeling is that if we start work on this area now we will actually answer 
some of the questions being raised (and probably raise a few new ones). We 
will almost certainly identify synergies between some evaluation procedures, 
which will save time and effort.

Regards

Richard


-----Original Message----- 
From: Alistair Garrison
Sent: Saturday, January 14, 2012 1:45 PM
To: Eval TF
Subject: Re: EvalTF discussion 5.5 and actual evaluation

Dear All,

To my mind there are no massively different ways to evaluate the WCAG 2.0 
guidelines - seemingly, intentionally so.  We also don't need to take one of 
the WCAG 2.0 success criteria and determine a way to assess it, as this has 
already been done for us.

From WCAG 2.0 it seems reasonably clear that you (in some way) determine 
which techniques are applicable to the content in the pages you want to 
assess, then simply follow the Test Procedures prescribed in each of the 
applicable techniques. It does not matter if you do this one by one, per 
theme, per technology, etc. - that is surely up to whatever you think is 
best at the time.

Again, I'm a little concerned that we might be wandering towards recreating 
test procedures for individual techniques when, as mentioned, that part has 
already been done by the WCAG 2.0 techniques working group. Isn't it the 
higher-level question of how to approach the evaluation of a website (or 
conformance claim), and capture results, in a systematic way that we need to 
be answering?

For example, an approach such as the following (sketched roughly in code 
after the list)...

1) Clearly define what you want to test - the WCAG 2.0 Conformance Claim 
(or, in its absence, our website scoping method).
2) Determine which techniques are applicable - by looking through these 
pages and finding relevant content, marking techniques non-applicable if no 
applicable content can be found.
3) Run all relevant test procedures (defined in the applicable techniques) 
against all applicable content (found in step 2).
4) Finally, record pass, fail or non-applicable for each relevant 
technique, and from this determine all passed, failed and non-applicable 
success criteria / guidelines, noting that there are several techniques 
available for doing certain things.  (Note: this is another reason why we 
might use the Conformance Claim - techniques which have been used will 
hopefully be recorded there, rather than us having to assess all techniques 
for a certain thing until one passes.)
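
To make that concrete, here is a very rough Python sketch of steps 2-4 (the 
Technique type and its find_content / test hooks are illustrative names of 
my own, not anything defined by WCAG 2.0):

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Technique:
    id: str
    find_content: Callable[[str], List[str]]  # page -> applicable instances
    test: Callable[[str], bool]               # instance -> pass / fail

def evaluate(pages: List[str], techniques: List[Technique]) -> Dict[str, str]:
    results = {}
    for t in techniques:
        # Step 2: look through the pages in scope for relevant content.
        instances = [i for page in pages for i in t.find_content(page)]
        if not instances:
            results[t.id] = "non-applicable"
        # Step 3: run the technique's own test procedure on every instance.
        elif all(t.test(i) for i in instances):
            results[t.id] = "pass"
        else:
            results[t.id] = "fail"
    # Step 4: roll these per-technique results up into passed, failed and
    # non-applicable success criteria.
    return results

Step 1 would simply determine the pages list fed in at the top, from the 
Conformance Claim or, in its absence, our scoping method.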

Just my thoughts...

Very best regards

Alistair

On 14 Jan 2012, at 05:38, Vivienne CONWAY wrote:

> Hi Richard and all TF
> While I understand the need to look at the procedures from an overall 
> perspective first, I agree with Richard that it may be time to try out a 
> few ideas for practical implementation.  It may be a good idea for us all 
> to take one of the WCAG 2.0 success criteria and determine a way to assess 
> it.  However, I remember someone (I think it might have been Detlev) 
> proposed that we do this, and it was decided that we wouldn't be dealing 
> with each point individually.  Or did I misunderstand?
>
>
> Regards
>
> Vivienne L. Conway, B.IT(Hons)
> PhD Candidate & Sessional Lecturer, Edith Cowan University, Perth, W.A.
> Director, Web Key IT Pty Ltd.
> v.conway@ecu.edu.au
> v.conway@webkeyit.com
> Mob: 0415 383 673
>
>
> ________________________________
> From: RichardWarren [richard.warren@userite.com]
> Sent: Saturday, 14 January 2012 10:32 AM
> To: Eval TF
> Subject: Re: EvalTF discussion 5.5 and actual evaluation
>
> Dear TF,
>
> I cannot help thinking that we would save a lot of time and discussion if 
> we concentrated on procedures for evaluation (5.3) where we are going to 
> try “to propose different ways to evaluate the guidelines: one by one, 
> per theme, per technology, etc”.  As we do that we will come across the 
> various technologies (5.2) and possibly come up with a few acceptable ways 
> of dealing with “occasional errors” etc. if and when relevant to a 
> particular guideline. This approach may be more efficient than trying to 
> define systemic and incidental errors in a non-specific guideline context.
>
> I wonder if now is the time to get to the core of our task and start 
> working on actual procedures, where we can discuss levels of compliance 
> and any effects in a narrower, more targeted environment.
>
> Regards
> Richard
>
>
> From: Elle <nethermind@gmail.com>
> Sent: Friday, January 13, 2012 11:35 PM
> To: Vivienne CONWAY <v.conway@ecu.edu.au>
> Cc: Alistair Garrison <alistair.j.garrison@gmail.com>; Shadi Abou-Zahra 
> <shadi@w3.org>; Eval TF <public-wai-evaltf@w3.org>; Eric Velleman 
> <evelleman@bartimeus.nl>
> Subject: Re: EvalTF discussion 5.5
>
> TF:
>
> I have been reading the email discussions with avid interest and very 
> little ability to add anything valuable yet.  My point of view seems to be 
> very different from most in the group, as my job is to meet and maintain 
> this conformance at a large organization. I'm learning quite a bit from 
> all of you.
>
> I've been following this particular topic with a keen interest in seeing 
> what a "margin of error" would be defined as, in part because our company 
> is about to launch into a major site consolidation and I'm curious about 
> how to scale our current testing process.  Until now, we've actually been 
> testing every page we can with both automated scans and manual audits.
>
> From a purely layman's point of view, the only confidence I have when 
> testing medium-to-large websites (greater than 500 pages) comes from doing 
> the following (sketched loosely in code after the list):
>
> 1. automated scans of every single page
> 2. manual accessibility testing modeled after the user acceptance test 
> cases to test the critical user paths as defined by the business
> 3. manual accessibility testing of each page type and/or widget or 
> component (templates, in other words)
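>
> In code terms, that plan might look something like this (a loose sketch 
> only - the function and inventory names are mine, not any particular 
> tool's):
>
> from typing import Dict, List
>
> def plan_testing(pages: List[str],
>                  critical_paths: List[List[str]],
>                  templates: Dict[str, str]) -> Dict[str, list]:
>     return {
>         # 1. automated scan queued for every single page
>         "automated_scan": pages,
>         # 2. manual audit of each business-critical user path
>         "manual_paths": critical_paths,
>         # 3. manual audit of one representative page per template
>         "manual_templates": sorted(set(templates.values())),
>     }
>
> plan = plan_testing(
>     pages=["/", "/search", "/product/1", "/product/2"],
>     critical_paths=[["/", "/search", "/product/1"]],
>     templates={"/": "home", "/product/1": "product", "/product/2": "product"},
> )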
>
> So, I felt the need to chime in on "margin of error," because it worries 
> me when we start quantifying a percentage of error. I see this from the 
> corporate side.  Putting a percentage on this may actually undermine the 
> overall success of accessibility specialists working inside a large 
> organization.  We may find ourselves with more technical compliance and 
> less overall usability for disabled users. As for me, I need to be able to 
> point to an evaluation technique that encompasses more than a codified 
> measurement in my assessment of a website's conformance.  Ideally, the 
> evaluation really needs to account for user experience.  It's one of the 
> fail-safes in the current Section 508 compliance requirements that I've 
> taken shelter in, actually, outdated as they are: the functional 
> performance criteria.
>
> I really appreciate the work everyone in this group is doing, as I will 
> likely be a direct recipient of the outcome as I put these concepts into 
> action over the course of their creation.  Consider me the intern who will 
> try to see if these dogs will hunt. :)
>
>
> Much appreciated,
> Elle
>
>
> On Thu, Jan 12, 2012 at 8:10 PM, Vivienne CONWAY 
> <v.conway@ecu.edu.au> wrote:
> Hi Alistair and TF
> You have raised an interesting point here.  I'm thinking I like your idea 
> better than the 'margin of error' concept.  It removes the obstacle of 
> trying to decide what constitutes an 'incidental' or 'systemic' error.  I 
> think it's obvious that most of the time a website with systemic errors 
> would not pass, unless the error was system-wide and didn't pose any 
> serious problem, e.g. a colour contrast ratio that is 0.1 off the 4.5:1 
> requirement.  I think I like the statement idea coupled with a 
> comprehensive scope statement of what was tested.
>
>
> Regards
>
> Vivienne L. Conway, B.IT (Hons)
> PhD Candidate & Sessional Lecturer, Edith Cowan University, Perth, W.A.
> Director, Web Key IT Pty Ltd.
> v.conway@ecu.edu.au
> v.conway@webkeyit.com
> Mob: 0415 383 673
>
> ________________________________________
> From: Alistair Garrison [alistair.j.garrison@gmail.com]
> Sent: Thursday, 12 January 2012 6:41 PM
> To: Shadi Abou-Zahra; Eval TF; Eric Velleman
> Subject: Re: EvalTF discussion 5.5
>
> Hi,
>
> The issue of "margin of error" relates to the size of the website and the 
> number of pages actually being assessed.  I'm not so keen on the "5% 
> incidental error" idea.
>
> If you assess 1 page from a 1 page website there should be no margin of 
> error.
> If you assess 10 pages from a 10 page website there should be no margin of 
> error.
> If you assess 10 pages from a 100 page website you will have certainty for 
> 10 pages and uncertainty for 90.
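>
> To put a crude number on that uncertainty (a back-of-the-envelope sketch 
> using the statistical "rule of three" - my own choice of method, nothing 
> prescribed anywhere): if n randomly sampled pages all pass, an approximate 
> 95% upper confidence bound on the proportion of failing pages is 3/n.
>
> def upper_bound_failure_rate(sample_size: int) -> float:
>     # Rule of three: zero failures seen in n random samples gives an
>     # approximate 95% upper confidence bound of 3/n on the failure rate.
>     return min(1.0, 3.0 / sample_size)
>
> print(upper_bound_failure_rate(10))   # 0.3  - up to ~30% of pages may still fail
> print(upper_bound_failure_rate(100))  # 0.03 - the bound shrinks with sample size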
>
> Instead of exploring the statistical complexities involved in trying to 
> accurately define how uncertain we are (which could take a great deal of 
> precious time), could we not just introduce a simple disclaimer, e.g.:
>
> "The evaluator has tried their hardest to minimise the margin for error by 
> actively looking for all content relevant to each technique being assessed 
> which might have caused a fail."
>
> Food for thought...
>
> Alistair
>
> On 12 Jan 2012, at 10:04, Shadi Abou-Zahra wrote:
>
>> Hi Martijn, All,
>>
>> Good points, but it sounds like we are speaking more of the impact of 
>> errors than of the incidental vs systemic aspects of them. Intuitively 
>> one could say that an error that causes a barrier to completing a task on 
>> the web page needs to be weighted more significantly than an error that 
>> does not have the same impact, but it will be difficult to define what a 
>> "task" is. Maybe listing specific situations as you did is the way to go, 
>> but I think we should not mix the two aspects together.
>>
>> Best,
>> Shadi
>>
>>
>> On 12.1.2012 09:41, Martijn Houtepen wrote:
>>> Hi Eric, TF
>>>
>>> I would like to make a small expansion to your list, as follows:
>>>
>>> Errors can be incidental unless:
>>>
>>> a) it is a navigation element
>>> b) the alt-attribute is necessary for the understanding of the 
>>> information / interaction / essential to a key scenario or complete path
>>> c) other impact related thoughts?
>>> d) there is an alternative
>>>
>>> So an unlabeled (but required) field in a form (part of some key 
>>> scenario) will be a systemic error.
>>>
>>> Martijn
>>>
>>> -----Original Message-----
>>> From: Velleman, Eric [evelleman@bartimeus.nl]
>>> Sent: Wednesday, 11 January 2012 15:01
>>> To: Boland Jr, Frederick E.
>>> CC: Eval TF
>>> Subject: RE: EvalTF discussion 5.5
>>>
>>> Hi Frederick,
>>>
>>> Yes, agreed, but I think we can have both discussions at the same time. 
>>> So:
>>> 1. How do we define an error margin to cover non-structural errors?
>>> 2. How can an evaluator determine the impact of an error?
>>>
>>> I could imagine we make a distinction between structural and incidental 
>>> errors. One failed alt-attribute out of 100 correct ones would be 
>>> incidental... unless (and here comes the impact):
>>>  a) it is a navigation element
>>>  b) the alt-attribute is necessary for the understanding of the 
>>> information / interaction
>>>  c) other impact related thoughts?
>>>  d) there is an alternative
>>>
>>> We could set the acceptance rate for incidental errors. Example: the 
>>> site would be totally conformant, but with a statement that, for 
>>> alt-attributes, there are 5% incidental fails.
>>> This also relates directly to conformance in WCAG 2.0, specifically 
>>> section 5, Non-interference.
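>>>
>>> As a rough sketch of that statement rule (the function name and the 5% 
>>> default are examples only, not settled numbers):
>>>
>>> def conformance_statement(failed: int, total: int, margin: float = 0.05) -> str:
>>>     rate = failed / total
>>>     if rate == 0.0:
>>>         return "conformant"
>>>     if rate <= margin:
>>>         # e.g. 5 failed alt-attributes out of 100 -> still conformant,
>>>         # but the incidental failure rate is stated openly
>>>         return "conformant, with {:.0%} incidental alt-attribute fails".format(rate)
>>>     return "not conformant"
>>>
>>> print(conformance_statement(5, 100))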
>>>
>>> Eric
>>>
>>>
>>>
>>> ________________________________________
>>> From: Boland Jr, Frederick E. [frederick.boland@nist.gov]
>>> Sent: Wednesday, 11 January 2012 14:32
>>> To: Velleman, Eric
>>> CC: Eval TF
>>> Subject: RE: EvalTF discussion 5.5
>>>
>>> As a preamble to this discussion, I think we need to define more 
>>> precisely ("measure"?) what an "impact" would be (for example, impact to 
>>> whom/what and what specifically are the consequences of said impact)?
>>>
>>> Thanks Tim
>>>
>>> -----Original Message-----
>>> From: Velleman, Eric [evelleman@bartimeus.nl]
>>> Sent: Wednesday, January 11, 2012 4:15 AM
>>> To: public-wai-evaltf@w3.org
>>> Subject: EvalTF discussion 5.5
>>>
>>> Dear all,
>>>
>>> I would very much like to discuss section 5.5 about Error Margin.
>>>
>>> If one out of 1 million images on a website fails the alt-attribute, 
>>> this could mean that the complete website scores a fail even if the 
>>> "impact" would be very low. How do we define an error margin to cover 
>>> these non-structural errors that have a low impact? This is already 
>>> partly covered inside WCAG 2.0, but input and discussion would be great.
>>>
>>> Please share your thoughts.
>>> Kindest regards,
>>>
>>> Eric
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Shadi Abou-Zahra - http://www.w3.org/People/shadi/
>> Activity Lead, W3C/WAI International Program Office
>> Evaluation and Repair Tools Working Group (ERT WG)
>> Research and Development Working Group (RDWG)
>>
>
>
> --
> If you want to build a ship, don't drum up the people to gather wood, 
> divide the work, and give orders. Instead, teach them to yearn for the 
> vast and endless sea.
> - Antoine De Saint-Exupéry, The Little Prince
>
>
>

Received on Sunday, 15 January 2012 02:12:28 UTC