W3C home > Mailing lists > Public > public-wai-evaltf@w3.org > August 2011

RE: some initial questions from the previous thread

From: Kathleen Wahlbin <kathy@interactiveaccessibility.com>
Date: Wed, 24 Aug 2011 06:29:49 -0400
To: "'Eval TF'" <public-wai-evaltf@w3.org>
Message-ID: <010a01cc6248$c16d39b0$4447ad10$@interactiveaccessibility.com>
Hi - 

Looking at the questions from Shadi, here are my thoughts:

* Is the "evaluation methodology" expected to be carried out by one person
or by a group of more than one person?

I think we need to account for both situations.  There are many situations
where one person conducts the testing, and others where multiple people are
involved.

When multiple people are testing, different tasks are usually assigned.  For
example, some of the tasks could be:
- Run the automated checker, then review the results and discard those that
do not make sense
- Code level review
- Testing with different assistive devices
- Usability testing with different disability types

* What is the expected level of expertise (in accessibility, in web
technologies etc) of persons carrying out an evaluation?

This depends on the role the person plays while carrying out the
evaluation.  Where multiple people carry out the review, the level of
expertise varies.  I agree with Vivienne: at a minimum, they need basic
training in HTML and accessibility.

* Is the involvement of people with disabilities a necessary part of
carrying out an evaluation versus an improvement of the quality?

I agree with Vivienne: user testing always reveals additional issues that can
be fixed.  But like all usability testing, I feel that this is really an
improvement of the quality and a check to make sure people with disabilities
can use the site.  People with disabilities will have varying levels of
success completing tasks, depending on their knowledge of the AT they are
using.  Any results from usability tests would need to be evaluated with
this in mind.

* Are the individual test results binary (i.e. pass/fail) or a score
(discrete value, ratio, etc)?

There are definitely areas where it is a pass/fail situation, but many of
the criteria are more of a judgment call.  In my reviews, I score issues
High, Med, and Low but have not assigned a discrete value.

* How are these test results aggregated into an overall score (plain count,
weighted count, heuristics, etc)?

I have aggregated the results into a summary table of the overall issues.
When looking at the aggregated results, I take into account the target
audience and rate the issues based on the likelihood that a person with a
disability will be using the feature.  For example, some websites require
sight and would not be used by a person with a visual disability.  I have
used High, Med, and Low as a score, not a value.
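As a rough illustration of the kind of aggregation described above, here is a
minimal Python sketch: issues rated High/Med/Low are rolled up into a summary
table of counts per severity.  The field names and sample issues are
assumptions for the example, not part of any W3C methodology.

```python
# Hypothetical sketch: aggregate rated issues into a summary table.
# The issue records and their fields ("criterion", "severity") are
# invented for illustration only.
from collections import Counter

issues = [
    {"criterion": "1.1.1", "severity": "High"},
    {"criterion": "1.4.3", "severity": "Low"},
    {"criterion": "2.4.1", "severity": "High"},
    {"criterion": "3.3.2", "severity": "Med"},
]

def summarize(issues):
    """Count issues per severity level, in a fixed High/Med/Low order."""
    counts = Counter(issue["severity"] for issue in issues)
    return {severity: counts.get(severity, 0) for severity in ("High", "Med", "Low")}

print(summarize(issues))  # {'High': 2, 'Med': 1, 'Low': 1}
```

A real report would likely also break the counts down per WCAG success
criterion and per page, but the rollup step is the same.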


Kathy

Phone:  978.760.0682
Fax:  978.560.1251
Kathy@InteractiveAccessibility.com


-----Original Message-----
From: public-wai-evaltf-request@w3.org
[mailto:public-wai-evaltf-request@w3.org] On Behalf Of Vivienne CONWAY
Sent: Tuesday, August 23, 2011 8:55 PM
To: Shadi Abou-Zahra; Eval TF
Subject: RE: some initial questions from the previous thread

Hi all

Looking at Shadi's initial questions

I think our methodology will need to cover a variety of situations.
As some of the other responders mentioned, we often combine individual
testing, multiple expert testers, and user-group testing.  I think our
methodology statements will need to address the different situations where
testing occurs.

Expertise:  From what I've read in the literature, you need at least basic
training in accessibility to be able to test a website.  I have had
individuals look at a website with no training, just to see what they can
pick up.  They normally notice colours, text size, and sometimes even
alternative text for images.  Some of the important things, such as captions,
structure, and skip navigation links, aren't obvious to them until we
provide some training.  After training, their eyes are opened (so to speak)
to the issues and requirements.  They also need to be trained in the WCAG
2.0 principles.

User testing:  I am of the opinion that while we can test a website
ourselves, using people with disabilities in the testing sheds more light on
the importance of the different WCAG principles.  Sometimes some of the
things I think are important do not cause certain users any problems.  I am
finding that using a team with a variety of disabilities helps enormously.

Scoring: I am currently really unsure about the best way to do the scoring.
Up until now I've been using a pass/fail type of scoring.  However, for my
research project, I need to develop some kind of percentage score for each
POUR principle and then aggregate these into an overall score.  I need to
work out how to weight the different items, as it would seem that not every
item should carry the same weight.
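One possible shape for the weighted per-principle scoring sketched above is
the following Python fragment.  The checkpoints, pass/fail results, and
weights are all invented for illustration; this is not Vivienne's actual
method, just one way the arithmetic could work.

```python
# Illustrative sketch only: a weighted pass percentage per POUR principle,
# then a simple average as the overall score.  All data below is made up.
results = {
    "Perceivable":    [("1.1.1", True, 3), ("1.4.3", False, 1)],
    "Operable":       [("2.1.1", True, 3), ("2.4.1", True, 2)],
    "Understandable": [("3.3.2", False, 2)],
    "Robust":         [("4.1.1", True, 1)],
}

def principle_score(checks):
    """Weighted pass percentage: weights of passed checks / total weight."""
    total = sum(weight for _, _, weight in checks)
    passed = sum(weight for _, ok, weight in checks if ok)
    return 100.0 * passed / total

scores = {principle: principle_score(checks) for principle, checks in results.items()}
overall = sum(scores.values()) / len(scores)
print(scores)
print(round(overall, 2))  # 68.75
```

Whether the overall score should be a plain average of the four principles,
or itself weighted, is exactly the open question raised here.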

Confidence score: I'm not sure what is meant by this.

Sorry for the length of reply - it requires lots of thought.

Regards

Vivienne L. Conway
________________________________________
From: public-wai-evaltf-request@w3.org [public-wai-evaltf-request@w3.org] On
Behalf Of Shadi Abou-Zahra [shadi@w3.org]
Sent: Monday, 22 August 2011 7:34 PM
To: Eval TF
Subject: some initial questions from the previous thread

Dear Eval TF,

From the recent thread on the construction of WCAG 2.0 Techniques, here are
some questions to think about:

* Is the "evaluation methodology" expected to be carried out by one person
or by a group of more than one person?

* What is the expected level of expertise (in accessibility, in web
technologies etc) of persons carrying out an evaluation?

* Is the involvement of people with disabilities a necessary part of
carrying out an evaluation versus an improvement of the quality?

* Are the individual test results binary (i.e. pass/fail) or a score
(discrete value, ratio, etc)?

* How are these test results aggregated into an overall score (plain count,
weighted count, heuristics, etc)?

* Is it useful to have a "confidence score" for the tests (for example
depending on the degree of subjectivity or "difficulty")?

* Is it useful to have a "confidence score" for the aggregated result
(depending on how the evaluation is carried out)?


Feel free to chime in if you have particular thoughts on any of these.

Best,
   Shadi

--
Shadi Abou-Zahra - http://www.w3.org/People/shadi/
Activity Lead, W3C/WAI International Program Office
Evaluation and Repair Tools Working Group (ERT WG)
Research and Development Working Group (RDWG)

This e-mail is confidential. If you are not the intended recipient you must
not disclose or use the information contained within. If you have received
it in error please return it to the sender via reply e-mail and delete any
record of it from your system. The information contained within is not the
opinion of Edith Cowan University in general and the University accepts no
liability for the accuracy of the information provided.

CRICOS IPC 00279B
Received on Wednesday, 24 August 2011 12:33:52 GMT

This archive was generated by hypermail 2.3.1 : Friday, 8 March 2013 15:52:11 GMT