Re: Comments on draft of EVAL - statistically relevant sample recommendations from Shadi Abou-Zahra on 2013-12-09 (public-wai-evaltf@w3.org from December 2013)

From: Shadi Abou-Zahra <shadi@w3.org>
Date: Mon, 09 Dec 2013 16:33:08 +0100
To: Detlev Fischer <detlev.fischer@testkreis.de>, David MacDonald <david100@sympatico.ca>
CC: Eval TF <public-wai-evaltf@w3.org>
Message-ID: <52A5E2B4.6020505@w3.org>
Yes, we've had quite a bit of discussion on this.

But separately I'd like to know references for the claim:
  - "90% confidence level, ± 10% error"

Who said this and how were these figures established? We would need to 
be very comfortable with the source to just take it over as-is.

Thanks,
   Shadi
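
For reference, figures of this kind are often derived from the standard sample-size formula for estimating a proportion, with an adjustment for a finite population. The thread does not say how the quoted table was produced, so the following is only a sketch under assumptions: the function name `sample_size`, the worst-case proportion p = 0.5, the adjustment n = n0 / (1 + n0/N), and rounding up are all choices made here for illustration.

```python
import math
from statistics import NormalDist


def sample_size(population=None, confidence=0.90, margin=0.10, p=0.5):
    """Pages to sample for a given confidence level and margin of error.

    Hypothetical helper: p=0.5 (worst case) and the finite-population
    adjustment n0 / (1 + n0 / N) are assumptions, not taken from the thread.
    """
    # Two-sided z-score, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively unlimited population.
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is None:
        return math.ceil(n0)
    # Shrink the sample for a finite number of pages, then round up.
    return math.ceil(n0 / (1 + n0 / population))


if __name__ == "__main__":
    for pages in (60, 100, 200, 500, 1000, 5000, None):
        label = "unlimited" if pages is None else str(pages)
        print(label, sample_size(pages))
```

Under these assumptions the sketch gives 32, 60, 64, 67 and 68 for 60, 500, 1000, 5000 and unlimited pages, which agrees with the quoted table. For 100 and 200 pages it gives 41 and 51 rather than the quoted 47 and 56, so the table may rest on different thresholds or assumptions.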


On 9.12.2013 16:13, Detlev Fischer wrote:
> Hi David,
>
> I think we should think twice about the effect and risk of recommending statistically relevant sample sizes. It is clear that a useful sample size depends a lot more on the number of templates and content types included than on the absolute number of pages. What kind of site is this baseline referring to? Many modern sites with just a few templates can be tested very effectively with fewer than 10 pages. (I am ready to admit that increasing the sample size will occasionally bring up additional issues, but the extra output seems to follow the law of diminishing returns.)
>
> More importantly perhaps, WCAG-EM makes it clear that no conformance claim for the site can be made based on the evaluation - the only claim that can be made is for the individual pages sampled.
>
> Including a sample size suggestion that starts with a whopping 32 pages for small sites and goes up to 68 pages for large sites seems quite unrealistic, at least in the German context of accessibility evaluation (both in view of the time and effort needed and the prices customers will be willing to accept). If sample recommendations are read as a 'must', this might invite the rejection of WCAG-EM by practitioners testing with a much smaller sample.
>
> Best,
> Detlev
>
> On 7 Dec 2013, at 06:14, David MacDonald wrote:
>
>> Hi Folks
>>
>> I posted these in my survey comments sheet but thought I should include them here. There are some simple typo fix suggestions and a few substantive change proposals...
>>
>>
>> Typo in TOC
>> Procceses --- Should be processes
>> =======
>> Intro
>>
>> <snip>self-assessment and third-party evaluation</snip>
>>
>> "self-assessment" sounds like a one-person organization... how about "internal self-assessment"?
>>
>> ==========
>> typo
>> distinctinstance
>>
>> Spelling
>> Constistent
>>
>> Spelling 3e
>> Nethods
>>
>> ==========
>> Representative Sample Step 3
>>
>> There are no example baselines for the number of pages to sample. There is no ballpark, and this could result in much variation across evaluators and jurisdictions. I think there are two ways to improve this and provide better guidance that will allow more consistent results across jurisdictions.
>> 1) Use the “size of website” criterion as a baseline and provide statistically relevant sample recommendations, such as those used by the Canadian Government in response to the Donna Jodhan case.
>>
>> Suggested replacement text:
>> Size of the website — websites with more web pages typically require a larger sample to evaluate.
>> <add> For example, the following is a statistically relevant sample size with a 90% confidence level, +/- 10% error. If the website has web pages numbering:
>> ≤60, then a sample size of 32
>> <100, then a sample size of 47
>> <200, then a sample size of 56
>> <500, then a sample size of 60
>> <1000, then a sample size of 64
>> <5000, then a sample size of 67
>> >5000, then a sample size of 68
>> </add>
>> These are established international statistical sample sizes. Then with that baseline we can talk about increasing (or decreasing) the sample size based on the other factors such as complexity, age, consistency etc...
>>
>> =======
>> There is some implicit mention of templates early on, but they seem to drop off in this important section, where I think they should be included explicitly.
>>
>> How about adding this to step 3:
>> 3f templates. Choose a page using each type of template.
>>
>> ====Section 4====
>>
>> I think there is some ambiguity between baseline WCAG conformance and good usability/ best practices.
>>
>> Although I almost always include people with disabilities in evaluations, and it often identifies things that can be improved on a web site's accessibility/usability, it rarely results in identifying strict WCAG failures that were not found in the "expert review". I think this sentence could be improved to correct the ambiguity.
>>
>> <snip>"Involving people with disabilities and people with aging-related impairments helps identify additional accessibility barriers that are not easily discovered by the evaluators alone."</snip>
>>
>> Let's leave evaluators out of this sentence.
>>
>> "Involving people with disabilities and people with aging-related impairments provides a clearer picture of how the site actually works for people with disabilities. It can result in a more rounded and useful assessment, and therefore better usability and overall accessibility of the site."
>>
>> ===
>> <snip> Note:... In such cases, an evaluator may use an identifier such as "not applicable" to denote the particular situations where Success Criteria are satisfied because no matching content is presented.</snip>
>>
>> We may want to check with Gregg about this. I think he felt pretty strongly about not having “N/A” on conformance claims, although I don’t personally have a particular issue with it. I think we should listen to his rationale.
>>
>> ====Section 5 ====
>> Conformance level satisfied: Level A, AA or AAA as per Step 1.b. Define the Conformance Target;
>>
>> I don't think an organization can claim absolute WCAG conformance based on this methodology, as this phrase appears to indicate. At least not as conformance is currently defined in WCAG, which requires EVERY page to conform. I think it might expose them to legal action.
>> I think it should be reported the way statistics are reported.
>> "We report Conformance Level (Level A, AA, AAA) with a fair degree of confidence, based on the WCAG Evaluation Methodology Framework" with a link to this document.
>> The report should also include another bullet.
>> -Contact information to report any accessibility issues on pages that may not have been evaluated.
>>
>> =====
>> Grammar
>> Currently <add>comma</add> the following performance scoring approaches are provided by this methodology:
>>
>> Cheers,
>> David MacDonald
>>
>> CanAdapt Solutions Inc.
>> Tel:  613.235.4902
>> http://ca.linkedin.com/in/davidmacdonald100
>> www.Can-Adapt.com
>>
>>    Adapting the web to all users
>>              Including those with disabilities
>>
>

-- 
Shadi Abou-Zahra - http://www.w3.org/People/shadi/
Activity Lead, W3C/WAI International Program Office
Evaluation and Repair Tools Working Group (ERT WG)
Research and Development Working Group (RDWG)
Received on Monday, 9 December 2013 15:33:44 UTC