Re: Comments on draft of EVAL - statistically relevant sample recommendations from Mike Elledge on 2013-12-09 (public-wai-evaltf@w3.org from December 2013)

From: Mike Elledge <melledge@yahoo.com>
Date: Mon, 9 Dec 2013 08:49:28 -0800 (PST)
To: David MacDonald <david100@sympatico.ca>, 'Detlev Fischer' <detlev.fischer@testkreis.de>, "shadi@w3.org" <shadi@w3.org>
Cc: 'Eval TF' <public-wai-evaltf@w3.org>, "kirsten@can-adapt.com" <kirsten@can-adapt.com>
Message-ID: <1386607768.92816.YahooMailNeo@web162603.mail.bf1.yahoo.com>
Hi All--

I'm not a statistician per se, but it seems to me that there is so much variety from page to page within a website that it would be extremely time-consuming, if not impossible, to define statistical significance. I believe one would have to categorize elements (headings, tables, images, etc.) present in pages, and then determine the number of elements that would have to be reviewed to achieve a 95 percent confidence level. Identifying the appropriate number of pages for review would require categorizing pages according to the elements they contain and establishing the number that would give you a 95% confidence level that they were representative of each group.
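
As a rough illustration of the grouping idea above, here is a minimal Python sketch. It is not part of WCAG-EM. It assumes each page has already been tagged with the element types it contains (the page names and tags below are hypothetical), groups pages that share the same set of elements, and applies one common textbook formula (Cochran's formula with a finite population correction) to each group at a 95% confidence level. The 10% margin of error and p = 0.5 are assumptions chosen only for the example.

```python
import math
from collections import defaultdict
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.10, p=0.5):
    """Pages to review from a group of `population` pages (Cochran, finite population)."""
    if population <= 0:
        return 0
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-score
    n0 = z * z * p * (1 - p) / (margin * margin)        # size for an unlimited population
    n = n0 / (1 + (n0 - 1) / population)                # finite population correction
    return min(population, math.ceil(n))


def group_by_elements(page_elements):
    """Group page names by the exact set of element types they contain."""
    groups = defaultdict(list)
    for page, elements in page_elements.items():
        groups[frozenset(elements)].append(page)
    return dict(groups)


# Hypothetical inventory: page -> element types found on it.
inventory = {
    "/home": {"headings", "images"},
    "/news": {"headings", "images"},
    "/contact": {"headings", "forms"},
    "/stats": {"headings", "tables", "images"},
}

for elements, pages in group_by_elements(inventory).items():
    print(sorted(elements), len(pages), "pages, review", sample_size(len(pages)))
```

With groups this small the formula simply returns every page; grouping only starts to save effort when the groups are large.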

I think the logic here is correct (please correct me if I'm wrong, Sarah?), but in any event I don't think it can be a simple matter of "There are x pages in the website, therefore we have to review y% of them."

Mike





On Monday, December 9, 2013 11:10 AM, David MacDonald <david100@sympatico.ca> wrote:
 
>> WCAG-EM makes it clear that no conformance claim for the site can be made based on the evaluation - the only claim that can be made is for the individual pages sampled.

I based my comment on Section 5. Provide a statement
<snip>...    Conformance level satisfied: Level A, AA or AAA as per Step 1.b. Define the Conformance Target;...</snip>

To me this is a clear statement that a level is claimed to be satisfied. If this is contradicted elsewhere in the document, then I think it needs to be addressed. I think my suggestion addresses it.

"Conformance Level (Level A, AA, AAA) is claimed with a fair degree of confidence, based on the WCAG Evaluation Methodology Framework" with a link to the EM document. In addition, include another bullet with contact information for reporting any accessibility issues.

> I think we should think twice about the effect and risk of recommending statistically relevant sample sizes.

I think we need to provide some sort of guidance on sample sizes. Shadi asked about the source of my figures. A key researcher from Statistics Canada, the national organization responsible for all statistical information in Canada, met with the head of the CIO for the Government of Canada and provided a table of standard statistical sample sizes. I could get full references as to the source, if desired.

However, Detlev has a good point regarding the evaluation not being a random sample but rather a targeted sample based on specific information. That is true for many of the bullets in that section. My point is that we could start with this number for the "SIZE" bullet, for which it is true, and then reduce the sample size based on the non-random issues that Detlev brought up. For small sites with few templates I agree with Detlev. However, for large sites of over 5000 pages, I think a sample size of 68 is not too much to ask.
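
For reference, the figure of 68 can be related to a standard sample-size calculation. The following is a minimal Python sketch, assuming Cochran's formula for a proportion with a finite population correction, the 90% confidence level and +/- 10% error quoted in the original proposal, and p = 0.5 (the most conservative choice). The function name is hypothetical, and nothing here claims to be the method behind the Statistics Canada table.

```python
import math
from statistics import NormalDist


def cochran_sample_size(population=None, confidence=0.90, margin=0.10, p=0.5):
    """Sample size for estimating a proportion.

    population=None means an effectively unlimited number of pages.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    n0 = z * z * p * (1 - p) / (margin * margin)        # unlimited-population size
    if population is None:
        return math.ceil(n0)
    n = n0 / (1 + (n0 - 1) / population)                # finite population correction
    return min(population, math.ceil(n))


# Site sizes taken from the thresholds in the proposed table.
for pages in (60, 100, 200, 500, 1000, 5000, None):
    label = "unlimited" if pages is None else pages
    print(label, cochran_sample_size(pages))
```

Under these assumptions the unlimited-population value works out to 68, the same figure proposed for sites of more than 5000 pages. How the table was derived is not stated in this thread, so its smaller rows need not match this formula.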

Having said that, I am open to what others think. I think having some sort of baseline, at least in the form of examples, is important. Otherwise there is no way to compare results with other organizations and previous evaluations, which I would think is a key objective of the methodology.

Cheers,
David MacDonald

CanAdapt Solutions Inc.
Tel:  613.235.4902
http://ca.linkedin.com/in/davidmacdonald100 
www.Can-Adapt.com 
  
  Adapting the web to all users
            Including those with disabilities



-----Original Message-----
From: Detlev Fischer [mailto:detlev.fischer@testkreis.de] 
Sent: December 9, 2013 10:13 AM
To: David MacDonald
Cc: Eval TF
Subject: Re: Comments on draft of EVAL - statistically relevant sample recommendations

Hi David, 

I think we should think twice about the effect and risk of recommending statistically relevant sample sizes. It is clear that a useful sample size depends a lot more on the number of templates and content types included than on the absolute number of pages. What kind of site is this baseline referring to? Many modern sites with just a few templates can be tested very effectively with fewer than 10 pages. (I am ready to admit that increasing the sample size will occasionally bring up additional issues, but the extra output seems to follow the law of diminishing returns.)

More importantly perhaps, WCAG-EM makes it clear that no conformance claim for the site can be made based on the evaluation - the only claim that can be made is for the individual pages sampled. 

Including a sample size suggestion that starts with a whopping 32 pages for small sites and goes up to 68 pages for large sites seems quite unrealistic, at least in the German context of accessibility evaluation (both in view of the time and effort needed and the prices customers will be willing to accept). If sample recommendations are read as a 'must', this might invite the rejection of WCAG-EM by practitioners testing with a much smaller sample.

Best,
Detlev

On 7 Dec 2013, at 06:14, David MacDonald wrote:

> Hi Folks
>  
> I posted these in my survey comments sheet but thought I should include them here. There are some simple typo fix suggestions and a few substantive change proposals...
>  
>  
> Typo in TOC
> Procceses --- Should be processes
> =======
> Intro
>  
> <snip>self-assessment and third-party evaluation</snip>
>  
> "self-assessment" seems like a one-man organization... how about "internal self-assessment"
>  
> ==========
> typo
> distinctinstance
>  
> Spelling
> Constistent
>  
> Spelling 3e
> Nethods
>  
> ==========
> Representative Sample Step 3
>  
> There are no example baselines for the number of pages to sample. There is no ballpark, and this could result in much variation across evaluators and jurisdictions. I think there are two ways to improve this and provide better guidance that will allow more consistent results across jurisdictions.
> 1) Use the “size of website” criterion as a baseline and provide statistically relevant sample recommendations, such as those used by the Canadian Government in response to the Donna Jodhan case.
>  
> Suggested replacement text:
> Size of the website — websites with more web pages typically require a larger sample to evaluate.
> <add> For example, the following are statistically relevant sample sizes with a 90% confidence level and +/- 10% error. If the website has web pages numbering:
> ≤60, then a sample size of 32
> <100, then a sample size of 47
> <200, then a sample size of 56
> <500, then a sample size of 60
> <1000, then a sample size of 64
> <5000, then a sample size of 67
> >5000, then a sample size of 68
> </add>
> These are established international statistical sample sizes. Then with that baseline we can talk about increasing (or decreasing) the sample size based on the other factors such as complexity, age, consistency etc...
>  
> =======
> There is some implicit mention of templates early on, but they seem to drop off in this important section, where I think they should be included explicitly.
>  
> How about adding this to step 3:
> 3f templates. Choose a page using each type of template.
>  
> ====Section 4====
>  
> I think there is some ambiguity between baseline WCAG conformance and good usability/ best practices.
>  
> Although I almost always include people with disabilities in evaluations, and it often identifies things that can be improved on a web site's accessibility/usability, it rarely results in identifying strict WCAG failures that were not found in the "expert review". I think this sentence could be improved to correct the ambiguity.
>  
> <snip>"Involving people with disabilities and people with 
> aging-related impairments helps identify additional accessibility 
> barriers that are not easily discovered by the evaluators 
> alone."</snip>
>  
> Let's leave evaluators out of this sentence.
>  
> "Involving people with disabilities and people with aging-related impairments provides a clearer picture of how the site actually works for people with disabilities. It can result in a more rounded and useful assessment, and therefore better usability and overall accessibility of the site."
>  
> ===
> <snip> Note:... In such cases, an evaluator may use an identifier such 
> as "not applicable" to denote the particular situations where Success 
> Criteria are satisfied because no matching content is 
> presented.</snip>
>  
> We may want to check with Gregg about this. I think he felt pretty strongly about not having “N/A” on conformance claims, although I don’t personally have a particular issue with it. I think we should listen to his rationale.
>  
> ====Section 5 ====
> Conformance level satisfied: Level A, AA or AAA as per Step 1.b. 
> Define the Conformance Target;
>  
> I don't think an organization can claim absolute WCAG conformance based on this methodology, as this phrase appears to indicate. At least not as it is defined currently in WCAG which requires EVERY page to conform.  I think it might expose them to legal action.
> I think it should be reported like statistics are reported.
> "We report Conformance Level (Level A, AA, AAA) with a fair degree of confidence, based on the WCAG Evaluation Methodology Framework" with a link to this document.
> The report should also include another bullet.
> -Contact information to report any accessibility issues on pages that may not have been evaluated.
>  
> =====
> Grammar
> Currently <add>comma</add> the following performance scoring approaches are provided by this methodology:
>  
> Cheers,
> David MacDonald
>  
> CanAdapt Solutions Inc.
> Tel:  613.235.4902
> http://ca.linkedin.com/in/davidmacdonald100
> www.Can-Adapt.com
>  
>   Adapting the web to all users
>             Including those with disabilities
>  

--
Detlev Fischer
testkreis - das Accessibility-Team von feld.wald.wiese c/o feld.wald.wiese Thedestraße 2
22767 Hamburg

Tel   +49 (0)40 439 10 68-3
Mobil +49 (0)1577 170 73 84
Fax   +49 (0)40 439 10 68-5

http://www.testkreis.de
Beratung, Tests und Schulungen für barrierefreie Websites
Received on Monday, 9 December 2013 16:52:43 UTC