Sampling methods from Vivienne CONWAY on 2013-01-26 (public-wai-evaltf@w3.org from January 2013)

From: Vivienne CONWAY <v.conway@ecu.edu.au>
Date: Sat, 26 Jan 2013 14:26:30 +0800
To: Shadi Abou-Zahra <shadi@w3.org>, Eval TF <public-wai-evaltf@w3.org>
Message-ID: <8AFA77741B11DB47B24131F1E38227A9FB77CA88AA@XCHG-MS1.ads.ecu.edu.au>
Hi all

I was just re-reading a paper by Giorgio and realized it had some interesting insights into sampling methods:

http://users.dimi.uniud.it/~giorgio.brajnik/papers/iwwua08-kn.pdf

S. Hartmann et al. (Eds.): WISE 2008, LNCS 5176, pp. 63–80, 2008.
© Springer-Verlag Berlin Heidelberg 2008

on p.64 it says:

"Additional evidence exists showing that accessibility evaluation based on a
sample of pages (sampling is necessary for all but trivial websites) can be affected
by the criteria used to select the sample. There is interdependence between the
sampling criteria and the purpose of the accessibility analysis [8], leading to large
differences in accuracy. If the evaluation aims at conformance, then the most
frequently used sampling criterion (selecting predefined pages: home, contact,
site map, etc.) may lead up to a 38% inaccuracy rate, i.e. 38% of the checkpoints
may be wrongly estimated."

I think you will all find this a worthwhile paper if you haven't read it already. I've seen it before but didn't remember the info on sampling methods. It might be good to include it in the reference section as well.


Regards

Vivienne L. Conway, B.IT(Hons), MACS CT, AALIA(cs)
PhD Candidate & Sessional Lecturer, Edith Cowan University, Perth, W.A.
Director, Web Key IT Pty Ltd.
v.conway@ecu.edu.au
v.conway@webkeyit.com
Mob: 0415 383 673

________________________________________
From: Shadi Abou-Zahra [shadi@w3.org]
Sent: Friday, 25 January 2013 3:51 PM
To: Eval TF
Subject: Re: Aim and impact of random sampling

I want to toss in another aspect into this good discussion: most of the
people represented in this group are expert evaluators who have highly
developed instincts for which pages to select in the structured sample.
However, this methodology should also be useful to evaluators with less
expertise. A simple random-like selection of some of the pages helps to
ensure that the sample as a whole is as representative as possible. It
might not add much for expert evaluators but it does not hurt either,
and more importantly it ensures quality of the methodology as a whole.

The question remains how this "simple random-like selection" should be
defined. I agree with Detlev that Peter's suggestion of laying out some
of the different scenarios paired with concrete examples may be useful.
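
As one possible reading of "simple random-like selection", here is a minimal Python sketch. It assumes the evaluator already has a list of crawled URLs and a hand-picked structured sample. The function name top_up_sample, the example URLs and the seed are illustrative assumptions, not anything defined in WCAG-EM.

```python
import random


def top_up_sample(all_urls, structured, extra, seed=None):
    """Return the structured sample plus `extra` randomly chosen other pages."""
    chosen = set(structured)
    # Only pages not already in the structured sample are candidates.
    candidates = sorted(u for u in set(all_urls) if u not in chosen)
    rng = random.Random(seed)  # a recorded seed makes the selection repeatable
    picked = rng.sample(candidates, min(extra, len(candidates)))
    return list(structured) + picked


if __name__ == "__main__":
    # Hypothetical crawl result and structured sample.
    crawled = [f"https://example.org/page{i}" for i in range(1, 51)]
    structured = crawled[:5]
    for url in top_up_sample(crawled, structured, extra=3, seed=2013):
        print(url)
```

Recording the seed in the evaluation report would let another evaluator reproduce the same selection from the same URL list. Pages and states that a crawler cannot reach are not covered by this sketch.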

Best,
   Shadi


On 25.1.2013 06:04, Detlev Fischer wrote:
> Hi Richard,
>
> these are excellent examples that demonstrate the usefulness of an automatic method, a heuristic if you like, to complement your initial sample. As I see it, you would generate your URLs and look at every 10th page to see if it looks different from the ones that you have already included, leading to the discoveries described. This sounds useful and doable - it probably isn't related to the application of any statistics aimed at generating a statistically 'sufficiently large' sample.
>
> I also take it that you do not test each 10th page in your list of generated URLs for every aspect (SC) and page state that might cause problems (problems that may often not be immediately obvious) - I imagine you may quickly look at the page without CSS or scan it for unusual content. And this is fine, looks like this could be done without causing a huge overhead of redundant work since you would immediately skip all pages that *look* like those you already have.
>
> I would personally count that under 'clever ways of building up your initial page sample without stopping too early', not as random sampling. This kind of approach seems quite sensible. I wonder a bit whether it still works for *very large sites* - where do you stop? Looking at every tenth page of a large and complex site (e-commerce) is probably not an option. And for pages that generate dynamic content with further states being dependent on your input, checking every tenth page would not work either as such exploration of states can take a lot of time before it could be said to be 'complete' (including repeated logoffs and logins).
>
> This might bring us back to what Peter described as something like 'suitable avenues for sampling in different cases', such as
>
> 1. small sites (possibly all pages = sample)
> 2. medium-sized simple sites (normal sampling method plus your check of every
>     nth page of a list of generated URLs)
> 3. large sites (not sure)
> 4. One (or a few) page web app (possibly all critical processes)
> 5. highly dynamic sites / web apps with pages generated in response to user input (not sure)
>
> It seems that there probably won't be one approach that fits all these cases, so if a random sampling approach (I would prefer the term sampling heuristics) is included as non-optional, we would need to describe succinctly to the user of WCAG-EM *how to do it* without coming across as overly complex or academic, or as just too hard to do for your average skilled a11y expert. If that cannot be achieved, we are killing off WCAG-EM in the bud!
>
> I am grateful that Richard has described here in two concrete examples what he means when talking about random sampling, and I would like other proponents of mandatory random sampling to join him and explain what *they* actually do when they create their random sample, so we get some idea what might work with reasonable effort.
>
> Cheers,
> Detlev
>
>
>
> On 25 Jan 2013, at 00:30, RichardWarren wrote:
>
>> Hi Eric et al.
>>
>> For my two-penny-worth I have found random sampling to be essential. Often it merely reinforces what my structured testing has shown, but occasionally it throws up completely new problems.
>> I have recently (in the last two months) had two cases where this has happened.
>> 1) A site had a whole load of additional, text-heavy pages that had been included primarily to improve search engine ranking. These were not listed in the official site map and had been completely overlooked during our structured testing. It was only when I ran my robot to list all URLs and started my random test (every tenth page) that I found them. Now you might argue that because they were so hard to find, it is unlikely a disabled person would find them - but Google could find them, so they could easily be the landing page of someone who typed in the relevant search term.
>> 2) A charity site that had a "development" area that included videos and slideshows that the (young) web developer had played with years ago but not removed. We rang the owner, she got them removed so we didn't need to test. But the owner was impressed (which is good for business).
>>
>> Now if we had not found those areas and certified the site as compliant and some poor soul had landed up on one of the pages and complained we would have lost credibility. So for me random sampling is not an option. I am just looking for an "approved method".
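
The check described above, listing all URLs with a robot and then looking at every tenth one, could be sketched in Python roughly as follows. The function name every_nth and the example URLs are hypothetical; the sketch assumes the robot's output is already available as a list in crawl order.

```python
def every_nth(urls, step=10, start=0):
    """Return every `step`-th URL from `urls`, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return urls[start::step]


if __name__ == "__main__":
    # Hypothetical output of a crawler: 95 URLs in crawl order.
    crawled = [f"https://example.org/p/{i}" for i in range(95)]
    for url in every_nth(crawled, step=10):
        print(url)
```

Strictly speaking this is systematic rather than random selection. Drawing the start offset at random from 0 to step - 1 would bring it closer to a random sample without changing the amount of work.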
>>
>> Regards
>> Richard
>>
>>
>>
>> -----Original Message----- From: Ramón Corominas
>> Sent: Thursday, January 24, 2013 10:52 PM
>> To: evelleman@bartimeus.nl
>> Cc: Detlev Fischer ; public-wai-evaltf@w3.org
>> Subject: Re: Aim and impact of random sampling
>>
>> Hi, Eric.
>>
>> I have no documented data (I suppose that I could go through all our
>> reports and obtain something, but I guess it would be a hard job, since
>> the random pages were not specifically marked as such). However, as far
>> as I can remember all evaluations that I've performed in my five years
>> at Technosite repeated the same types of barriers across all pages of
>> the sample, or at least in a significant number of pages.
>>
>> The sample size was normally 30 pages, and usually the first 20-25 were
>> manually selected by one of the "structured" methods. Then the sample
>> was completed with 5-10 random pages to complete the 30 pages. I must
>> admit that these "random" pages were not always so random, but more or
>> less chosen from "random clicks", although sometimes we used WGET to
>> download about 500 pages and selected some real random pages from there.
>>
>> My experience is the same as what Detlev mentioned: the random pages (the
>> last third of the sample) were not significantly different from the rest
>> of the structured pages, since most of the problems are repeated. Even
>> if there are specific barriers, they are usually covered by 2 or 3 of
>> the structured pages.
>>
>> Regards,
>> Ramón.
>>
>> Eric wrote:
>>
>>> @Detlev: I see your point, but wouldn't this only work if there is a re-test?
>>>
>>> @Ramon: Do you have data to support the conclusion that no significant change in the results will be obtained if the sample includes random pages? That would be a good input for our discussion.
>>> Kindest regards,
>>>
>>> Eric
>>>
>>> ________________________________________
>>> Van: Detlev Fischer [detlev.fischer@testkreis.de]
>>> Verzonden: donderdag 24 januari 2013 21:26
>>> Aan: Ramón Corominas
>>> CC: public-wai-evaltf@w3.org
>>> Onderwerp: Re: Aim and impact of random sampling
>>>
>>> Ensuring that clients will render their entire site accessible since they do not know what exact pages will be tested is important. But setting up the rule (once proposed by Léonie, I believe) that in any re-test after remedial action, some pages are replaced by other pages would do the same trick. No need for randomness here.
>>>
>>> For all cases of testing where we will not find 100% conformance (the overwhelming majority of sites, in our experience), having extra random pages as a verification exercise wouldn't make much difference - these would usually just reveal yet other instances of some SC not met that are not met anyway elsewhere. The verification aim Eric alluded to in his mail would mainly apply to those rare sites that are picture-perfect paragons of full compliance.
>>>
>>> On 24 Jan 2013, at 20:55, Ramón Corominas wrote:
>>>
>>>> Although I did not use the words "optional/mandatory", I also commented in the survey that some Euracert partners will probably dislike the idea of having to include more pages (= more time and resources), since they consider that the initial structured sampling is enough in most cases, (that is, no significant change in the results will be obtained).
>>>>
>>>> We at Technosite include the "random" part just because the website is evaluated over time, and thus we make clear to the clients that the sample will not always be the same, and therefore they will have to apply the accessibility criteria to the whole website. However, I agree that our "method" to select random pages is certainly not very scientific.
>>>>
>>>> In any case, I assume that the "filter the sample" should be enough to eliminate the problem of time/resources. However,
>>>>
>>>> My vote: it should be an optional step.
>>>>
>>>> Regards,
>>>> Ramón.
>>>>
>>>> Aurélien wrote:
>>>>
>>>>> +1, that is the sense of the comment I made on the survey. I think this needs to be an option.
>>>>>
>>>>> Detlev wrote:
>>>>>
>>>>>> The assumption has been that an additional random sample will make sure that a tester's initial sampling of pages has not left out pages that may expose problems not present in the initial sample.
>>>>>>
>>>>>> That aim in itself is laudable, but for this to work, the sampling would need to be
>>>>>>
>>>>>> 1. independent of individual tester choices (i.e., automatic) -
>>>>>>   which would need a definition, inside the methodology, of a
>>>>>>   valid approach for truly random sampling. No one has even hinted at
>>>>>>   a reliable way to do that - I believe there is none.
>>>>>>   A mere calculation of sample size for a desired level of confidence
>>>>>>   would need to be based on the total number of a site's pages *and*
>>>>>>   page states - a number that will usually be unknown.
>>>>>>
>>>>>> 2. Fairly represent not just pages, but also page states.
>>>>>>   But crawling a site to derive a collection of URLS for
>>>>>>   random sampling is not doable since many states (and their URLs or
>>>>>>   DOM states) only come about as a result of human input.
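
To make the sample-size calculation mentioned in point 1 concrete, here is a sketch of the standard textbook approach (Cochran's formula with a finite population correction). It is not a method proposed in this thread. It assumes the goal is to estimate a proportion, for example the share of pages failing a given SC, and that the total number of pages and states N is known, which is exactly what point 1 says is usually not the case. The function name sample_size and the default values are illustrative.

```python
import math


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Cochran's sample size with finite population correction.

    population: total number of pages/states N (assumed known here)
    margin:     acceptable margin of error, e.g. 0.05 for +/-5 points
    z:          z-score for the confidence level (1.96 is about 95%)
    p:          expected proportion; 0.5 is the most conservative choice
    """
    if population < 1:
        raise ValueError("population must be at least 1")
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)  # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)         # finite population correction
    return min(population, math.ceil(n))


if __name__ == "__main__":
    for total in (500, 10_000):
        print(total, sample_size(total))
```

With these defaults the sketch gives 218 pages for a 500-page site and 370 for a 10,000-page site, which illustrates how far such figures are from the 30-page samples discussed elsewhere in this thread.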
>>>>>>
>>>>>> I hope I am not coming across as a pest if I say again that in my opinion, we are shooting ourselves in the foot if we make random sampling a mandatory part of the WCAG-EM. Academics will be happy, practitioners working to a budget will just stay away from it.
>>
>>
>

--
Shadi Abou-Zahra - http://www.w3.org/People/shadi/
Activity Lead, W3C/WAI International Program Office
Evaluation and Repair Tools Working Group (ERT WG)
Research and Development Working Group (RDWG)


Received on Saturday, 26 January 2013 06:47:23 UTC