Re: Randomly choosing pages from Kerstin Probiesch on 2012-09-14 (public-wai-evaltf@w3.org from September 2012)

From: Kerstin Probiesch <k.probiesch@gmail.com>
Date: Fri, 14 Sep 2012 09:47:59 +0200
To: "'Vivienne CONWAY'" <v.conway@ecu.edu.au>
Cc: "'Eval TF'" <public-wai-evaltf@w3.org>
Message-ID: <5052e0e3.6256b40a.5851.7243@mx.google.com>

Hi Vivienne, all,

I think the importance of random sampling becomes much clearer when we stop
thinking in "pages". Especially with very large sections (for example,
subdomains) maintained by different editors or groups of editors, a proper
random sample can ensure that an evaluator does not test only the
accessibility of content edited by the same editor(s).

There are different ways to sample at random. One, of course, is to put all
pages into the same sample space, choose X pages at random (with a script),
and check the edited content of those X pages.
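A minimal Python sketch of this first procedure, assuming the evaluator already has a complete list of in-scope page URLs (for example, exported from a crawler). The function name `simple_random_sample` and the example URLs are hypothetical; the draw itself uses the standard library's `random.sample`.

```python
import random


def simple_random_sample(urls, x, rng=None):
    """Choose x distinct pages from a single sample space (the whole site).

    Every page in the list has the same chance of being selected.
    """
    rng = rng or random.Random()
    # De-duplicate and sort so the draw does not depend on crawl order.
    pages = sorted(set(urls))
    if x > len(pages):
        raise ValueError(f"cannot choose {x} pages from {len(pages)}")
    return rng.sample(pages, x)


if __name__ == "__main__":
    # Hypothetical URL list, e.g. exported from a crawler.
    site = [f"https://www.example.org/page{i}" for i in range(1, 101)]
    for url in simple_random_sample(site, 5):
        print(url)
```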

Another sampling procedure is cluster sampling. Cluster sampling could work
as follows:

1. Identify the clusters of a website (for example: subdomains, or sections
according to the main items in a navigation bar).
2. Randomly choose X pages from every cluster and make sure that all relevant
WCAG Success Criteria (SCs) are checked on the randomly selected pages of
every identified cluster. This could, for example, mean: check 1.1.1,
1.3.1, ... in the content of X pages of every cluster.
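A sketch of the per-cluster procedure in Python. It assumes, hypothetically, that a cluster is identified by its host name plus the first path segment, as a rough stand-in for subdomains and main navigation sections; real clusters would come from the evaluator's own analysis of the site. Note that drawing X pages from every cluster is what statisticians usually call stratified sampling, whereas classic cluster sampling picks whole clusters at random. Function names and URLs are illustrative only.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def cluster_key(url):
    """Hypothetical rule: host plus first path segment,
    e.g. 'blog.example.org/' or 'www.example.org/products'."""
    parts = urlsplit(url)
    first_segment = parts.path.strip("/").split("/")[0]
    return f"{parts.hostname}/{first_segment}"


def sample_per_cluster(urls, x, key=cluster_key, rng=None):
    """Draw up to x distinct pages at random from every cluster."""
    rng = rng or random.Random()
    clusters = defaultdict(set)
    for url in urls:
        clusters[key(url)].add(url)
    sample = {}
    for name in sorted(clusters):
        members = sorted(clusters[name])
        # Clusters with fewer than x pages contribute all of their pages.
        sample[name] = rng.sample(members, min(x, len(members)))
    return sample


if __name__ == "__main__":
    site = (
        [f"https://www.example.org/products/p{i}" for i in range(20)]
        + [f"https://www.example.org/news/n{i}" for i in range(20)]
        + ["https://blog.example.org/", "https://blog.example.org/?p=2"]
    )
    for cluster, pages in sample_per_cluster(site, 3).items():
        print(cluster, pages)
```

Because every cluster gets its own draw, no subdomain or section can be left out entirely, which is the point of this procedure.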

I don't think it is necessary to check everything on every page of the
random sample. The evaluator has already checked, for example, the
navigation bars and other global elements such as the footer. So the
random sample is aimed more at the edited content than at the page as a
whole.

Random sampling should avoid both oversampling and undersampling in the
different contexts of the evaluation process; both are relevant sampling
errors. Consider a script which chooses 5 pages (just to pick a number) out
of the sample space (the whole website): the content of every page has the
same probability of being selected. At the same time, however, this
procedure would undersample a website that has, for example, 10
subdomains, because 5 pages can cover at most 5 of them.
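The undersampling problem can be quantified. For a simple random sample of n pages from a site of N pages, a cluster with s pages is missed with probability C(N-s, n) / C(N, n), so the expected number of clusters covered is the sum of 1 - C(N-s, n) / C(N, n) over all clusters. The sketch below assumes, purely for illustration, 10 subdomains of 100 pages each: 5 randomly chosen pages then cover only about 4.1 subdomains on average and never more than 5, while choosing one page per subdomain would cover all 10.

```python
from math import comb


def expected_clusters_covered(cluster_sizes, n):
    """Expected number of clusters that contain at least one sampled page
    when n pages are drawn without replacement from the whole site."""
    total = sum(cluster_sizes)
    # Probability that a cluster of `size` pages is missed entirely
    # (hypergeometric): comb(total - size, n) / comb(total, n).
    return sum(1 - comb(total - size, n) / comb(total, n) for size in cluster_sizes)


if __name__ == "__main__":
    sizes = [100] * 10  # hypothetical: 10 subdomains of 100 pages each
    print(round(expected_clusters_covered(sizes, 5), 2))
```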

Just some ideas about random sampling

Cheers


Kerstin 


> -----Original Message-----
> From: Vivienne CONWAY [mailto:v.conway@ecu.edu.au]
> Sent: Friday, 14 September 2012 08:37
> To: Detlev Fischer
> Cc: Eval TF
> Subject: RE: Randomly choosing pages
> 
> Hi Detlev and TF
> 
> I'm with you on this one.  I'm just about to start a large audit and
> thought I'd put this into practice, but for the life of me I can't see
> an easy way to find 6 or 7 truly random pages.  I've suggested to one
> of the automated tool companies that they build this feature into their
> crawling options so that the tool would randomly choose a number of
> pages stipulated by the evaluator, and then that evaluator could also
> manually assess those same pages.  Until then, however, I have no idea
> how it would be truly 'random'.  I don't think 'random' is supposed to
> mean me just saying 'I think this one will do'.  We're already
> selectively targeting pages that we've identified as critical to the
> operation of the website, use cases, complete paths etc.  I have no
> idea what we will be able to do with this requirement.
> 
> 
> Regards
> 
> Vivienne L. Conway, B.IT(Hons), MACS CT, AALIA(cs)
> PhD Candidate & Sessional Lecturer, Edith Cowan University, Perth, W.A.
> Director, Web Key IT Pty Ltd.
> v.conway@ecu.edu.au
> v.conway@webkeyit.com
> Mob: 0415 383 673
> 
> This email is confidential and intended only for the use of the
> individual or entity named above. If you are not the intended
> recipient, you are notified that any dissemination, distribution or
> copying of this email is strictly prohibited. If you have received this
> email in error, please notify me immediately by return email or
> telephone and destroy the original message.
> ________________________________________
> From: Detlev Fischer [detlev.fischer@testkreis.de]
> Sent: Friday, 14 September 2012 2:33 PM
> To: Vivienne CONWAY
> Cc: Eval TF
> Subject: Re: Randomly choosing pages
> 
> Hi Vivienne,
> 
> I remember that we have already discussed this at length without ever
> coming to a sound conclusion. I suggest a practical perspective: if it
> is to be mandatory that part of the sample is chosen by a truly random
> process, this imposes quite a hard requirement on the evaluator:
> 
> 1) He/she has to judiciously apply some crawling tool to ensure that
> the applicable scope is fully crawled and all pages are included in
> the set (excluding those that are chosen by other means). The scope of
> evaluation may include not just one simple hierarchical tree but
> several sub-domains, generated pages that don't even exist without
> user input, etc., so this is rarely an easy task, and quite hard for
> complex sites;
> 
> 2) Then he/she has to apply a random procedure to the complete set of
> pages/states within the scope, using some random selection tool.
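A sketch of the exclusion in step 1 and the random choice in step 2 once a crawl exists, assuming the crawl has been exported as a plain list of URLs. The URL normalization (lower-case scheme and host, no fragment, no trailing slash) is an assumption meant to avoid counting one page twice; it is not part of any agreed methodology, and nothing here helps with pages or states the crawler never reached, such as those generated only by user input. Function names are hypothetical.

```python
import random
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Assumed normalization: lower-case scheme and host, drop the
    fragment and any trailing slash (the root path stays '/')."""
    p = urlsplit(url.strip())
    path = p.path.rstrip("/") or "/"
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), path, p.query, ""))


def random_supplement(crawled, preselected, k, rng=None):
    """Choose k random pages from the crawled scope, excluding pages
    already selected by other means (critical pages, complete processes)."""
    rng = rng or random.Random()
    excluded = {normalize(u) for u in preselected}
    pool = sorted({normalize(u) for u in crawled} - excluded)
    if k > len(pool):
        raise ValueError(f"only {len(pool)} pages left to sample from")
    return rng.sample(pool, k)


if __name__ == "__main__":
    crawled = [f"https://www.example.org/page{i}/" for i in range(1, 201)]
    structured = ["https://www.example.org/page1", "https://www.example.org/page2"]
    print(random_supplement(crawled, structured, 6))
```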
> 
> I remember some of these tools were said to exist and might be put into
> practice, but the overhead of work seems inordinate for the added
> benefit of having a few truly random pages included. And all this
> hinges on the ability and means to verify that a truly random
> procedure has indeed been applied. Who is going to check this from the
> outside? To enable independent verification, the crawling and
> selection stages and tools would have to be documented so that the
> process is potentially 'replicable' (with different results, of
> course, otherwise it would not be truly random). And if (more than
> likely) *no one* will be willing and able to ever check, we are just
> left to *believe* that the 'random pages' were indeed chosen by true
> random sampling. The conscientious ones will go to a lot of trouble for
> something unverifiable, while the less conscientious ones will just take
> an informal 'random pick' and claim the pages were chosen 'at
> random' (which might even be true in the colloquial sense of the word).
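One possible way to document the selection stage, sketched under the assumption that a seeded pseudo-random generator is acceptable for this purpose: record the seed together with a SHA-256 fingerprint of the sorted URL list. Re-running with the same list and seed reproduces the same draw (on the same Python version), so a third party could check it, while a fresh seed produces a new, different draw. The record format and function names are hypothetical.

```python
import hashlib
import json
import random
import secrets


def fingerprint(population):
    """SHA-256 over the sorted URL list, so the exact population can be checked later."""
    return hashlib.sha256("\n".join(population).encode("utf-8")).hexdigest()


def documented_draw(urls, k, seed=None):
    """Draw k pages and return a record from which the draw can be re-run."""
    if seed is None:
        seed = secrets.randbits(64)  # fresh, unpredictable seed for a new draw
    population = sorted(set(urls))  # canonical order, independent of crawl order
    chosen = random.Random(seed).sample(population, k)
    return {
        "seed": seed,
        "population_size": len(population),
        "population_sha256": fingerprint(population),
        "chosen": chosen,
    }


def verify_draw(urls, record):
    """Return True if the same URL list and seed reproduce the recorded pages."""
    population = sorted(set(urls))
    if fingerprint(population) != record["population_sha256"]:
        return False  # the URL list differs from the one that was sampled
    again = random.Random(record["seed"]).sample(population, len(record["chosen"]))
    return again == record["chosen"]


if __name__ == "__main__":
    pages = [f"https://www.example.org/page{i}" for i in range(1, 51)]
    record = documented_draw(pages, 6)
    print(json.dumps(record, indent=2))  # e.g. store alongside the evaluation report
    print(verify_draw(pages, record))
```

This only documents the selection; it does not show that the crawled list itself covered the whole scope.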
> 
> I still don't see the added benefit of making additional random
> sampling a mandatory (methodology) requirement...
> 
> Just my 2 cents, as they say - Detlev
> 
> 
> 
> On 14 Sep 2012, at 05:11, Vivienne CONWAY wrote:
> 
> > Hi all
> >
> > As we're giving some thought to the inclusion of randomly selected
> > pages for part of the sample, I'm wondering how people propose the
> > evaluator would generate the randomly chosen pages.
> >
> > Any thoughts?
> >
> >
> > Regards
> >
> > ________________________________________
> > From: Shadi Abou-Zahra [shadi@w3.org]
> > Sent: Friday, 14 September 2012 5:07 AM
> > To: Eval TF
> > Subject: Minutes for Teleconference on 13 September 2012
> >
> > Eval TF,
> >
> > Please find the minutes for the teleconference on 13 September 2012:
> >  - <http://www.w3.org/2012/09/13-eval-minutes>
> >
> > Next meeting: Thursday 20 September 2012.
> >
> >
> > Regards,
> >   Shadi
> >
> > --
> > Shadi Abou-Zahra - http://www.w3.org/People/shadi/
> > Activity Lead, W3C/WAI International Program Office
> > Evaluation and Repair Tools Working Group (ERT WG)
> > Research and Development Working Group (RDWG)
> >
> > This e-mail is confidential. If you are not the intended recipient
> > you must not disclose or use the information contained within. If
> > you have received it in error please return it to the sender via
> > reply e-mail and delete any record of it from your system. The
> > information contained within is not the opinion of Edith Cowan
> > University in general and the University accepts no liability for
> > the accuracy of the information provided.
> >
> > CRICOS IPC 00279B
> >
> 
> --
> Detlev Fischer
> testkreis - the accessibility team of feld.wald.wiese
> c/o feld.wald.wiese
> Thedestraße 2
> 22767 Hamburg
> 
> Tel    +49 (0)40 439 10 68-3
> Mobile +49 (0)1577 170 73 84
> Fax    +49 (0)40 439 10 68-5
> 
> http://www.testkreis.de
> Consulting, testing and training for accessible websites
> 
Received on Friday, 14 September 2012 07:47:17 UTC