My views on sampling from Loïc Martínez Normand on 2012-02-20 (public-wai-evaltf@w3.org from February 2012)

From: Loïc Martínez Normand <loic@fi.upm.es>
Date: Tue, 21 Feb 2012 00:21:03 +0100
To: public-wai-evaltf@w3.org
Message-ID: <CAJpUyzn2tO9_9SF9xzs3RciEfQhBp9hejo36W5S0593UM9OMAg@mail.gmail.com>

Dear all,

This is my second email with comments on the 2012-02-09 editor's draft. Here
I deal with sampling (chapter 4).

My general view on sampling is that the methodology should have a strong
statistical basis for the sample selection and for the statistical relevance
of the results obtained for the sample when making considerations on the full
website. In addition, my (little) knowledge of statistics makes me worried
about trying to mix samples using different selection criteria. I don't
believe that this is a good approach from the viewpoint of statistical
soundness.

Detailed comments on chapter 4:

   - [4.1] I am not good at statistics, but I have been told that one
   cannot mix different sampling procedures in one study if the results are
   to be considered statistically relevant. My understanding of it is that we
   can only compare random to random, core to core, and task to task. I
   believe that this is crucial for the methodology.
   - [4.1.1. Paragraph after item list] No. I am afraid that one cannot
   combine samples. The statistical relevance of the results provided by
   evaluating the core resource set cannot be improved by adding random pages.
   - [4.1.2] One idea: why not add "task-oriented" (complete processes)
   to the core resources? I think it makes sense because both are directed
   samples.
   - [4.1.3, paragraph 1] I think that the methodology can be more
   prescriptive here. It should describe one or more methods for selecting
   random samples. If more than one is listed, then the methodology should
   provide guidance for choosing between them.
   - [4.2. Paragraph 1] No. See my comments above. We cannot combine
   samples created using different methods. Each sample (core, task, random)
   should be treated separately from the others.
   - [4.2. Paragraph 1] Why "minimum of two examples of resources"?
   Explanation is needed...
   - [4.2. Paragraph 2] I think that the methodology can be much more
   precise here. It should be based on "sampling theory" from statistics.
   Instead of fixing a minimum number of pages, it should explain how the size
   of the sample affects the statistical reliability of the results. The
   sample size is different if we want a 95% confidence level (p=0.05) or a
   99% one (p=0.01). In addition, I think that sampling theory takes into
   account categories (attributes) in the population to determine the
   relevance of the sample (for example, if the general population is 50%
   women and 50% men, then the sample should maintain these percentages).
   This idea could be reused for selecting pages with some specific types of
   content.
   - [4.3] I think that the key concept is the "reliability" goal of the
   evaluation process. That will define an adequate sample size.
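
To make the points in [4.2, paragraph 2] and [4.3] more concrete, here is a
rough sketch of how sample size could follow from a reliability goal, and how
a random sample could keep the proportions of page categories. It is only an
illustration, not something taken from the draft: it assumes the standard
textbook formula for estimating a proportion (with finite population
correction), a 5% margin of error, the worst case p=0.5, and hypothetical
category names ("articles", "forms", "media"). All function names are my own.

```python
import math
import random
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Pages needed to estimate a proportion at the given confidence level.

    Assumption: simple random sampling, normal approximation, and a
    finite population correction for a website with `population` pages.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * p * (1 - p) / (margin * margin)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def proportional_allocation(stratum_sizes, n):
    """Split n across categories in proportion to their size.

    Uses the largest-remainder rule so the parts add up to exactly n.
    """
    total = sum(stratum_sizes.values())
    quotas = {k: n * v / total for k, v in stratum_sizes.items()}
    alloc = {k: math.floor(q) for k, q in quotas.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k],
                          reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


def draw_stratified_sample(pages_by_category, n, seed=0):
    """Draw a random sample that keeps the category percentages."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sizes = {k: len(v) for k, v in pages_by_category.items()}
    alloc = proportional_allocation(sizes, n)
    return {k: rng.sample(pages_by_category[k], alloc[k]) for k in alloc}


if __name__ == "__main__":
    # Hypothetical website with 10,000 pages in three categories.
    site = {
        "articles": [f"/articles/{i}" for i in range(9000)],
        "forms": [f"/forms/{i}" for i in range(500)],
        "media": [f"/media/{i}" for i in range(500)],
    }
    total_pages = sum(len(v) for v in site.values())
    for level in (0.95, 0.99):
        n = sample_size(total_pages, confidence=level)
        sample = draw_stratified_sample(site, n)
        print(level, n, {k: len(v) for k, v in sample.items()})
```

Under these assumptions, the 99% goal requires a noticeably larger sample than
the 95% goal for the same website, which is why I think the reliability goal
should be stated first and the sample size derived from it.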

Again, I sincerely hope that my comments will be useful for your work.

Best regards,
Loïc

Received on Tuesday, 21 February 2012 10:58:43 UTC