- From: Loïc Martínez Normand <loic@fi.upm.es>
- Date: Tue, 21 Feb 2012 00:21:03 +0100
- To: public-wai-evaltf@w3.org
- Message-ID: <CAJpUyzn2tO9_9SF9xzs3RciEfQhBp9hejo36W5S0593UM9OMAg@mail.gmail.com>
Dear all,

This is my second email with comments on the 2012-02-09 editor's draft. Here I deal with sampling (chapter 4).

My general view on sampling is that the methodology should have a strong statistical basis, both for the selection of the sample and for the statistical relevance of the results obtained from the sample when drawing conclusions about the full website.

In addition, my (little) knowledge of statistics makes me worried about trying to mix samples that use different selection criteria. I don't believe that this is a good approach from the viewpoint of statistical soundness.

Detailed comments on chapter 4:

- [4.1] I am not good at statistics, but I have been told that one cannot mix different sampling procedures in one study if the results are to be considered statistically relevant. My understanding is that we can only compare random to random, core to core, and task to task. I believe that this is crucial for the methodology.
- [4.1.1. Paragraph after item list] No. I am afraid that one cannot combine samples. The statistical relevance of the results obtained by evaluating the core resource set cannot be improved by adding random pages.
- [4.1.2] One idea: why not add the "task-oriented" resources (complete processes) to the core resources? I think it makes sense because both are directed samples.
- [4.1.3, paragraph 1] I think that the methodology can be more prescriptive here. It should describe one or more methods for selecting random samples. If more than one is listed, then the methodology should provide guidance for choosing between them.
- [4.2. Paragraph 1] No. See my comments above. We cannot combine samples created using different methods. Each sample (core, task, random) should be treated separately from the others.
- [4.2. Paragraph 1] Why "minimum of two examples of resources"? An explanation is needed.
- [4.2. Paragraph 2] I think that the methodology can be much more precise here. It should be based on "sampling theory" from statistics. Instead of fixing a minimum number of pages, it should explain how the size of the sample affects the statistical reliability of the results. The required sample size is different if we want 95% reliability (p=0.05) or 99% reliability (p=0.01). In addition, I think that sampling theory takes into account categories (attributes) in the population to determine the relevance of the sample (for example, if the general population is 50% women and 50% men, then the sample should maintain these percentages). This idea could be reused for selecting pages with some specific types of content.
- [4.3] I think that the key concept is the "reliability" goal of the evaluation process. That goal will define an adequate sample size.

Again, I sincerely hope that my comments will be useful for your work.

Best regards,
Loïc
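To illustrate the point under [4.2, paragraph 2] that the sample size depends on the reliability goal, here is a minimal sketch. It uses Cochran's formula for estimating a proportion, with a finite population correction. This is a standard textbook technique chosen for illustration, not something the editor's draft prescribes. The function name `sample_size`, the 5% margin of error, and the worst-case proportion of 0.5 are all assumptions, and the formula presumes simple random sampling of pages that each pass or fail independently.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin=0.05, proportion=0.5, population=None):
    """Pages to sample to estimate a pass/fail proportion (hypothetical helper).

    confidence: e.g. 0.95 for a 95% confidence level.
    margin:     accepted margin of error (assumed 5% by default).
    proportion: expected proportion; 0.5 is the worst case.
    population: total number of pages, if known (finite population correction).
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for an effectively infinite population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    if population is not None:
        # Shrink the sample when the website has a known, limited size.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    for level in (0.95, 0.99):
        print(level, sample_size(level), sample_size(level, population=1000))
```

Under these assumptions, moving from 95% to 99% raises the required sample from 385 to 664 pages for a very large website, which is why a single fixed minimum number of pages cannot serve every reliability goal.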
Received on Tuesday, 21 February 2012 10:58:43 UTC