```
Dear all,

This is my second email with comments on the 2012-02-09 editor's draft.
Here I deal with sampling (chapter 4).

My general view on sampling is that the methodology should have a strong
statistical basis, both for selecting the sample and for judging how far
the results obtained on the sample can be generalized to the full website.
In addition, my (limited) knowledge of statistics makes me worried about
mixing samples built with different selection criteria. I don't believe
that this is a good approach from the viewpoint of statistical soundness.

- [4.1] I am not an expert in statistics, but I have been told that one
cannot mix different sampling procedures in one study if the results are
to be considered statistically relevant. My understanding is that we can
only compare random to random, core to core, and task to task. I believe
that this is crucial for the methodology.
- [4.1.1, paragraph after the item list] No. I am afraid that one cannot
combine samples. The statistical relevance of the results obtained by
evaluating the core resource set cannot be improved by adding random pages.
to the core resources? I think it makes sense because both are directed
samples.
- [4.1.3, paragraph 1] I think that the methodology can be more
prescriptive here. It should describe one or more methods for selecting
random samples. If more than one is listed, then the methodology should
provide guidance for choosing between them.
- [4.2, paragraph 1] No. See my comments above. We cannot combine
samples created using different methods. Each sample (core, task, random)
should be treated separately from the others.
- [4.2, paragraph 1] Why a "minimum of two examples of resources"? This
needs an explanation.
- [4.2, paragraph 2] I think that the methodology can be much more
precise here. It should be based on "sampling theory" from statistics.
Instead of fixing a minimum number of pages, it should explain how the size
of the sample affects the statistical reliability of the results. The
required sample size differs if we want 95% reliability (confidence level,
α = 0.05) or 99% (α = 0.01). In addition, I think that sampling theory
takes categories (attributes) of the population into account to determine
how representative the sample is (for example, if the general population
is 50% women and 50% men, the sample should maintain these percentages;
this is known as stratified sampling). This idea could be reused for
selecting pages with specific types of content.
- [4.3] I think that the key concept is the "reliability" goal of the
evaluation process. Together with the acceptable margin of error, that goal
determines an adequate sample size.

Again, I sincerely hope that my comments will be useful for your work.

Best regards,
Loïc
```
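The sample-size point in the comment on 4.2, paragraph 2 (and the reliability goal in the comment on 4.3) can be illustrated with the textbook formula for estimating a proportion under simple random sampling, n0 = z^2 * p * (1 - p) / e^2, followed by the finite population correction n = n0 / (1 + (n0 - 1) / N). This is only a sketch under stated assumptions, not something the draft specifies: it assumes the quantity being estimated is the share of pages that pass a given check, it uses the conservative guess p = 0.5 (here p is the assumed proportion, not the significance level), and the helper name `sample_size` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Optional


def sample_size(population: Optional[int], confidence: float = 0.95,
                margin: float = 0.05, p: float = 0.5) -> int:
    """Hypothetical helper: pages needed to estimate a pass rate.

    Assumes simple random sampling of pages and that we estimate a
    proportion (e.g. the share of pages passing a check). p = 0.5 is
    the most conservative guess; margin is the acceptable error (+/-).
    """
    if not 0 < confidence < 1 or not 0 < margin < 1:
        raise ValueError("confidence and margin must be in (0, 1)")
    # Two-sided critical value: about 1.96 for 95%, about 2.576 for 99%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is None:  # treat the site as effectively infinite
        return math.ceil(n0)
    # Finite population correction for a site with `population` pages.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for conf in (0.95, 0.99):
    print(conf, sample_size(1000, conf), sample_size(None, conf))
```

For a hypothetical site of 1,000 pages and a ±5% margin, this gives 278 pages at 95% confidence and 400 at 99%, which illustrates the comment's point that the reliability goal, rather than a fixed minimum, drives the sample size. The formula only applies to a simple random sample; it says nothing about core or task-based samples.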

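For the comments asking the methodology to describe a concrete method for random selection (4.1.3) and to keep category proportions in the sample (4.2, paragraph 2), one possible method is stratified random sampling with proportional allocation. The sketch below is an assumption, not a method taken from the draft: the stratum names and URLs are hypothetical, leftover slots are distributed with the largest-remainder rule, and a fixed seed is used so that another evaluator could reproduce the same selection.

```python
import random
from typing import Dict, List, Sequence


def proportional_allocation(strata_sizes: Dict[str, int], n: int) -> Dict[str, int]:
    """Split n sample slots across strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the population size")
    alloc: Dict[str, int] = {}
    remainders: Dict[str, int] = {}
    for name, size in strata_sizes.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        alloc[name], remainders[name] = divmod(n * size, total)
    # Largest-remainder rule: hand the leftover slots to the strata
    # whose exact quota was truncated the most.
    leftover = n - sum(alloc.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc


def stratified_sample(pages_by_stratum: Dict[str, Sequence[str]], n: int,
                      seed: int) -> Dict[str, List[str]]:
    """Draw a simple random sample of pages inside each stratum."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    sizes = {name: len(pages) for name, pages in pages_by_stratum.items()}
    alloc = proportional_allocation(sizes, n)
    return {name: rng.sample(list(pages), alloc[name])
            for name, pages in pages_by_stratum.items()}


# Hypothetical site: 20 form pages, 70 articles, 10 media pages.
site = {
    "forms": [f"/forms/{i}" for i in range(20)],
    "articles": [f"/articles/{i}" for i in range(70)],
    "media": [f"/media/{i}" for i in range(10)],
}
sample = stratified_sample(site, n=10, seed=2012)
print({name: len(pages) for name, pages in sample.items()})
```

Proportional allocation keeps each content type's share of the sample equal to its share of the site, which is the property the comment describes; within each stratum the pages are still chosen at random.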