Re: code, samp, kbd, var

On Tue, 15 May 2007, Ian Hickson wrote:

>>> Sample size: several billion pages.
>>
>> It's hardly a sample. (See Statistics 101.)
>
> It's a sample, though the Web provides us with a somewhat unique situation
> in that there's an infinite number of pages, and we have to somehow pick a
> relevant subset from that.

No, it's not a sample. You seem to be uncertain even about what the 
population might be, so how could you draw a sample? A sample is a subset 
of a population in which each member of the population has a known 
probability of being included. (In simple random sampling, which is the 
common case, all members have the same probability.)
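
To make the definition concrete, here is a minimal sketch in Python of 
simple random sampling. It assumes the population is finite and can be 
enumerated, which is exactly what is in doubt for "all web pages". The 
function name and the "page-N" identifiers are hypothetical and serve only 
as illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members uniformly at random, without replacement.

    Returns the sample and the inclusion probability, which is the same
    known value, n / N, for every member of the population.
    """
    members = list(population)
    if not 0 <= n <= len(members):
        raise ValueError("n must be between 0 and the population size")
    rng = random.Random(seed)          # seeded only to make the draw repeatable
    sample = rng.sample(members, n)    # uniform draw without replacement
    inclusion_probability = n / len(members) if members else 0.0
    return sample, inclusion_probability

# Hypothetical, fully enumerated population of 1000 page identifiers.
population = [f"page-{i}" for i in range(1000)]
sample, p = simple_random_sample(population, 50, seed=1)
```

The point is that the inclusion probability can be stated only because the 
population is defined first. Without a defined population there is nothing 
to divide by.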

> The pages I scanned for this study are a
> small subset of those Google knows about.

Which in turn are not the same thing as the set of all web pages, no 
matter exactly how you define "web page". So even if you drew a sample 
from Google's database, which you probably didn't, it would not be a 
sample of web pages.

> Unfortunately for business reasons I can't reveal much about the 
> methodology used for picking the sample.

Then the data you present is worthless in a discussion and especially in a 
debate.

> The data tells you what is

No, it doesn't. The data itself does not tell us what it is.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Tuesday, 15 May 2007 07:05:53 UTC