- From: Philip Taylor <excors+whatwg@gmail.com>
- Date: Tue, 7 Aug 2007 14:16:21 +0100
On 07/08/07, Ian Hickson <ian at hixie.ch> wrote:
> This is how it stood back in May (using a sample of several hundred
> thousand pages taken mostly from the more popular sites); number of unique
> URIs in <base href> attributes as a percentage of all pages parsed:
>
> 0: 93.7%
> 1: 6.31%
> 2: 0.0308%
> 3: 0.00105%
> 4: 0.00197%
>
> This is how it stands as of today (using the same sampling method):
>
> 0: 94.1%
> 1: 5.93%
> 2: 0.0215%
> 3: 0.000928%
> 4: 0.000288%
>
> (All numbers rounded to three significant figures.)

That rounding seems quite misleading - if I haven't forgotten how to do
statistics, if the details I am forgetting are not critical ones, and if
I'm not misinterpreting how you collected the data, then the samples are
independent draws from a binomial distribution, which can be approximated
by a normal distribution with standard deviation sqrt(n*p*(1-p)). Assuming
n=100,000 and estimating p from the data, the 95%-confidence (+/- 2 s.d.)
ranges are something like:

0: (93.7 +/- 0.15)%
1: (6.3 +/- 0.15)%
2: (0.03 +/- 0.01)%
3: (0.001 +/- 0.002)%
4: (0.002 +/- 0.003)%

and

0: (94.1 +/- 0.15)%
1: (5.9 +/- 0.15)%
2: (0.02 +/- 0.01)%
3: (0.001 +/- 0.002)%
4: (0.0003 +/- 0.001)%

(though the normal approximation breaks down in the <= 0.002% bits), so
you can't determine anything about changes in frequency beyond the
zero/one cases.

-- 
Philip Taylor
excors at gmail.com
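The interval calculation above can be sketched as follows. Note that n = 100,000 is an assumption (the post only says "several hundred thousand pages"), and the function name is illustrative, not from any referenced code:

```python
from math import sqrt

def ci_half_width(p_hat, n=100_000, z=2.0):
    """Half-width of the normal-approximation confidence interval for a
    binomial proportion: z * sqrt(p*(1-p)/n), returned in percentage
    points. z=2.0 gives roughly a 95% interval."""
    return z * sqrt(p_hat * (1 - p_hat) / n) * 100

# May data: fraction of pages with k unique <base href> URIs
may = [(0, 93.7), (1, 6.31), (2, 0.0308), (3, 0.00105), (4, 0.00197)]
for k, pct in may:
    print(f"{k}: ({pct} +/- {ci_half_width(pct / 100):.2g})%")
```

Running this reproduces the roughly +/- 0.15 percentage-point band for the 0 and 1 cases, and shows why the intervals for k >= 3 are wider than the measured values themselves.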
Received on Tuesday, 7 August 2007 06:16:21 UTC