[whatwg] Issues concerning the <base> element and xml:base

On 07/08/07, Ian Hickson <ian at hixie.ch> wrote:
> This is how it stood back in May (using a sample of several hundred
> thousand pages taken mostly from the more popular sites); number of unique
> URIs in <base href> attributes as a percentage of all pages parsed:
>
>   0: 93.7%
>   1:  6.31%
>   2:  0.0308%
>   3:  0.00105%
>   4:  0.00197%
>
> This is how it stands as of today (using the same sampling method):
>
>   0: 94.1%
>   1:  5.93%
>   2:  0.0215%
>   3:  0.000928%
>   4:  0.000288%
>
> (All numbers rounded to three significant figures.)

That rounding seems quite misleading. If I haven't forgotten how to
do statistics (or at least not the critical details), and if I'm not
misinterpreting how you collected the data, then the samples are
independent and drawn from a binomial distribution. That can be
approximated by a normal distribution whose standard deviation is
sqrt(n*p*(1-p)) for the count, i.e. sqrt(p*(1-p)/n) for the
proportion. Assuming n=100,000 and estimating p from the data, the
95%-confidence (+/- 2 s.d.) ranges are roughly:

0:  (93.7 +/- 0.15)%
1:  (6.3 +/- 0.15)%
2:  (0.03 +/- 0.01)%
3:  (0.001 +/- 0.002)%
4:  (0.002 +/- 0.003)%

and

0:  (94.1 +/- 0.15)%
1:  (5.9 +/- 0.15)%
2:  (0.02 +/- 0.01)%
3:  (0.001 +/- 0.002)%
4:  (0.0003 +/- 0.001)%

(though the normal approximation breaks down for the <= 0.002% values,
where the expected counts are tiny), so you can't determine anything
about changes in frequency beyond the zero/one cases.
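A minimal Python sketch of the arithmetic above, assuming (as the email
does) n = 100,000 pages for both samples. The helper names
`normal_interval` and `overlaps` are hypothetical, chosen for
illustration. It recomputes the +/- 2 s.d. ranges from the quoted
percentages and applies a rough "do the ranges overlap?" check, which
mirrors the reasoning here but is not a formal two-sample significance
test.

```python
import math

# Assumed sample size: the email guesses n = 100,000 pages per sample.
N = 100_000


def normal_interval(p_percent, n=N, z=2.0):
    """Return (centre, half_width), both in percent, for a +/- z s.d. range.

    Normal approximation to the binomial: the count has s.d.
    sqrt(n*p*(1-p)), so the proportion has s.d. sqrt(p*(1-p)/n).
    """
    p = p_percent / 100.0
    sd = math.sqrt(p * (1.0 - p) / n)
    return p_percent, z * sd * 100.0


def overlaps(a, b):
    """True if two (centre, half_width) ranges overlap."""
    (ca, ha), (cb, hb) = a, b
    return abs(ca - cb) <= ha + hb


# Percentages quoted in the message (number of unique <base href> URIs).
may = {0: 93.7, 1: 6.31, 2: 0.0308, 3: 0.00105, 4: 0.00197}
today = {0: 94.1, 1: 5.93, 2: 0.0215, 3: 0.000928, 4: 0.000288}

for k in sorted(may):
    a = normal_interval(may[k])
    b = normal_interval(today[k])
    verdict = "overlap" if overlaps(a, b) else "distinct"
    print(f"{k}: ({a[0]:g} +/- {a[1]:.2g})%  vs  "
          f"({b[0]:g} +/- {b[1]:.2g})%  -> {verdict}")
```

With these inputs only the zero and one rows come out as distinct; the
caveat about the normal approximation still applies to the smallest rows.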

-- 
Philip Taylor
excors at gmail.com

Received on Tuesday, 7 August 2007 06:16:21 UTC