Re: Web page stats

* Ian Hickson wrote:
>>> The sample in question was very large (10 digits), so these are pretty 
>>> representative numbers.
>> 
>> In a recent study I found that 99% of earthlings are female. The sample 
>> in question was very large (3 billion specimen), so these are pretty 
>> representative numbers; therefore we can conclude there are almost no 
>> males on this planet.
>
>s/, so/. Also, IMHO,/.
>
>(At least for the error data. The XHTML vs HTML data may be skewed by the 
>sampling method.)

That's better. Defining what a representative sample is and checking
whether a certain sample meets the definition is a rather non-trivial
exercise here. How many Amazon article pages do you include, and how
do you weight them? How do you filter out automatically generated spam
blogs? How do you detect, say, Wikipedia mirrors? And so on.

Here is a different data set: the validity of W3C Member homepages as
determined by the W3C Markup Validator. The first column is the day,
the second the number of homepages with no errors, then pages with
1-10 errors, pages with more than ten errors, and pages that could not
be validated (DNS errors, bad connection, encoding errors, member has
no homepage, ...).

  2004-07-07 33 71 216 45
  2004-07-20 33 70 217 43
  2004-08-03 35 71 209 44
  2004-08-09 35 70 213 42
  2004-08-24 34 72 211 41
  2004-09-12 36 69 211 45
  ...
  2006-02-08 67 87 237 9
  2006-03-03 69 88 241 3
  2006-03-17 72 88 238 4
  2006-04-01 76 87 237 10
  2006-04-21 73 87 236 9
  2006-05-19 73 91 233 8
  2006-06-17 77 85 236 6
  2006-08-19 76 100 232 9
  2006-09-21 76 103 226 14

So, as of one week ago, 18% of W3C Members had a homepage that passed
the W3C Markup Validator, compared to 9% when I started the survey 2
years ago, and pages with 10 or fewer errors are up from 28% to 43%. So
I don't know much about what Karl is asking for either, but it seems
justified to say that for up to date pages, authors pay more attention
to syntax problems than they did some years ago; and no matter how you
look at it, fewer than 20% of pages are "valid" or "well-formed" or
"conforming" under some definition of those terms, in which case
picking a good sample to derive meaningful results becomes rather
important.
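
For anyone who wants to check the percentages above, here is a small
sketch that recomputes them from the first and last rows of the table.
It assumes the denominator is the sum of all four columns, i.e. that
Members whose homepage could not be validated still count; that
assumption reproduces the figures quoted, but it is my reading of the
table, and the name shares() is made up for illustration.

```python
# Rows copied from the table above:
# day -> (no errors, 1-10 errors, more than 10 errors, could not be validated)
rows = {
    "2004-07-07": (33, 71, 216, 45),
    "2006-09-21": (76, 103, 226, 14),
}


def shares(counts):
    """Return (% with no errors, % with 10 or fewer errors) as whole percents."""
    # Assumption: every Member counts, including those not validated.
    total = sum(counts)
    no_errors = counts[0]
    ten_or_fewer = counts[0] + counts[1]
    return round(100 * no_errors / total), round(100 * ten_or_fewer / total)


for day, counts in rows.items():
    print(day, shares(counts))
# 2004-07-07 (9, 28)
# 2006-09-21 (18, 43)
```

Leaving the "could not be validated" column out of the denominator
would give slightly higher shares, so the choice is worth stating
whenever such numbers are compared.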
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Received on Thursday, 28 September 2006 19:59:36 UTC