- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Thu, 28 Sep 2006 21:59:25 +0200
- To: Ian Hickson <ian@hixie.ch>
- Cc: www-qa@w3.org
* Ian Hickson wrote:
>>> The sample in question was very large (10 digits), so these are pretty
>>> representative numbers.
>>
>> In a recent study I found that 99% of earthlings are female. The sample
>> in question was very large (3 billion specimen), so these are pretty
>> representative numbers; therefore we can conclude there are almost no
>> males on this planet.
>
>s/, so/. Also, IMHO,/.
>
>(At least for the error data. The XHTML vs HTML data may be skewed by the
>sampling method.)

That's better. Defining what a representative sample is and checking
whether a certain sample meets the definition is a rather non-trivial
exercise here. How many Amazon article pages do you include, or how do
you weight them? How do you filter out automatically generated spam
blogs, how do you detect, say, Wikipedia mirrors, and so on?

Here is a different set for the validity of W3C Member homepages as
determined by the W3C Markup Validator; the first column is the day,
the second homepages with no error, then pages with 1-10 errors, more
than ten errors, and pages that could not be validated (DNS errors, bad
connection, encoding errors, member has no homepage, ...)

  Date        0 errors  1-10 errors  >10 errors  not validated
  2004-07-07        33           71         216             45
  2004-07-20        33           70         217             43
  2004-08-03        35           71         209             44
  2004-08-09        35           70         213             42
  2004-08-24        34           72         211             41
  2004-09-12        36           69         211             45
  ...
  2006-02-08        67           87         237              9
  2006-03-03        69           88         241              3
  2006-03-17        72           88         238              4
  2006-04-01        76           87         237             10
  2006-04-21        73           87         236              9
  2006-05-19        73           91         233              8
  2006-06-17        77           85         236              6
  2006-08-19        76          100         232              9
  2006-09-21        76          103         226             14

So, as of one week ago, 18% of W3C Members had a homepage that passed
the W3C Markup Validator, compared to 9% when I started the survey 2
years ago, and pages with 10 or fewer errors (including those with
none) are up from 28% to 43%.
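The percentages quoted above can be re-derived from the first and last rows of the table. The following is only a minimal sketch: the names `ROWS` and `shares` are made up for illustration, and it assumes the denominator is all four columns, including the pages that could not be validated, since that is the reading that reproduces the 9%/18% and 28%/43% figures.

```python
# Counts copied from the first and last rows of the table:
# (no errors, 1-10 errors, more than 10 errors, could not be validated)
ROWS = {
    "2004-07-07": (33, 71, 216, 45),
    "2006-09-21": (76, 103, 226, 14),
}


def shares(row):
    """Return (valid %, ten-or-fewer-errors %) of all members in the row.

    Assumption: members whose page could not be validated stay in the
    denominator.
    """
    valid, few, many, failed = row
    total = valid + few + many + failed
    return 100 * valid / total, 100 * (valid + few) / total


for day, row in ROWS.items():
    valid_pct, few_pct = shares(row)
    print(f"{day}: {valid_pct:.0f}% valid, {few_pct:.0f}% with 10 or fewer errors")
```

Dropping the unvalidated pages from the denominator would give noticeably different shares for 2004, when that column was still large, so the choice of denominator matters when comparing the two dates.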

So I don't know much about what Karl is asking for either, but it seems
justified to say that for up-to-date pages, authors pay more attention
to syntax problems than they did some years ago; and no matter how you
look at it, fewer than 20% of pages are "valid" or "well-formed" or
"conforming" under some definition of those terms, in which case
picking a good sample to derive meaningful results becomes rather
important.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Thursday, 28 September 2006 19:59:36 UTC