- From: Karl Dubost <karl@w3.org>
- Date: Fri, 29 Sep 2006 14:05:47 +0900
- To: www-qa@w3.org
- Cc: Ian Hickson <ian@hixie.ch>
On 29 Sep 2006, at 04:59, Bjoern Hoehrmann wrote:

> Defining what a representative sample is and checking whether a
> certain sample meets the definition is a rather non-trivial exercise
> here. How many Amazon article pages do you include, or how do you
> weight them, how do you filter out automatically generated spam
> blogs, how do you detect, say, Wikipedia mirrors, and so on.

Indeed, there are many parameters in this equation. That is why I have
asked Ian Hickson for more details: I really think the sample is as
important as the derived statistics published in the [previous
survey][1]. When the sample is not given or clearly identified, it is
very difficult to draw meaningful conclusions.

For *each page*, we need all of the following information. That means a
large data set, but I think it is necessary: percentages alone would
not let us analyze relationships between pairs or triples of data
points.

Web page data:

- HTTP Date. Why: to track change over time, both for a single page
  and for the population as a whole (the Web of 5 years ago versus the
  Web of today). Issue: some dynamic Web sites do not report this
  information correctly. Partial solution: visit the page a few times
  over the course of a year, check the reported date, and compare it
  against the MD5 hash of the page.
- MIME type sent.
- DOCTYPE.
- Is it well-formed (for the XHTML ones)?
- Is it valid?
- Access to the page: DNS, connection.

Does anyone see other types of data to collect? For now, I would like
to focus on the meta level of the document rather than on statistics
about the element/attribute demography. (A rough sketch of such a
per-page collection appears after the signature below.)

It is also important to specify the tool, and the version of it, that
was used for validation. Validators have bugs too; if we want to be
consistent, we have to be careful about which tool we use.

[1]: http://code.google.com/webstats/

> So, as of one week ago, 18% of W3C Members had a homepage that passed
> the W3C Markup Validator, compared to 9% when I started the survey 2
> years ago, and pages with 10 or fewer errors are up from 28% to 43%.

Very interesting; thanks for this, Bjoern. I will not draw hasty
conclusions, but it is at least encouraging and deserves a bit more
exploration.

I think there is room to develop a regular survey based on a sample
that is clearly identified. I will see what we (W3C plus external
participants) can do in this area; I am gathering requirements.

> So I don't know much about what Karl is asking for either, but it
> seems justified to say that for up to date pages, authors pay more
> attention to syntax problems than they did some years ago;

My questions were ill-formed; your mail helped to clarify them.

> and no matter how you look at it, less than 20% of pages are "valid"
> or "well-formed" or "conforming" under some definition of those
> terms, in which case picking a good sample to derive meaningful
> results becomes rather important.

Definitely.

--
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
QA Weblog - http://www.w3.org/QA/
*** Be Strict To Be Cool ***
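A minimal sketch of the per-page collection described in the list
above, assuming Python 3 and only its standard library. The fields
mirror the ones listed in the message; the function name and overall
structure are illustrative, and the validity check is omitted since it
would require a call to an external validator such as the W3C Markup
Validator.

```python
# Rough sketch of per-page data collection, as discussed above.
# Assumes Python 3 standard library only; field names are taken from
# the list in the message, everything else is illustrative.

import hashlib
import re
import urllib.request
import xml.etree.ElementTree as ET


def collect_page_data(url):
    """Fetch one page and record the metadata discussed above."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
        headers = response.headers

    # The DOCTYPE, if any, appears before the root element.
    match = re.search(rb"<!DOCTYPE[^>]*>", body[:2048], re.IGNORECASE)
    doctype = match.group(0).decode("ascii", "replace") if match else None

    # Well-formedness only makes sense for XHTML/XML responses.
    mime_type = headers.get("Content-Type", "")
    well_formed = None
    if "xml" in mime_type:
        try:
            ET.fromstring(body)
            well_formed = True
        except ET.ParseError:
            well_formed = False

    return {
        "url": url,
        # HTTP Date header; dynamic sites may not report this reliably,
        # hence the MD5 below to detect whether the page really changed
        # between visits over the year.
        "http_date": headers.get("Date"),
        "mime_type": mime_type,
        "doctype": doctype,
        "well_formed": well_formed,
        "md5": hashlib.md5(body).hexdigest(),
    }


if __name__ == "__main__":
    print(collect_page_data("http://www.w3.org/"))
```

Running this twice a few months apart and comparing the stored `md5`
values against the reported `http_date` is one way to approximate the
"partial solution" for dynamic sites sketched in the list above.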
Received on Friday, 29 September 2006 05:06:09 UTC