- From: Karl Dubost <karl@w3.org>
- Date: Fri, 29 Sep 2006 14:05:47 +0900
- To: www-qa@w3.org
- Cc: Ian Hickson <ian@hixie.ch>
On 29 Sep 2006, at 04:59, Bjoern Hoehrmann wrote:
> Defining what a representative sample is and checking
> whether a certain sample meets the definition is a rather non-trivial
> exercise here. How many Amazon article pages do you include, or how
> do you weight them, how do you filter out automatically generated spam
> blogs, how do you detect, say, Wikipedia mirrors, and so on.
Indeed, and there are many parameters in the equation.
That is why I have asked Ian Hickson for more details: I really
think the sample definition is as important as the derived statistics
published in the [previous survey][1]. When the sample is not given
or clearly identified, it is very difficult to draw meaningful
conclusions.
For *each page*, we need all of the following information. That means
a large data set, but I think it is necessary: percentages alone would
not let us analyze correlations between pairs or triples of data
points. (A collection sketch in code follows the list.)
Web page data:
- HTTP Date.
  Why: to see whether a given page improves over time, and to
  characterize the population across time, from the Web of 5 years
  ago to the Web of now.
  Issue: dynamic Web sites often regenerate this header on every
  request, so it does not reflect the age of the content.
  Half solution: over the course of one year, fetch the page a few
  times and check the Date against the MD5 hash of the page, to see
  whether the content actually changed.
- MIME type sent.
- DOCTYPE.
- Is it well-formed (for the XHTML ones)?
- Is it valid?
- Access to the page: DNS, connection.
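To make this concrete, here is a minimal sketch of how such a per-page
record could be collected with Python's standard library. It is only an
illustration: the field names, the DOCTYPE regex, and the `survey_page`
function are my own assumptions, not an existing survey tool; validity
checking is sketched further below.

```python
import hashlib
import json
import re
import xml.etree.ElementTree as ET
from urllib.request import urlopen

def is_well_formed(body: bytes) -> bool:
    """True if the bytes parse as well-formed XML (relevant for XHTML pages)."""
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False

def survey_page(url: str) -> dict:
    """Collect the per-page metadata listed above (illustrative field names)."""
    with urlopen(url, timeout=30) as resp:
        body = resp.read()
        headers = resp.headers
    # The DOCTYPE declaration, if present, precedes the root element.
    m = re.search(rb"<!DOCTYPE[^>]*>", body[:2048], re.IGNORECASE)
    return {
        "url": url,
        "http_date": headers.get("Date"),           # server date at fetch time
        "mime_type": headers.get("Content-Type"),   # as sent, charset included
        "doctype": m.group(0).decode("ascii", "replace") if m else None,
        "well_formed": is_well_formed(body),
        # The MD5 hash supports the "half solution" above: re-fetch the page
        # a few times over a year and compare hashes to see whether the
        # content, and not just the Date header, actually changed.
        "md5": hashlib.md5(body).hexdigest(),
    }

if __name__ == "__main__":
    print(json.dumps(survey_page("http://www.w3.org/"), indent=2))
```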
Do people see other types of data? For now, I would like to focus on
the meta level of the document rather than on statistics about
element/attribute demography.
It is also important to specify the tool and the version used for
validation. Validators have bugs too; if we want to be consistent, we
have to be careful about which tool we are using. A sketch of recording
this follows.
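As a sketch of how the tool could be recorded (assuming the public
W3C Markup Validator's HTTP interface, which reports its verdict in
`X-W3C-Validator-*` response headers; the service URL and field names
here are illustrative), each verdict could be stored together with the
identity of the validator that produced it:

```python
from urllib.parse import quote
from urllib.request import urlopen

# Illustrative: the tool identity travels with every verdict it produces.
VALIDATOR = {"name": "W3C Markup Validator",
             "service": "http://validator.w3.org/check"}

def validate(url: str) -> dict:
    """Ask the validator about `url`; keep the tool identity with the verdict
    so that results from different tools or releases are never mixed silently."""
    with urlopen(f"{VALIDATOR['service']}?uri={quote(url, safe='')}",
                 timeout=60) as resp:
        return {
            "validator": VALIDATOR,
            "status": resp.headers.get("X-W3C-Validator-Status"),  # "Valid"/"Invalid"
            "errors": resp.headers.get("X-W3C-Validator-Errors"),  # error count
        }
```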
[1]: http://code.google.com/webstats/
> So, as of one week ago, 18% of W3C Members had a homepage that passed
> the W3C Markup Validator, compared to 9% when I started the survey 2
> years ago, and pages with 10 or fewer errors are up from 28% to 43%.
Very interesting. Thanks for this, Bjoern. I will not draw quick
conclusions, but it is at least encouraging, and it deserves a bit
more exploration. I think there is room to develop a regular survey
with a clearly identified sample. I will see what we (W3C + external
participation) can do in this area; I'm gathering requirements.
> So I don't know much about what Karl is asking for either, but it
> seems justified to say that for up-to-date pages, authors pay more
> attention to syntax problems than they did some years ago;
My questions were ill-formed; your mail helped to clarify them.
> and no matter how you
> look at it, less than 20% of pages are "valid" or "well-formed" or
> "conforming" under some definition of those terms, in which case
> picking a good sample to derive meaningful results becomes rather
> important.
Definitely.
--
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
QA Weblog - http://www.w3.org/QA/
*** Be Strict To Be Cool ***