- From: olivier Thereaux <ot@w3.org>
- Date: Tue, 11 Jan 2005 14:37:55 +0900
- To: 'public-evangelist@w3.org' <public-evangelist@w3.org>
- Cc: Karl Dubost <karl@w3.org>
- Message-Id: <F19FB73D-6392-11D9-9C98-000393A80896@w3.org>
On Jan 6, 2005, at 4:51, Karl Dubost wrote:
> * Statistical quantitative analysis (automatic)
> - Which HTML elements are used in Web pages?
> - With which frequency?
> - Are valid Web pages richer than non-valid ones (greater variety of
> HTML elements)?
> - The same for attributes.
I am willing to test the idea of a statistical analysis module for the
LogValidator [1], and wonder if anyone would be interested in working
with me on this.
[1] http://www.w3.org/QA/Tools/LogValidator/
This might not exactly perform the large-scale study that Karl is
thinking of, but it could be a start.
I am thinking at the moment of making this module provide:
- a rapid summary of element usage over a list of documents
- a list of the n most popular documents without a "real" title
("Welcome to GoLive" does not qualify ;)
- the ratio of empty versus filled alt attributes
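The counts listed above could be gathered with any HTML parser. As a rough illustration (in Python with only the standard library; the LogValidator itself is a Perl tool, so this is just a sketch of the logic, not the actual module):

```python
# Hypothetical sketch: tally element usage and empty vs. filled alt
# attributes over a document, using Python's standard library.
from collections import Counter
from html.parser import HTMLParser

class UsageStats(HTMLParser):
    """Collect element-usage and alt-attribute statistics."""
    def __init__(self):
        super().__init__()
        self.elements = Counter()   # element name -> occurrence count
        self.empty_alts = 0
        self.filled_alts = 0

    def handle_starttag(self, tag, attrs):
        self.elements[tag] += 1
        for name, value in attrs:
            if name == "alt":
                if value and value.strip():
                    self.filled_alts += 1
                else:
                    self.empty_alts += 1

stats = UsageStats()
stats.feed('<html><body><img src="a.png" alt="">'
           '<img src="b.png" alt="logo"><p>Hi</p></body></html>')
print(stats.elements.most_common(3))
print(stats.empty_alts, stats.filled_alts)
```

Run over every document in the log, the per-document counters could then be merged to answer Karl's "which elements, with which frequency" questions.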
Someone recently gave me the idea that the ratio of words to markup
is a decent metric for either the "richness" of a page or (if low) a
good indicator of a badly written site. Given that an image is worth a
thousand words, I assume our formula would be something like
(number_of_words+(number_images*1000)) / (number_html_elements).
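That formula could be computed during the same parsing pass. A minimal sketch (again a hypothetical Python illustration, not the eventual Perl module):

```python
# Hypothetical sketch of the "richness" ratio:
# (number_of_words + number_of_images * 1000) / number_of_html_elements
import re
from html.parser import HTMLParser

class Richness(HTMLParser):
    """Count words, images, and elements to compute the ratio."""
    def __init__(self):
        super().__init__()
        self.words = 0
        self.images = 0
        self.elements = 0

    def handle_starttag(self, tag, attrs):
        self.elements += 1
        if tag == "img":
            self.images += 1

    def handle_data(self, data):
        # Count whitespace-separated word-like tokens in text content.
        self.words += len(re.findall(r"\w+", data))

    def ratio(self):
        return (self.words + self.images * 1000) / self.elements

r = Richness()
r.feed('<html><body><p>one two three</p>'
       '<img src="pic.png"></body></html>')
print(r.ratio())
```

One caveat with the "image = 1000 words" weighting: a single decorative image dominates the score, so in practice the weight (or whether spacer images count at all) would need tuning.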
Implementation-wise, does anyone have a recommendation on which
parsing library to use?
Please drop me a line if you wish to participate in this development.
--
olivier
Received on Tuesday, 11 January 2005 05:37:58 UTC