- From: Olivier Thereaux <ot@w3.org>
- Date: Tue, 11 Jan 2005 14:37:55 +0900
- To: 'public-evangelist@w3.org' <public-evangelist@w3.org>
- Cc: Karl Dubost <karl@w3.org>
- Message-Id: <F19FB73D-6392-11D9-9C98-000393A80896@w3.org>
On Jan 6, 2005, at 4:51, Karl Dubost wrote:

> * Statistical quantitative analysis (automatic)
>   - Which HTML elements are used in Web pages?
>   - With which frequency?
>   - Are valid Web pages richer than non-valid ones (a bigger variety of HTML elements)?
>   - The same for attributes.

I am willing to test the idea of a statistical analysis module for the logvalidator [1], and wonder if anyone would be interested in working with me on this.

[1] http://www.w3.org/QA/Tools/LogValidator/

This might not exactly perform the large-scale study that Karl has in mind, but it could be a start. At the moment I am thinking of making this module provide:

- a rapid summary of element usage over a list of documents
- a list of the n most popular documents without a "real" title ("Welcome to GoLive" does not qualify ;)
- the ratio of empty versus filled alt attributes

Someone recently gave me the idea that the ratio of words to markup is a decent metric for the "richness" of a page or (if low) a good indicator of a badly written site. Given that an image is worth a thousand words, I assume our formula would be something like:

(number_of_words + (number_of_images * 1000)) / number_of_html_elements

Implementation-wise, does anyone have a recommendation on which library to use?

Please drop me a line if you wish to participate in this development.

-- olivier
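[Editor's sketch] To make the proposed module concrete, here is a minimal illustration in Python using the stdlib html.parser (the LogValidator itself is written in Perl, so this is only a sketch of the idea, not of the actual module; the PageStats and richness names, and the 1000-words-per-image weight taken from the formula above, are assumptions):

```python
from collections import Counter
from html.parser import HTMLParser

class PageStats(HTMLParser):
    """Tally element usage, word count, and image count for one HTML document."""
    def __init__(self):
        super().__init__()
        self.elements = Counter()  # element name -> frequency
        self.words = 0
        self.images = 0

    def handle_starttag(self, tag, attrs):
        self.elements[tag] += 1
        if tag == "img":
            self.images += 1

    def handle_data(self, data):
        # crude word count: whitespace-separated tokens in text nodes
        self.words += len(data.split())

def richness(stats, image_worth=1000):
    """(words + images * 1000) / number of HTML elements, per the formula above."""
    total_elements = sum(stats.elements.values())
    if total_elements == 0:
        return 0.0
    return (stats.words + stats.images * image_worth) / total_elements

# toy document: 5 elements, 5 words, 1 image
html = ('<html><body><h1>Title</h1>'
        '<p>Some real content here.</p>'
        '<img src="x.png" alt=""></body></html>')
p = PageStats()
p.feed(html)
print(p.elements.most_common(3))
print(richness(p))  # (5 + 1*1000) / 5 = 201.0
```

Summing such per-page counters over the list of documents the logvalidator already walks would give the rapid element-usage summary, and checking the alt value in handle_starttag would cover the empty-versus-filled alt ratio.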
Received on Tuesday, 11 January 2005 05:37:58 UTC