On Jan 6, 2005, at 4:51, Karl Dubost wrote:
> * Statistical Quantitative analysis (automatic)
>   - Which HTML elements are used in Web pages?
>   - Which frequency ?
>   - Are valid Web pages richer than non-valid ones. (bigger varieties of
>     HTML element)
>   - The same for attribute

I am willing to test the idea of a statistical analysis module for the
logvalidator [1], and wonder if anyone would be interested in working with
me on this.

[1] http://www.w3.org/QA/Tools/LogValidator/

This might not exactly perform the large-scale study that Karl is thinking
of, but it could be a start. At the moment I am thinking of making this
module provide:
- a rapid summary of element usage over a list of documents
- a list of the n most popular documents without a "real" title
  ("Welcome to GoLive" does not qualify ;)
- the ratio of empty versus filled alt attributes

Someone recently gave me the idea that the ratio of words over markup is a
decent metric for the "richness" of a page or, if the ratio is low, a good
indicator of a badly written site. Given that an image is worth a thousand
words, I assume our formula would be something like:

  (number_of_words + (number_images * 1000)) / number_html_elements

Implementation-wise, does anyone have a recommendation on which library to
use?

Please drop me a line if you wish to participate in this development.

-- 
olivier

Received on Tuesday, 11 January 2005 05:37:58 GMT
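
The metrics proposed in the message above (element usage, empty versus filled alt attributes, and the words-over-markup formula) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the logvalidator module: it uses Python's standard html.parser rather than any library the thread settled on, the names PageStats, analyze and richness are hypothetical, a "word" is taken to be any whitespace-separated token outside script and style, and img elements with no alt attribute at all are left out of the empty/filled counts.

```python
from collections import Counter
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Collect per-page counts (hypothetical helper, not part of logvalidator)."""

    def __init__(self):
        super().__init__()
        self.elements = Counter()  # element name -> number of start tags seen
        self.images = 0            # number of <img> elements
        self.empty_alt = 0         # <img> with alt present but blank
        self.filled_alt = 0        # <img> with non-blank alt text
        self.words = 0             # whitespace-separated tokens in text
        self._skip = 0             # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        self.elements[tag] += 1
        if tag in ("script", "style"):
            self._skip += 1
        if tag == "img":
            self.images += 1
            attr_map = dict(attrs)
            # Assumption: a missing alt attribute is neither empty nor filled.
            if "alt" in attr_map:
                if (attr_map["alt"] or "").strip():
                    self.filled_alt += 1
                else:
                    self.empty_alt += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip > 0:
            self._skip -= 1

    def handle_data(self, data):
        # Script and style content is not counted as words.
        if not self._skip:
            self.words += len(data.split())


def analyze(html_text):
    """Parse one document and return its PageStats."""
    stats = PageStats()
    stats.feed(html_text)
    stats.close()
    return stats


def richness(stats):
    """(number_of_words + number_images * 1000) / number_html_elements."""
    total_elements = sum(stats.elements.values())
    if total_elements == 0:
        return 0.0
    return (stats.words + stats.images * 1000) / total_elements
```

For a list of documents, the per-page elements counters can be summed into one Counter to get the overall element usage summary, and most_common(n) on the result gives the n most frequent elements.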