Web Dev Data

A few of us have a small project (http://webdevdata.org/) that we've been using to inform the development of specifications over the last few months. It started with Steve's research into <main>: he used some software to crawl a large number of sites, then grep'd the resulting data for stats that helped support his argument for <main>.
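To give a rough idea of the kind of question the data can answer, here is a minimal sketch in Python. It is not Steve's actual tooling (that was a crawl plus grep); it swaps in the standard library's HTML parser and assumes the crawled pages are already available as strings. The names `TagSeen`, `share_using_tag` and the sample pages are made up for illustration.

```python
from html.parser import HTMLParser


class TagSeen(HTMLParser):
    """Records whether a given element name appears in a document."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.seen = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == self.tag:
            self.seen = True


def share_using_tag(pages, tag):
    """Return the fraction of pages whose markup contains <tag>.

    `pages` maps a (hypothetical) site name to its HTML source.
    """
    if not pages:
        return 0.0
    hits = 0
    for html in pages.values():
        parser = TagSeen(tag)
        parser.feed(html)
        parser.close()
        if parser.seen:
            hits += 1
    return hits / len(pages)


# Made-up sample pages standing in for crawled data.
sample = {
    "a.example": "<html><body><main>Hi</main></body></html>",
    "b.example": "<html><body><div id='main'>Hi</div></body></html>",
    "c.example": "<html><body><MAIN role='main'>Hi</MAIN></body></html>",
    "d.example": "<html><body><p>no landmark</p></body></html>",
}
print(share_using_tag(sample, "main"))
```

Using a parser rather than a plain text match means that something like id='main' on a div is not counted as use of the element, which is the sort of distinction that matters when the numbers are used to argue for or against a feature.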

This data set has become increasingly useful to a number of people: the RICG has been making extensive use of it, as have some members of the HTMLWG (e.g., [1]).

Anyway, as the headlights activity has the potential to result in the allocation of resources for projects, I think it would be good if webdevdata.org were considered as something that can help "close the gap", in that it provides data to help us make informed technical decisions about the platform.

What we would like to see: 

* monthly or quarterly crawls.
* hosting and archiving of the data.  
* the ability to search the index through the web.
* the ability to download the data. 

Maybe the W3C could ask its members in the academic sector for help with different ways of searching the data and performing statistical analysis on it (in a way that helps both Web developers and spec folks).

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=19619#c21
-- 
Marcos Caceres

Received on Friday, 17 May 2013 09:22:59 UTC