What's the core value proposition for this work vs. http://commoncrawl.org/ and
http://www.webdatacommons.org/ ?
On Friday, May 17, 2013, Marcos Caceres wrote:
> A few of us have a small project (http://webdevdata.org/) that we've been
> using to inform the development of specifications over the last few months.
> It actually started with Steve's research into <main>, for which he used
> some software to crawl a large number of sites, and then grep'd that data
> to get stats that helped support his argument for <main>.
>
> This data set has become increasingly useful to a number of people: the
> RICG has been making extensive use of it, and so have some members of the
> HTMLWG (e.g., [1]).
>
> Anyway, as the headlights activity has the potential to result in the
> allocation of resources for projects, I think it would be good if
> webdevdata.org could be considered as something that can help "close the
> gap" (in that it provides data to help us make informed technical decisions
> about the platform).
>
> What we would like to see:
>
> * monthly or quarterly crawls.
> * hosting and archiving of the data.
> * the ability to search the index through the web.
> * the ability to download the data.
>
> Maybe the W3C could speak to its members in the academic sector for help
> with different ways of searching the data and running statistical analyses
> on it (in a way that helps both Web developers and spec folks).
>
> [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=19619#c21
> --
> Marcos Caceres