Re: Web Dev Data from Alex Russell on 2013-05-17 (public-closingthegap@w3.org from May 2013)

From: Alex Russell <slightlyoff@google.com>
Date: Fri, 17 May 2013 14:31:18 +0100
To: Marcos Caceres <marcos@marcosc.com>
Cc: "public-closingthegap@w3.org" <public-closingthegap@w3.org>, "dom@w3.org" <dom@w3.org>, Steven Faulkner <faulkner.steve@gmail.com>, Pieters Simon <simonp@opera.com>, "yoav@yoav.ws" <yoav@yoav.ws>
Message-ID: <CANr5HFVrLK=hy5rsZMvZAFX4Q8rjOXH45aHCXe3rG0J-J1dEwQ@mail.gmail.com>

What's the core value proposition for this work vs. http://commoncrawl.org/ and
http://www.webdatacommons.org/ ?

On Friday, May 17, 2013, Marcos Caceres wrote:

> A few of us have a small project (http://webdevdata.org/) that we've been
> using to inform the development of specifications over the last few months.
> It actually started with Steve's research into <main>, for which he used
> some software to crawl a large number of sites, and then grep'd that data
> to get stats that helped support his argument for <main>.
>
> This data set has become increasingly useful to a number of people: the
> RICG has been making extensive use of it, and so have some members of the
> HTMLWG (e.g., [1]).
>
> Anyway, as the headlights activity has the potential to result in the
> allocation of resources for projects, I think it would be good if
> webdevdata.org could be considered as something that can help "close the
> gap" (in that it provides data to help us make informed technical decisions
> about the platform).
>
> What we would like to see:
>
> * monthly or quarterly crawls.
> * hosting and archiving of the data.
> * the ability to search the index through the web.
> * the ability to download the data.
>
> Maybe the W3C could speak to its members in the academic sector for help
> with different ways of searching the data and performing statistical analysis
> on it (in a way that helps both Web developers and spec folks).
>
> [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=19619#c21
> --
> Marcos Caceres
>
>
>
>

Received on Friday, 17 May 2013 13:31:50 UTC