Amount of tabular data on the web

We make the following claim in our Charter, and in Use Cases: “A large percentage of the data published on the Web is tabular data, commonly published as comma separated values (CSV) files.”

I’m preparing a presentation for SmartData Week [1] on our work, and I was looking for some information to substantiate this claim. Does anyone have a reference to a survey, or to Common Crawl information, that would indicate the magnitude of tabular data (.csv or .tsv) published on the web in comparison to other formats such as HTML, XML or JSON?

Any references to other information that would substantiate the importance of this work, or that describe useful applications, would also help me put the presentation together.

Gregg Kellogg
gregg@greggkellogg.net

Received on Monday, 20 July 2015 18:45:34 UTC