> This year, the Billion Triple Challenge data set consists of 2 billion
> triples. The dataset was crawled during May/June 2011 using a random sample
> of URIs from the BTC 2010 dataset as seed URIs. Lots of thanks to Andreas
> Harth for all his effort put into crawling the web to compile this dataset,
> and to the Karlsruher Institut für Technologie which provided the necessary
> hardware for this labour-intensive task.
>
On a related note,
while nothing can beat a custom job, obviously,
I'd like to remind those who don't have such mighty
time/money/resources that any amount of data one might want is freely
available from the Sindice repositories for things like
this (0 to 20+ billion triples, LOD or non-LOD, microformats, RDFa,
custom filtered, etc.).
See the TREC 2011 competition
http://data.sindice.com/trec2011/download.html (1TB+ of data!), or the
recent W3C data analysis which is leading to a new recommendation
(http://www.w3.org/2010/02/rdfa/profile/data/), etc.
Just trying to help.
Congrats, of course, to the Semantic Web Challenge folks for their
great work on this long-standing initiative!
Gio