[Ann] Sindice Data for TREC Entity Track 2011

The Sindice team is happy to announce the release of two datasets for
the TREC Entity Track 2011 [1].

The Sindice-2011 data collection is based on the data collected by
the Sindice project as of 13 May 2011. It contains around 10 billion
triples (more than 1 TB of uncompressed data) and is available at
http://data.sindice.com/trec2011/.

The Sindice-2011 data collection provides information about entities
in the form of RDF triples. The data comes from various Linked Data
RDF dumps and from web documents containing RDF, RDFa and Microformats
that have been crawled since 2009. The data is very diverse and covers
thousands of second-level domains. More statistics are available on
the statistics page.

The Sindice data collection is available for download in two different
formats: document-centric (Sindice-DE) and entity-centric
(Sindice-ED). Both datasets are composed of entities extracted from
the Sindice data collection and described as key-value pairs in RDF
format. You can find more information about these two datasets on the
documentation page.
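To illustrate the difference between the two views, here is a minimal
Python sketch. It assumes each statement is a quad (subject, predicate,
object, source document); the sample quads, the URIs and the function
names are hypothetical and do not reflect the actual Sindice-DE or
Sindice-ED file layout, which is described on the documentation page.
In the document-centric view an entity is described separately within
each document it appears in; in the entity-centric view all statements
about the same subject are merged across documents.

```python
from collections import defaultdict

# Hypothetical sample quads: (subject, predicate, object, source document).
# Made up for illustration only; this is not real Sindice data or its format.
QUADS = [
    ("http://ex.org/alice", "foaf:name", '"Alice"', "http://ex.org/page1"),
    ("http://ex.org/alice", "foaf:knows", "http://ex.org/bob", "http://ex.org/page1"),
    ("http://ex.org/bob", "foaf:name", '"Bob"', "http://ex.org/page2"),
    ("http://ex.org/alice", "foaf:mbox", "mailto:alice@ex.org", "http://ex.org/page2"),
]


def document_centric(quads):
    """Describe entities per document: document -> subject -> predicate -> objects."""
    docs = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for s, p, o, doc in quads:
        docs[doc][s][p].append(o)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {d: {s: dict(props) for s, props in ents.items()} for d, ents in docs.items()}


def entity_centric(quads):
    """Describe entities across documents: subject -> predicate -> objects."""
    entities = defaultdict(lambda: defaultdict(list))
    for s, p, o, _doc in quads:
        entities[s][p].append(o)
    return {s: dict(props) for s, props in entities.items()}


if __name__ == "__main__":
    print(document_centric(QUADS))
    print(entity_centric(QUADS))
```

In this toy example, "alice" appears as two separate descriptions in
the document-centric view (one per page) but as a single description
with three properties in the entity-centric view.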

This also marks the launch of data.sindice.com. In the future, we
intend to share things there such as a complete monthly dump,
aggregate data statistics, and more. We're happy to hear your requests.

Kind Regards,
--
The Sindice Team

[1] http://ilps.science.uva.nl/trec-entity/

Received on Monday, 13 June 2011 05:57:56 UTC