- From: Giovanni Tummarello <giovanni.tummarello@deri.org>
- Date: Mon, 13 Jun 2011 07:57:27 +0200
- To: Semantic Web at W3C <semantic-web@w3.org>
The Sindice team is happy to announce the release of two datasets for the TREC Entity Track 2011 [1].

The Sindice-2011 data collection is based on the data collected by the Sindice project as of 05/13/2011. It contains around 10 billion triples (more than 1 TB of uncompressed data) and is available at http://data.sindice.com/trec2011/.

The Sindice-2011 data collection provides information about entities in the form of RDF triples. The data comes from various Linked Data RDF dumps and from web documents containing RDF, RDFa, and Microformats, crawled since 2009. The data is very diverse and covers thousands of second-level domains. More statistics are available on the statistics page.

The Sindice data collection is available for download in two formats: document-centric (Sindice-DE) and entity-centric (Sindice-ED). The datasets are composed of entities extracted from the Sindice data collection and described as key-value pairs in RDF format. You can find more information about these two datasets on the documentation page.

This also marks the launch of data.sindice.com. In the future, the idea is to use it to share things such as a complete monthly dump, aggregate data statistics, and more. We're happy to hear your requests.

Kind regards,
--
The Sindice Team

[1] http://ilps.science.uva.nl/trec-entity/
Received on Monday, 13 June 2011 05:57:56 UTC