- From: Aidan Hogan <aidhog@gmail.com>
- Date: Mon, 15 Jul 2019 19:17:46 -0400
- To: semantic-web@w3.org
- Cc: Jose Miguel Herrera <jherreram@gmail.com>, Tobias Käfer <tobias.kaefer@kit.edu>
Hi all,

We are very pleased to announce the release of the BTC 2019 dataset. The dataset is the result of crawling RDF data (RDF/XML, Turtle, N-Triples) from the Web for around one month (2018/12/12 until 2019/01/11). The dataset contains 2,155,856,033 quads, collected from 2,641,253 RDF documents on 394 pay-level domains. Merging the data into one RDF graph results in 256,059,356 unique triples (many triples appear in multiple documents crawled from Wikidata, so they are counted only once after merging). The data contain 38,156 unique predicates and instances of 120,037 unique classes.

The data are available here: https://zenodo.org/record/2634588

For more details, you can check out the following paper, recently accepted for the ISWC 2019 Resources Track:

José-Miguel Herrera, Aidan Hogan and Tobias Käfer. "BTC-2019: The 2019 Billion Triple Challenge Dataset". In the Proceedings of the 18th International Semantic Web Conference (ISWC), Auckland, New Zealand, October 26–30, 2019 (Resources track).
- http://aidanhogan.com/docs/btc2019.pdf

The previous release in the series was the BTC 2014 dataset: http://km.aifb.kit.edu/projects/btc-2014/

We hope you might find the dataset useful!

Best,
José Miguel, Aidan, Tobias
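To illustrate how a quad count relates to a unique-triple count, here is a minimal Python sketch. It is not the tooling used to build BTC 2019, only an illustration under stated assumptions. It assumes the dump is read as N-Quads in which every statement carries a graph label (the source document), and that terms are separated by single spaces, as typical serialiser output is. It drops the graph label and counts distinct (subject, predicate, object) strings. The function names are hypothetical, and blank nodes are compared by label only. An in-memory set will not scale to roughly 2 billion quads; for the full dump, an external approach such as `sort -u` on the stripped triples would be more realistic.

```python
import gzip
from typing import Iterable, Optional, Set, Tuple


def triple_of(quad_line: str) -> Optional[str]:
    """Return the 'subject predicate object' part of one N-Quads line,
    or None for blank lines and comments.

    Assumption: every statement has a graph label (an IRI or blank node,
    neither of which contains whitespace), so the label is the last
    whitespace-separated token before the terminating '.'.
    """
    line = quad_line.strip()
    if not line or line.startswith("#"):
        return None
    if not line.endswith("."):
        raise ValueError(f"not an N-Quads statement: {line!r}")
    body = line[:-1].rstrip()               # drop the terminating '.'
    triple, _graph = body.rsplit(None, 1)   # split off the graph label
    return triple


def count_unique_triples(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of quads, number of distinct triples)."""
    quads = 0
    seen: Set[str] = set()
    for raw in lines:
        t = triple_of(raw)
        if t is None:
            continue
        quads += 1
        seen.add(t)   # same triple from another document is counted once
    return quads, len(seen)


def count_in_gzip_file(path: str) -> Tuple[int, int]:
    """Hypothetical helper for a gzipped N-Quads file (path is illustrative)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return count_unique_triples(f)


if __name__ == "__main__":
    sample = [
        '<http://ex.org/a> <http://ex.org/p> "x y"@en <http://ex.org/doc1> .',
        '<http://ex.org/a> <http://ex.org/p> "x y"@en <http://ex.org/doc2> .',
        '<http://ex.org/b> <http://ex.org/p> <http://ex.org/c> <http://ex.org/doc1> .',
    ]
    print(count_unique_triples(sample))  # (3, 2)
```

In the sample, the same triple is crawled from two documents, so three quads collapse to two unique triples. This is the same effect, at toy scale, that separates the 2,155,856,033 quads from the 256,059,356 unique triples.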
Received on Monday, 15 July 2019 23:18:11 UTC