BTC 2019: Billion Triple Challenge 2019 Dataset

From: Aidan Hogan <aidhog@gmail.com>
Date: Mon, 15 Jul 2019 19:17:46 -0400
To: semantic-web@w3.org
Cc: Jose Miguel Herrera <jherreram@gmail.com>, Tobias Käfer <tobias.kaefer@kit.edu>
Message-ID: <d2f42bfb-b375-7358-34e6-3f329767c5fc@gmail.com>

Hi all,

We are very pleased to announce the release of the BTC 2019 dataset.

The dataset is the result of crawling RDF data (RDF/XML, Turtle,
N-Triples) from the Web for around one month (from 2018/12/12 to
2019/01/11). The dataset contains 2,155,856,033 quads, collected from
2,641,253 RDF documents on 394 pay-level domains. Merging the data into
one RDF graph results in 256,059,356 unique triples (many of the
duplicate triples were crawled from different documents on Wikidata).
The data contain 38,156 unique predicates and instances of 120,037
unique classes.
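
To illustrate how the quad count relates to the unique-triple count, here
is a minimal Python sketch. It is not the tooling used to build the
dataset. It assumes N-Quads-style input with one statement per line, that
every statement carries a graph label (the source document) as its fourth
term, and that identical triples are serialised identically. The function
name and the example.org sample data are hypothetical. The in-memory set
is only practical for small samples; at the scale of the full dataset an
external sort or similar would be needed.

```python
from typing import Iterable, Tuple


def count_quads_and_triples(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of quads, number of unique triples).

    Assumes each non-empty, non-comment line is a quad whose last term
    before the final '.' is the graph label, which contains no whitespace.
    """
    quads = 0
    triples = set()
    for line in lines:
        stmt = line.strip()
        if not stmt or stmt.startswith("#"):
            continue  # skip blank lines and comments
        if not stmt.endswith("."):
            raise ValueError(f"statement does not end with '.': {stmt!r}")
        body = stmt[:-1].rstrip()
        # Split off the graph label; what remains is the triple as text.
        triple, _graph = body.rsplit(None, 1)
        quads += 1
        triples.add(triple)
    return quads, len(triples)


# Hypothetical sample: the same triple appears in two documents.
sample = [
    '<http://example.org/a> <http://example.org/p> "hello world" <http://example.org/doc1> .',
    '<http://example.org/a> <http://example.org/p> "hello world" <http://example.org/doc2> .',
    '<http://example.org/a> <http://example.org/q> <http://example.org/b> <http://example.org/doc1> .',
]

n_quads, n_triples = count_quads_and_triples(sample)

# Figures reported above for the full dataset: quads per unique triple.
btc_ratio = 2_155_856_033 / 256_059_356
```

Applied to the figures above, the ratio works out to roughly 8.4 quads
per unique triple.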

The data are available here: https://zenodo.org/record/2634588

For more details, you can check out the following paper recently 
accepted for the ISWC 2019 Resources Track:

José-Miguel Herrera, Aidan Hogan and Tobias Käfer. "BTC-2019: The 2019 
Billion Triple Challenge Dataset". In the Proceedings of the 18th 
International Semantic Web Conference (ISWC), Auckland, New Zealand, 
October 26–30, 2019 (Resources track).
- http://aidanhogan.com/docs/btc2019.pdf

The previous release in the series was the BTC 2014 dataset:

http://km.aifb.kit.edu/projects/btc-2014/

We hope you might find the dataset useful!

Best,
José Miguel, Aidan, Tobias

Received on Monday, 15 July 2019 23:18:11 UTC
