- From: Christian Bizer <chris@bizer.de>
- Date: Mon, 29 Apr 2013 13:54:47 +0200
- To: public-lod@w3.org, semantic-web@w3.org, public-sparql-dev@w3.org
Hi all,

the Berlin SPARQL Benchmark (BSBM) is a benchmark for measuring the performance of storage systems that expose SPARQL endpoints. The benchmark is built around an e-commerce use case in which a set of products is offered by different vendors.

The benchmark defines two query mixes:

1. The query mix of the Explore use case <http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/ExploreUseCase/index.html> illustrates the search and navigation pattern of a consumer looking for a product via some web portal.

2. The query mix of the Business Intelligence use case <http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BusinessIntelligenceUseCase/index.html> simulates different stakeholders asking analytical questions against the dataset. The query mix relies heavily on SPARQL 1.1 constructs like GROUP BY and COUNT() and is designed to touch large portions of the benchmark dataset.

I'm happy to announce the results of a new BSBM benchmark experiment. The experiment compares the performance of

1. BigData
2. BigOwlim
3. Jena TDB
4. Virtuoso

on a single machine using datasets ranging from 10 million to 1 billion RDF triples (Explore and Business Intelligence query mixes).

In addition, it compares the performance of

1. BigOwlim
2. Virtuoso

on a cluster of 8 machines using datasets ranging from 10 billion to 150 billion RDF triples (Explore and Business Intelligence query mixes).

The results of the experiment can be found at

http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/

I think that the results are quite impressive and demonstrate that SPARQL stores have matured considerably over the last years. A year ago, many RDF stores still had problems with the SPARQL 1.1 constructs GROUP BY and COUNT() and were thus not able to execute the Business Intelligence query mix. Now, all systems pass this test and some of the systems show impressive performance on grouping and aggregating the data.
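
To give a rough idea of what a grouping and counting query looks like, here is a minimal sketch. It is not one of the actual BSBM queries: the `ex:` namespace, the `ex:vendor` property, and the toy data are assumptions made up for illustration. The Python function below the query string only mirrors the same GROUP BY / COUNT logic in memory so that the shape of the result is easy to see.

```python
from collections import Counter

# Illustrative SPARQL 1.1 aggregate query.
# The ex: vocabulary is hypothetical and is NOT the BSBM schema.
AGGREGATE_QUERY = """
PREFIX ex: <http://example.org/shop#>
SELECT ?vendor (COUNT(?offer) AS ?offerCount)
WHERE {
  ?offer ex:vendor ?vendor .
}
GROUP BY ?vendor
ORDER BY DESC(?offerCount)
"""


def count_offers_per_vendor(offers):
    """Plain-Python equivalent of the GROUP BY / COUNT query above.

    `offers` is an iterable of (offer_id, vendor_id) pairs.
    Returns a list of (vendor_id, count) pairs, highest count first.
    """
    counts = Counter(vendor for _offer, vendor in offers)
    return counts.most_common()


# Toy data, invented for this example.
toy_offers = [("o1", "v1"), ("o2", "v1"), ("o3", "v2")]

if __name__ == "__main__":
    print(count_offers_per_vendor(toy_offers))
```

A store that supports SPARQL 1.1 aggregates should be able to evaluate a query of this shape directly; the Business Intelligence query mix itself is defined in the specification linked above.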

The 150 billion triples experiment has shown that, given proper hardware, it is possible to run analytical queries on amounts of data that are beyond most (all?) of today's use cases: the whole LOD Cloud [1] is estimated to consist of only 31 billion triples; the RDFa, Microdata and Microformat dataset extracted by the WebDataCommons [2] project from 3 billion HTML pages consists of only 7.3 billion triples. So, 150 billion triples leave quite some room for the further growth of structured data on the Web ;-)

More information about the Berlin SPARQL Benchmark, the exact specification of the benchmark query mixes, as well as results from previous benchmarking experiments can be found at

http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/

Many thanks to Peter Boncz and Minh-Duc Pham, who conducted the new experiment as part of the EU project LOD2 and provided their results for publication on the BSBM website.

Cheers,

Chris

[1] http://lod-cloud.net/state/
[2] http://www.webdatacommons.org/

Received on Monday, 29 April 2013 11:55:07 UTC