ANN: Berlin SPARQL Benchmark Version 3 and Benchmarking Results

Hi all,

 

we are happy to announce Version 3 of the Berlin SPARQL Benchmark as well as
the results of a benchmark experiment in which we compared the query, load
and update performance of Virtuoso, Jena TDB, 4store, BigData, and BigOWLIM
using the new benchmark.

 

The Berlin SPARQL Benchmark Version 3 (BSBM V3) defines three different
query mixes that test different capabilities of RDF stores:

 

1. The Explore query mix tests query performance using simple SPARQL 1.0
queries.

2. The Explore-and-Update query mix tests read and write performance using
SPARQL 1.0 SELECT queries as well as SPARQL 1.1 Update operations.

3. The Business Intelligence query mix consists of complex SPARQL 1.1
queries that rely on aggregation and subqueries, each of which touches
large parts of the test dataset (an illustrative sketch of the two query
styles follows this list).
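
To give a rough idea of the two query styles, here is a minimal sketch.
These are NOT queries from the actual BSBM V3 query mixes; the bsbm: prefix,
the vocabulary terms and the resource IRIs are only assumed for
illustration. Please refer to the specification for the real queries.

  # Business-Intelligence-style query: aggregation plus a subquery
  # (illustrative sketch only; vocabulary terms are assumed)
  PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>

  SELECT ?product (AVG(?price) AS ?avgPrice)
  WHERE {
    { SELECT ?product WHERE { ?product a bsbm:Product } LIMIT 1000 }
    ?offer bsbm:product ?product ;
           bsbm:price   ?price .
  }
  GROUP BY ?product
  ORDER BY DESC(?avgPrice)
  LIMIT 10

  # Explore-and-Update-style SPARQL 1.1 Update operations
  # (illustrative sketch only; the offer IRI is made up)
  PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>

  DELETE WHERE { <http://example.com/offers/Offer1> ?p ?o } ;
  INSERT DATA  { <http://example.com/offers/Offer1> bsbm:price "42.5" }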

 

The BSBM V3 specification can be found at

 

http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/20101129/

 

We also conducted a benchmark experiment in which we compared the query,
load and update performance of Virtuoso, Jena TDB, 4store, BigData, and
BigOWLIM using the new benchmark.

 

We tested the stores with 100 million and 200 million triple datasets and
ran the Explore as well as the Explore-and-Update query mixes. The results
of this experiment can be found at

 

http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/V6/index.html

 

It is interesting to see that:

 

1. Virtuoso dominates the Explore use case for multiple clients.

2. BigOWLIM also shows good multi-client scaling behavior for the 100M
dataset.

3. 4store is the fastest store for the Explore-and-Update query mix.

4. BigOWLIM is able to load the 200M dataset in under 40 minutes, which
comes close to the bulk load times of relational databases such as MySQL.

5. All stores that we had previously tested with BSBM V2 have improved
their query performance and load times.

 

We also tried to run the Business Intelligence query mix against the stores.
BigData and 4store currently do not provide all SPARQL features that are
required to run the BI query mix, so we tried to run the Business
Intelligence query mix only against Virtuoso, Jena TDB and BigOWLIM. In
doing so, we ran into several "technical problems" that prevented us from
finishing the tests and from reporting meaningful results. We therefore
decided to give the store vendors more time to fix and optimize their stores
and will run the BI query mix experiment again in about four months (July
2011).

 

Thanks a lot to Orri Erling for proposing the Business Intelligence use case
and for contributing the initial queries for the query mix. Lots of thanks
also go to Ivan Mikhailov for his in-depth review of the Business
Intelligence query mix and for finding several bugs in the queries. We also
want to thank Peter Boncz and Hugh Williams for their feedback on the new
version of the BSBM benchmark.

 

We want to thank the store vendors and implementers for helping us to set up
and configure their stores for the experiment. Lots of thanks to Andy
Seaborne, Ivan Mikhailov, Hugh Williams, Zdravko Tashev, Atanas Kiryakov,
Barry Bishop, Bryan Thompson, Mike Personick and Steve Harris.

 

The work on the BSBM Benchmark Version 3 is funded by the LOD2 - Creating
Knowledge out of Linked Data project (http://lod2.eu/). 

 

More information about the Berlin SPARQL Benchmark can be found at

 

http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/

 

 

Cheers,

 

Andreas Schultz and Chris Bizer

 

 

--

Prof. Dr. Christian Bizer

Web-based Systems Group

Freie Universität Berlin

+49 30 838 55509

http://www.bizer.de

chris@bizer.de

 
