- From: Joshua Tauberer <jt@occams.info>
- Date: Sun, 10 Aug 2008 08:58:38 -0400
- To: public-sparql-dev@w3.org, semantic-web@w3.org, public-lod@w3.org, semweb-dotnet@yahoogroups.com
Chris Bizer wrote:
> SPARQL query language and the SPARQL protocol are implemented by a
> growing number of storage systems and are used within enterprise and
> open web settings. As SPARQL is taken up by the community there is a
> growing need for benchmarks to compare the performance of storage
> systems that expose SPARQL endpoints via the SPARQL protocol.
>
> We have been working over the last week on such a benchmark called the
> Berlin SPARQL Benchmark (BSBM).
I ran the benchmark against my SemWeb .NET library [1], whose SPARQL
engine is a fork of Ryan Levering's GSoC project from a few years back.
Instructions for setting up the benchmark are here [2]. They also turned
out to be a good example of how to set up a SPARQL endpoint using the
library, backed by your SQL database of choice (in this case MySQL).
For full disclosure, I had to correct a few bugs in the library before
all of the queries in the benchmark ran through OK. These are listed at [2].
I also have some concerns. First, I am not 100% sure that the results
from my library are actually correct. Query 4 seemed to always return no
results. Second, queries are largely translated into SQL, and there is a
good deal of caching going on at the level of MySQL. The benchmark
results therefore say a lot about best-case run time and indicate
something about the overhead of SPARQL processing, but may not reflect
performance in general use.
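To illustrate the caching concern, here is a minimal Python sketch that separates the first (cold) run of a query from the later (warm) runs. It is not the BSBM test driver and not part of SemWeb. `time_runs` and `fake_query` are hypothetical names, and `fake_query` merely simulates a cache with a sleep rather than talking to a real endpoint.

```python
import time
from statistics import mean


def time_runs(run_query, runs=5):
    """Time run_query `runs` times; return (first run, mean of later runs) in seconds."""
    if runs < 2:
        raise ValueError("need at least two runs to separate cold from warm")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings[0], mean(timings[1:])


# Hypothetical stand-in for sending one query to the endpoint: the first
# call does simulated uncached work, later calls return the cached result.
_cache = {}


def fake_query():
    if "result" not in _cache:
        time.sleep(0.05)  # simulated uncached work
        _cache["result"] = []
    return _cache["result"]


cold, warm = time_runs(fake_query)
print(f"cold: {cold:.4f}s  warm mean: {warm:.6f}s")
```

If the averages reported by a benchmark are dominated by warm runs, they will sit much closer to the second number than to the first.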
Benchmark results reported below are for my desktop: Intel Core2 Duo at
3.00GHz, 2 GB RAM, 32bit Ubuntu 8.04 on Linux 2.6.24-19-generic, Java
1.6.0_06 for the benchmark tools, and Mono 1.9.1. This seems roughly
comparable to the machine used in the BSBM.
Load time (in seconds and triples/sec) is reported below for two of the
data set sizes.
               1M      25M
Time (sec)     224     16129
triples/sec    4441    1544
For comparison, the load time for the 1M data set was 224 seconds. This
is roughly 2 to 2.5 times slower than Jena SDB (Hash) with MySQL over
Joseki3 (117s) and Virtuoso Open-Source Edition v5.0.6 and v5.0.7 (87s),
as reported in the BSBM results. For the larger 25M data set, the load
time of about 4.5 hours was only 1.2 times slower than Jena SDB and 1.7
times faster than Sesame over Tomcat. (But, again, the machines were
different.)
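The comparison arithmetic can be checked with a few lines of Python. This is only a sketch using the load times quoted in this message. The variable and function names are made up for illustration.

```python
# Load times in seconds, as quoted in this message.
semweb_1m = 224
semweb_25m = 16129
jena_sdb_1m = 117   # Jena SDB (Hash) with MySQL over Joseki3, per the BSBM results
virtuoso_1m = 87    # Virtuoso Open-Source Edition, per the BSBM results


def slowdown(mine, other):
    """How many times longer `mine` took than `other`."""
    return mine / other


ratio_jena = slowdown(semweb_1m, jena_sdb_1m)
ratio_virtuoso = slowdown(semweb_1m, virtuoso_1m)
hours_25m = semweb_25m / 3600  # seconds to hours

print(f"1M vs Jena SDB: {ratio_jena:.2f}x")      # 1.91x
print(f"1M vs Virtuoso: {ratio_virtuoso:.2f}x")  # 2.57x
print(f"25M load time:  {hours_25m:.2f} hours")  # 4.48 hours
```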
Results for query execution are reported below as AQET (Average Query
Execution Time, in seconds) for each query at each data set size. The
results were again roughly comparable to Jena and Virtuoso. But the
three caveats above are worth restating: the query results have not
been validated as correct, there is significant caching, and the machine
was different from the one used in BSBM.
            1M          25M
Query 1     0.019184    0.049200
Query 2     0.051187    0.048590
Query 3     0.030508    0.079187
Query 4     0.032693    0.075603
Query 5     0.172283    0.342828
Query 6     0.102105    3.277656
Query 7     0.256491    1.108414
Query 8     0.175357    0.572258
Query 9     0.059674    0.088451
Query 10    0.089215    0.322246
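One way to read the table above is to look at how much each query's AQET grows from the 1M to the 25M data set. The Python sketch below does only that arithmetic on the figures above. The names are made up for illustration, and the caveats about caching and unvalidated results still apply to the ratios.

```python
# AQET in seconds, copied from the table above: (1M, 25M).
aqet = {
    "Query 1": (0.019184, 0.049200),
    "Query 2": (0.051187, 0.048590),
    "Query 3": (0.030508, 0.079187),
    "Query 4": (0.032693, 0.075603),
    "Query 5": (0.172283, 0.342828),
    "Query 6": (0.102105, 3.277656),
    "Query 7": (0.256491, 1.108414),
    "Query 8": (0.175357, 0.572258),
    "Query 9": (0.059674, 0.088451),
    "Query 10": (0.089215, 0.322246),
}


def growth(small, large):
    """Factor by which the time grew between the two data set sizes."""
    return large / small


ratios = {name: growth(small, large) for name, (small, large) in aqet.items()}

for name, ratio in ratios.items():
    print(f"{name:<9} {ratio:6.2f}x")

fastest_growing = max(ratios, key=ratios.get)
slowest_growing = min(ratios, key=ratios.get)
print("largest growth: ", fastest_growing)
print("smallest growth:", slowest_growing)
```

On these figures Query 6 grows the most, by more than the 25x growth in data, while Query 2 is essentially flat.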
[1] http://razor.occams.info/code/semweb
[2] http://razor.occams.info/code/semweb/semweb-current/doc/bsbm.html
--
- Josh Tauberer
http://razor.occams.info
"Yields falsehood when preceded by its quotation! Yields
falsehood when preceded by its quotation!" Achilles to
Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
Received on Sunday, 10 August 2008 12:59:26 UTC