- From: Joshua Tauberer <jt@occams.info>
- Date: Sun, 10 Aug 2008 08:58:38 -0400
- To: public-sparql-dev@w3.org, semantic-web@w3.org, public-lod@w3.org, semweb-dotnet@yahoogroups.com
Chris Bizer wrote:
> SPARQL query language and the SPARQL protocol are implemented by a
> growing number of storage systems and are used within enterprise and
> open web settings. As SPARQL is taken up by the community there is a
> growing need for benchmarks to compare the performance of storage
> systems that expose SPARQL endpoints via the SPARQL protocol.
>
> We have been working over the last week on such a benchmark called the
> Berlin SPARQL Benchmark (BSBM).

I ran the benchmark against my SemWeb .NET library [1] (whose SPARQL engine is a fork of Ryan Levering's GSoC project from a few years back). Instructions for setting up the benchmark are here [2] (they also turned out to be a good example of how to set up a SPARQL endpoint using the library, backed by your SQL database of choice, in this case MySQL).

For full disclosure, I had to correct a few bugs in the library before all of the queries in the benchmark ran through OK. These are listed at [2].

I also have some concerns. First, I am not 100% sure that the results returned by my library are actually correct. Query 4 always seemed to return no results. Second, queries are largely translated into SQL, and there is a good deal of caching going on at the level of MySQL. The benchmark results therefore say a lot about the best-case run time and indicate something about the overhead of SPARQL processing, but they may not indicate general-use performance.

The benchmark results reported below are for my desktop: Intel Core2 Duo at 3.00GHz, 2 GB RAM, 32-bit Ubuntu 8.04 on Linux 2.6.24-19-generic, Java 1.6.0_06 for the benchmark tools, and Mono 1.9.1. This seems roughly comparable to the machine used in the BSBM.

Load time (in seconds and triples/sec) is reported below for two of the data set sizes.

               1M      25M
Time (sec)     224     16129
triples/sec    4441    1544

For comparison, load time for the 1M data set was 224 seconds.
This is about 2 to 2.5 times as long as the load times of Jena SDB (Hash) with MySQL over Joseki3 (117s) and Virtuoso Open-Source Edition v5.0.6 and v5.0.7 (87s), as reported in the BSBM results. For the larger 25M data set, the load time of about 4.5 hours was only 1.2 times slower than Jena SDB and 1.7 times faster than Sesame over Tomcat. (But, again, the machines were different.)

AQET (Average Query Execution Time, in seconds) is reported below for each of the queries at both data set sizes. The results were again roughly comparable to Jena and Virtuoso. But the three caveats above are worth restating: the query results have not been validated as correct, there is significant caching, and the machine was different from the machine used in the BSBM.

             1M          25M
Query 1      0.019184    0.049200
Query 2      0.051187    0.048590
Query 3      0.030508    0.079187
Query 4      0.032693    0.075603
Query 5      0.172283    0.342828
Query 6      0.102105    3.277656
Query 7      0.256491    1.108414
Query 8      0.175357    0.572258
Query 9      0.059674    0.088451
Query 10     0.089215    0.322246

[1] http://razor.occams.info/code/semweb
[2] http://razor.occams.info/code/semweb/semweb-current/doc/bsbm.html

--
- Josh Tauberer

http://razor.occams.info

"Yields falsehood when preceded by its quotation! Yields
falsehood when preceded by its quotation!" Achilles to
Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
Received on Sunday, 10 August 2008 12:59:22 UTC