Re: ANN: New Berlin SPARQL Benchmark results for datasets ranging from 10 million to 150 billion RDF triples

Chris,

Is there a detailed analysis available? I'm guessing that the whole team is
submitting a paper to a conference.

Regards,

Juan

On Monday, April 29, 2013, Christian Bizer wrote:

> Hi all,
>
> Berlin SPARQL Benchmark (BSBM) is a benchmark for measuring the
> performance of storage systems that expose SPARQL endpoints. The benchmark
> is built around an e-commerce use case in which a set of products is
> offered by different vendors. The benchmark defines two query mixes:
> 1. The query mix of the Explore use case
> (http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/ExploreUseCase/index.html)
> illustrates the search and navigation pattern of a consumer looking
> for a product via some web portal.
> 2. The query mix of the Business Intelligence use case
> (http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BusinessIntelligenceUseCase/index.html)
> simulates different stakeholders asking analytical questions against the
> dataset. The query mix relies heavily on SPARQL 1.1 constructs like GROUP BY
> and COUNT() and is designed to touch large portions of the benchmark dataset
> (see the sketch below).
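>
> To give a flavour of the kind of analytical query the Business Intelligence
> mix contains, a minimal GROUP BY / COUNT() query over the e-commerce schema
> might look roughly like the following. This is a sketch only; the vocabulary
> prefix and property names are illustrative, and the exact queries are given
> in the specification linked above:
>
>   PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
>
>   # Count the offers per vendor and rank the vendors by that count
>   # (class and property names are illustrative, not the official ones).
>   SELECT ?vendor (COUNT(?offer) AS ?offerCount)
>   WHERE {
>     ?offer a bsbm:Offer ;
>            bsbm:vendor ?vendor .
>   }
>   GROUP BY ?vendor
>   ORDER BY DESC(?offerCount)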
>
> I'm happy to announce the results of a new BSBM benchmark experiment.  The
> experiment compares the performance of
>
> 1. BigData
> 2. BigOwlim
> 3. Jena TDB
> 4. Virtuoso
>
> on a single machine using datasets ranging from 10 million to 1 billion
> RDF triples (Explore and Business Intelligence query mixes).
>
> In addition, it compares the performance of
>
> 1. BigOwlim
> 2. Virtuoso
>
> on a cluster of 8 machines using datasets ranging from 10 billion to 150
> billion RDF triples (Explore and Business Intelligence query mixes).
>
> The results of the experiment are found at
>
> http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/
>
> I think that the results are quite impressive and demonstrate that SPARQL
> stores have become a lot more mature over the last few years.
>
> A year ago, many RDF stores still had problems with the SPARQL 1.1
> constructs GROUP BY and COUNT() and were thus not able to execute the
> Business Intelligence query mix. Now, all systems pass this test, and some
> of the systems show impressive performance on grouping and aggregating
> the data.
>
> The 150-billion-triple experiment has shown that, given proper hardware,
> it is possible to run analytical queries on amounts of data that are beyond
> most (all?) of today's use cases: The whole LOD Cloud [1] is estimated to
> consist only of 31 billion triples; the RDFa, Microdata and Microformat
> dataset extracted by the WebDataCommons [2] project from 3 billion HTML
> pages only consists of 7.3 billion triples. So, 150 billion triples leave
> quite some room for the further growth of structured data on the Web ;-)
>
> More information about the Berlin SPARQL benchmark, the exact
> specification of the benchmark query mixes, and results from
> previous benchmarking experiments can be found at
>
> http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/
>
> Many thanks to Peter Boncz and Minh-Duc Pham, who conducted the new
> experiment as part of the EU project LOD2 and provided their results
> for publication on the BSBM website.
>
> Cheers,
>
> Chris
>
> [1] http://lod-cloud.net/state/
> [2] http://www.webdatacommons.org/
>
>
>

-- 
Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com

Received on Monday, 29 April 2013 21:31:07 UTC