Re: Fwd: ANN: New Berlin SPARQL Benchmark results from Kingsley Idehen on 2013-04-29 (public-lod@w3.org from April 2013)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Mon, 29 Apr 2013 11:37:26 -0400
To: lotico-list@googlegroups.com, "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <517E93B6.1060605@openlinksw.com>

On 4/29/13 8:05 AM, Marco Neumann wrote:
> some interesting numbers here. I am sure Kingsley is going to be
> delighted with some of the findings :)

Yes, he's had to keep quiet about these results and findings for too 
loooong :-)

Key takeaway: you can have open-ended SPARQL endpoints that scale 
massively.

Our mission with Virtuoso 7.0 was simple: put to rest the misconception 
that such endpoints can't scale. Again, you can build and deploy massively 
scalable RDF-based Linked Data solutions that go where Hadoop, NewSQL, 
NoSQL, and even conventional RDBMS engines can't venture.

What the industry is still struggling to grasp about RDF-based Linked 
Data is that performance, scale, integration, and access controls are 
not hurdles unique to it. These problems still hound conventional RDBMS, 
NewSQL, and NoSQL products, which do not attend to the "integration" 
(open data connectivity) and "access controls" issues. For instance, 
they simply don't have URIs as a native data type, which is what (by 
implication) makes every data object (RDF resource) accessible to user 
agents on a public or private HTTP network.

Linked Data HTTP URIs are extremely powerful Super Keys for 
heterogeneous data virtualization, integration, and management. It's 
these core features that will constructively tweak the DBMS world as we 
all used to know it :-)


Kingsley
>
> Marco
>
>
> ---------- Forwarded message ----------
> From: Christian Bizer <chris@bizer.de>
> Date: Mon, Apr 29, 2013 at 7:54 AM
> Subject: ANN: New Berlin SPARQL Benchmark results for datasets ranging
> from 10 million to 150 billion RDF triples
> To: public-lod@w3.org, semantic-web@w3.org, public-sparql-dev@w3.org
>
>
> Hi all,
>
> Berlin SPARQL Benchmark (BSBM) is a benchmark for measuring the
> performance of storage systems that expose SPARQL endpoints. The
> benchmark is built around an e-commerce use case in which a set of
> products is offered by different vendors. The benchmark defines two
> query mixes:
> 1. The query mix of the Explore use case
> <http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/ExploreUseCase/index.html> illustrates
> the search and navigation pattern of a consumer looking for a product
> via some web portal.
> 2. The query mix of the Business Intelligence use case
> <http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BusinessIntelligenceUseCase/index.html> simulates
> different stakeholders asking analytical questions against the
> dataset. The query mix relies heavily on SPARQL 1.1 constructs like
> GROUP BY and COUNT() and is designed to touch large portions of the
> benchmark dataset.
>
> I'm happy to announce the results of a new BSBM benchmark experiment.
> The experiment compares the performance of
>
> 1. BigData
> 2. BigOwlim
> 3. Jena TDB
> 4. Virtuoso
>
> on a single machine using datasets ranging from 10 million to 1
> billion RDF triples (Explore and Business Intelligence query mixes).
>
> In addition, it compares the performance of
>
> 1. BigOwlim
> 2. Virtuoso
>
> on a cluster of 8 machines using datasets ranging from 10 billion to
> 150 billion RDF triples (Explore and Business Intelligence query
> mixes).
>
> The results of the experiment are found at
>
> http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/
>
> I think that the results are quite impressive and demonstrate that
> SPARQL stores got a lot more mature over the last years.
>
> A year ago, many RDF stores still had problems with the SPARQL 1.1
> constructs GROUP BY and COUNT() and were thus not able to execute the
> Business Intelligence query mix. Now, all systems pass this test and
> some of the systems show an impressive performance on grouping and
> aggregating the data.
>
> The 150 billion triples experiment has shown that given proper
> hardware, it is possible to run analytical queries on amounts of data
> that are beyond most (all?) of today's use cases: The whole LOD Cloud
> [1] is estimated to consist only of 31 billion triples; the RDFa,
> Microdata and Microformat dataset extracted by the WebDataCommons [2]
> project from 3 billion HTML pages only consists of 7.3 billion
> triples. So, 150 billion triples leave quite some room for the further
> growth of structured data on the Web ;-)
>
> More information about the Berlin SPARQL benchmark, the exact
> specification of the benchmark query mixes, as well as results from
> previous benchmarking experiments are found at
>
> http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/
>
> Lots of thanks to Peter Boncz and Minh-Duc Pham, who conducted the new
> experiment as part of the EU project LOD2 and provided their
> results for publication on the BSBM website.
>
> Cheers,
>
> Chris
>
> [1] http://lod-cloud.net/state/
> [2] http://www.webdatacommons.org/
>
>
>
>
> --
>
>
> ---
> Marco Neumann
> KONA
>


-- 

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Received on Monday, 29 April 2013 15:37:50 UTC