RE: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena TDB, D2R Server, and MySQL from Orri Erling on 2008-09-19 (semantic-web@w3.org from September 2008)

From: Orri Erling <erling@xs4all.nl>
Date: Fri, 19 Sep 2008 23:12:38 +0200
To: "'Daniel Schwabe'" <dschwabe@inf.puc-rio.br>, "'Chris Bizer'" <chris@bizer.de>
Cc: <semantic-web@w3.org>, <public-lod@w3.org>
Message-Id: <200809192114.m8JLEbt2087684@smtp-vbr3.xs4all.nl>
 

> ...
> It is interesting to see:
>
> ...
> 4. that the fastest RDF store is still 7 times slower than a 
> relational database.
Has has there been any analysis on whether there is a *fundamental* 
reason for such performance difference?
Or is it simply a question of "maturity" - in other words, relational db 
technology has been around for a very long time and is very mature, 
whereas RDF implementations are still quite recent, so this gap will 
surely narrow ...?

Cheers
D





Dan, All

This is a very complex subject.  I will offer some analysis below, but this
I fear will only raise further questions.  This is not the end 
of the road, far from it.



In the BSBM case, we first note that the relational representation, in this
case with Virtuoso, is about 4x more space 
efficient than the triples representation.
This translates to running in memory in cases where the triples
representation would go to disk.  In the time of getting one page from disk,

let's say 5ms, one can get 1000 random om memory.

With Virtuoso 6, the ratio is somewhat more advantageous to triples.  Still,
a relational row store, such as MySQL or Virtuoso or pretty 
much any other RDBMS, except for the recent crop of column stores, of which
we speak later, can store a non-indexed dependent column in the 
space it takes to store the data.  Not everything is a triple and not
everything gets indexed multiple ways, from 2 to 6 indices, as with 
triples.

Let us further note that the BSBM report does not really define steady
state.  This is our (OpenLink) principal  criticism of the process.
The TPC (Transaction Processing Performance Council) benchmarks make it a
point to  eliminate nondeterminism coming 
from OS disk  cache.  For OLTP, we run for half an hour first to see that
the cache is filled and then measure for another half hour.  For 
analytics, the benchmark set is much larger than memory and the run starts
with a full read through of the biggest tables, which eliminates 
any randomness from OS disk caching.  Plus there may be rules for switching
off the power but I would have to check the papers to be sure of 
this.

Now, the BSBM data sets are primarily in memory.  However, if we start with
a cold cache at 100M scale with Virtuoso relational, , the first 
run is 20 times slower than the second run with the identical queries.  If
we shut down the server and redo the identical run , then the 

performance is about the same than the faster run because the OS still
caches the disk pages.
So, specially at larger scales, the BSBM test process simply must  ensure
steady state for whatever rates are reported.  The easiest way to 

do this is to have a warm-up that is scale factor / 10 query mixes and not a
constant of 32 query mixes, like with the reported results
Virtuoso SPARQL to SQL mapping performs slower than Virtuoso SQL primarily
because of the increased complexity of query compilation.  

However, the numbers published may underestimate this difference because of
not running with a cache in steady state.  In other words, there 
is disk latency which penalizes both equally while this disk latency would
have vanished with another 5 to 10 minutes of running.

But let us talk about triples vs. rows.  The BSBM workload typically
retrieves multiple dependent attributes of a single key.  If these 

attributes are all next to each other, as in a relational row store, then we
have a constant time for the extra attribute instead of a log 
of the database size.  This favors RDBMS's.  As  mentioned before, there are
also columnar RDBMS's, specially for analytics workloads.   

These do not have related attributes next to each other but they can play
tricks relying on dense spacing of row numbers, locality of 
reference, compression of homogenous data,  and sparse indices, which are
not as readily applicable to a more unpredicttable RDF workload.  

This is complex and we do not have space to go deeper into this here.  We
have considered these, as we naturally have contemplated making a 
column store adaptation of Virtuoso.  We may yet make one.

Then there is the element of dissimilar semantics of SPARQL and SQL.  Q6,
which is basically a SQL LIKE full table scan, is specially unfair 

to triple stores.  Even in the SQL world, this would be done using a text
index in any halfway serious RDBMS, since they all have a full 
text predicate.  This hurts RDF seriously but inconveniences SQL somewhat
less because of locality of reference and the fact that LIKE has a 

sinppler semantic than regexp.  Further, when the scale factor goes to
infinity, the ratio of q6 over all queries goes 
 to unity.  In other words, the composition of the metric is not scale
independent:  If the scale is large, Q6 is the only thing that 
matters and is furthermore a thing that really shows triples at their
worstt.



This is recognized by the authors and addressed by dropping Q6 out of the
metric in some cases but this is still not entirely satisfactory 
since it confuses the metric.  The TPC experience suggests that a single
benchmark must produce a single metric plus a separate full 
disclosure report.  

The best would be to use a text index with systems that support one and put
them in a separate table and make another table with all systems 
but with a metric with no Q6.  The metrics could be labeled BSBM basic and
BSBM extended or something of the sort, since there already are 

results with and without Q6.  Dropping a text lookup altogether is not per
se good since this is such a universal operation.  Allow using a 
text index when there is one and if there isn't use another metric.  Do not
put full table scans into SPARQL benchmarks.



A lesser item in the same  direction is the use of describe, which is not
commensurate between SPARQL and SQL and not even between SPARQL's. 

  Still, this is not asserious as Q6 because this does not affect the
composition of the metric as scale changes.

Also, query compilation  plays a role in BSBM, specially with RDB to RDF
mapping.  Using parametrized queries would be realistic.  For 
example, even with SPARQL going to triples, we see a 30% cut in execution
times if we use parametrized queries.  Note that parametrized 

queries are not standard  SPPARQL.  The BSBM workload is the prototypical
case of using parametrized queries however, certainly in the SQL 
world.


Making and interpreting benchmarks is really complex.  At large scales, it
is a fair rule of thumb that whatever accesses less data will 
win.  This is the point of column stores and can also work in favor of
triples since triples can be indexed in a columnar fashion.  At 
scales where most  fits in memory, which is the case with the BSBM figures
published, the row store will win because of a single lookup for 
multiple values.  The greater the part of the row accessed by the typical
query, the better the row store will do.  BSBM is row friendly. 

Taking a small scale BSBM run, could we make triples outrun relational?  I
would say no.  Compiling SQL has less choices to make, a shorter 
execution plan, less joins.  At a larger scale, with unpredictable queries,
triples could outrun relational because of uniform indexing and 
more adaptive working set (i.e. columnar access).  

BSBM is clearly a pro-relational use case.  This is part of its point, since
it allows comparing relational mapping and triples, which 

comparison is certainly valuable.  But to get to its full value, this
comparison should have aspects of integration.  But this is a step on the
way.  BSBM is not the last word but it certainly is needed.



Orri








 
















Hi





-- 
Daniel Schwabe
Tel:+55-21-3527 1500 r. 4356
Fax: +55-21-3527 1530
http://www.inf.puc-rio.br/~dschwabe 	Dept. de Informatica, PUC-Rio
R. M. de S. Vicente, 225
Rio de Janeiro, RJ 22453-900, Brasil





-- 
No virus found in this incoming message.
Checked by AVG. 
Version: 7.5.526 / Virus Database: 270.7.0/1679 - Release Date: 9/18/2008
5:03 PM
Received on Friday, 19 September 2008 21:15:34 UTC