RE: sparql performance questions from Seaborne, Andy on 2006-06-20 (public-sparql-dev@w3.org from April to June 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 20 Jun 2006 12:22:44 +0100
To: "Denis Gaertner" <denis_gaertner@gmx.net>, <public-sparql-dev@w3.org>
Message-ID: <DF5E364A470421429AE6DC96979A4F6FAFF6F7@sdcexc04.emea.cpqcorp.net>

-------- Original Message --------
> From: Denis Gaertner <>
> Date: 19 June 2006 19:30
> 
> Hi!
> 
> I got some questions concerning performance optimization in sparql
> queries. Since I couldn't find any good information on the net about
> this(please correct me if this is wrong) I thought this might be the
> right place for it.   

It's a bit early in the SPARQL lifecycle as the spec is in
"implementability" phase (CR in W3C speak).

> 
> So I am wondering how the structure of a complex query affects the
> processing and query time. You can group Graphpatterns to an
> arbitrarily depth and I'd like to know if a structure which is
> equivalent maybe to something like DNF or CNF in feature sets could
> help. So for example : { {?s p "o"}  
> UNION
> {?s p2 "o2"}
> }
> {?s p3 "o3"}
> 
> becomes
> 
> {?s p "o" . ?s p3 "o3"}
> UNION
> {?s p2 "o2" . ?s p3 "o3"}
> 
> Now this put in a large scale means breaking down lots of levels into
> fewer. Is this of advantage or is the rdf graph structure not suitable
> for this kind of transformation? Or is it solely a question of
> implementation?   

It's highly implementation dependent.  Consider a system that can only
indexes triples but in a local store, compared to one that can ship a
conjunctive expression (or more complex) to a conjunction engine (a SQL
join expression) but incurs overhead in doing so.  Or a system that can
dispatch parallel subqueries to a cluster but has lots of spare
processing power (probably better to use the latter form and combine the
branches) to tradeoff query elapsed time against CPU cycles.

> 
> And another thing is about constraints. Is it better to put them into
a
> group or to collect them and put them at the end if possible? So for 
> instance:
> 
> {?s p ?o1 FILTER (?o1...)}
> {?s p2 ?o2 FILTER (?o2...)}
> 
> or
> 
> {?s p ?o1} FILTER (?o1...)
> {?s p2 ?o2} FILTER (?o2...)
> 
> or
> 
> {?s p ?o1}
> {?s p2 ?o2}
> FILTER (?o1... && ?o2...)
> 
> So this is it and since I don't know anything on how these queries are
> processed I am not able to figure that out by myself. 

Style-wise: write as you think is clearest.

Performance-wise: I'd expect most implementations to put the filter in
the best place (it's relatively easy to do although combined with
reordering the triple patterns there two dimensions of change to
consider).

	Andy

> 
> Thanks
> 
> Denis

Received on Tuesday, 20 June 2006 12:14:12 UTC