Re: Value of moving OPTIONAL clauses to the end of a graph pattern?

Hi Bob

Here's the equivalent for dotNetRDF  -
http://www.dotnetrdf.org/content.asp?pageID=SPARQL%20Optimisation - the
optimization strategy is pretty similar to Jena's in many regards.

BigData has a wiki page on theirs -
http://sourceforge.net/apps/mediawiki/bigdata/index.php?title=QueryOptimization

Unsurprisingly, most commercial vendors tend not to say much about their
optimizers.  You can find plenty of interesting things in the academic
literature if you have the time to go investigating.  Systems like RDF-3X,
Hexastore and Diplodocus are three interesting examples.  HDT-FoQ is another
relatively new and interesting one I was reading about just yesterday.  All
of these delve quite deeply into how they store the data in on-disk/in-memory
data structures and how that enables whatever optimization tricks they do, so
it depends how deep into the weeds you want to get.

One point I would make is that, unlike simple triple patterns, you generally
can't reorder graph patterns.  Engines will happily reorder triple patterns
within a single graph pattern because it makes no difference to the
semantics, but reordering graph patterns is typically unsafe because it
changes the algebra and thus potentially the results of the query.  This is
likely to be the case as soon as you have any non-trivial nesting, or if you
use any kind of complex graph pattern, e.g. MINUS or GRAPH.
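
As a rough illustration of why moving an OPTIONAL can change the results,
here is a minimal Python sketch of the algebra's Join and LeftJoin over
solution mappings represented as dicts.  The helper names (compatible, join,
left_join), the data and the query shape are all made up for illustration,
and filters inside the OPTIONAL are ignored; this is not how any particular
engine implements it.

```python
def compatible(a, b):
    """Two solution mappings are compatible if shared variables agree."""
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left, right):
    """Join: merge every compatible pair of solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def left_join(left, right):
    """LeftJoin (OPTIONAL without a filter): keep unmatched left solutions."""
    out = []
    for a in left:
        matches = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(matches or [a])
    return out

# Hypothetical solutions for each part of
#   { ?x :p ?y . OPTIONAL { ?x :q ?z } . ?x :r ?z }
p = [{"x": "a", "y": "1"}]    # ?x :p ?y
q = [{"x": "a", "z": "q1"}]   # OPTIONAL { ?x :q ?z }
r = [{"x": "a", "z": "r1"}]   # ?x :r ?z

# As written: the OPTIONAL binds ?z first, so the later pattern must agree.
as_written = join(left_join(p, q), r)

# OPTIONAL moved to the end: ?z is already bound by the required part.
moved_to_end = left_join(join(p, r), q)

print(as_written)     # no solutions
print(moved_to_end)   # one solution, with ?z taken from ?x :r ?z
```

The two orderings disagree here only because ?z is shared between the
OPTIONAL and a later required pattern.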

This doesn't stop engines reordering them internally if they so wish
(provided they don't change the semantics of the query and thus the result)
and most engines will evaluate things in what they consider the optimal
order.  As a query writer you should ideally not have to worry much about
whether your query is written in the optimal way, and should be able to rely
on the engine optimizing appropriately.  In reality this may not always work,
but in that regard it is no different from writing SQL queries.

Putting OPTIONAL at the end only really makes sense if the data from that
OPTIONAL is used only to augment your results e.g. get the label that may be
associated with some other node in the graph that the required part of your
query is matching.  If you are using it in any other way it probably belongs
where you put it in your query originally.
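
A small Python sketch of the "augment only" case, under the assumption that
the variable bound by the OPTIONAL (?label here) is not used anywhere else
in the query.  The helpers and data are hypothetical stand-ins for the
algebra's Join and LeftJoin over solution mappings, not any engine's API.

```python
def compatible(a, b):
    """Two solution mappings are compatible if shared variables agree."""
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left, right):
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def left_join(left, right):
    out = []
    for a in left:
        matches = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(matches or [a])   # keep the left solution if nothing matches
    return out

# Hypothetical solutions for
#   { ?s a :Person . OPTIONAL { ?s rdfs:label ?label } . ?s :mail ?mail }
people = [{"s": "alice"}, {"s": "bob"}]
labels = [{"s": "alice", "label": "Alice"}]
mails = [{"s": "alice", "mail": "a@example.org"},
         {"s": "bob", "mail": "b@example.org"}]

in_place = join(left_join(people, labels), mails)   # OPTIONAL in the middle
at_end = left_join(join(people, mails), labels)     # OPTIONAL moved to the end

# ?label only decorates the results, so both orderings agree.
print(in_place == at_end)
```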

Another interesting observation on OPTIONAL is that in the case of an ASK
query most OPTIONALs can be safely ignored and not evaluated at all -
http://answers.semanticweb.com/questions/2999/can-optional-clauses-be-ignored-in-most-cases-for-ask-queries
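
The intuition can be sketched in a few lines of Python: a LeftJoin never
drops a solution from its left-hand side, so a trailing OPTIONAL cannot
change whether there is at least one solution.  The helpers and data below
are hypothetical, and the sketch assumes nothing after the OPTIONAL joins or
filters on the variables it binds (which is why it is "most" and not "all").

```python
def compatible(a, b):
    """Two solution mappings are compatible if shared variables agree."""
    return all(b[k] == v for k, v in a.items() if k in b)

def left_join(left, right):
    out = []
    for a in left:
        matches = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(matches or [a])   # every left solution survives
    return out

def ask(solutions):
    """ASK is true when there is at least one solution."""
    return bool(solutions)

# Hypothetical required-part and OPTIONAL-part solutions.
required_cases = [[], [{"s": "alice"}]]
optional_cases = [[],
                  [{"s": "alice", "label": "Alice"}],
                  [{"s": "carol", "label": "Carol"}]]

# The ASK answer is the same with or without evaluating the OPTIONAL.
same_answer = all(ask(left_join(req, opt)) == ask(req)
                  for req in required_cases
                  for opt in optional_cases)
print(same_answer)
```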

Rob

From:  Bob DuCharme <bob@snee.com>
Date:  Thursday, December 20, 2012 1:41 PM
To:  Chime Ogbuji <chimezie@gmail.com>, <public-sparql-dev@w3.org>
Subject:  Re: Value of moving OPTIONAL clauses to the end of a graph
pattern?
Resent-From:  <public-sparql-dev@w3.org>
Resent-Date:  Thu, 20 Dec 2012 21:42:25 +0000

> Thanks Chime!
>
> That would be typical of an optimization strategy anyway, whether manual or
> automated, right?
>
> I see some doc about Jena's query optimization at
> http://jena.apache.org/documentation/query/explain.html . If anyone can point
> me to similar pages for other SPARQL query processors, I'd appreciate it.
>
> Thanks,
>
> Bob
>
> On 12/20/2012 4:17 PM, Chime Ogbuji wrote:
>
>> If the implementation evaluates the patterns in the order you provide
>> (rather than determining an optimal evaluation strategy independent of the
>> structure of the given query), I think it does make sense.
>>
>> -- 
>> Chime Ogbuji
>> Sent with Sparrow <http://www.sparrowmailapp.com>
>>
>> On Thursday, December 20, 2012 at 4:10 PM, Bob DuCharme wrote:
>>>
>>> Since OPTIONAL clauses have no chance of reducing the search space for
>>> their containing graph pattern, does it make sense as a general rule of
>>> thumb to put them after all the ones that do, i.e. after the
>>> non-OPTIONAL triple patterns?
>>>
>>> thanks,
>>>
>>> Bob

Received on Thursday, 20 December 2012 22:35:30 UTC