Re: SPARQL performance for ORDER BY on large datasets

On Aug 28, 2009, at 12:31 PM, Seaborne, Andy wrote:
> And the ORDER BY?

At the moment the ORDER BY is still applied at the ARQ level, even when
there is an ordered SQL index. It could be handled by moving OpOrder
down and merging it into the SQL query where appropriate. The problem
is that not all SPARQL queries can be translated into a single SQL
query, so this will work only in some special situations.

It would work if:

a) the variable refers to a single DB attribute (i.e. there is only  
one property bridge defined for a specific RDF predicate)
b) there is an index on the corresponding DB attribute
c) one of these cases applies:
    i)  the SPARQL query can be translated directly into a single SQL  
query (e.g. { ?s :p1 ?o } ORDER BY ?o)
    ii) multiple SQL queries are required (they are combined by  
OpUnion), but the order var applies only to one of the SQL queries
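
Roughly, the check for conditions a) to c) could look like the sketch below. This is only an illustration: SqlFragment and can_push_order_by are made-up names, not D2RQ or ARQ API, and I assume each fragment is one SQL query whose results would be combined by OpUnion.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SqlFragment:
    """One SQL query generated for part of a SPARQL pattern (hypothetical model)."""
    # SPARQL variable name -> DB columns the variable may be bound from
    var_columns: dict[str, list[str]]
    # columns of this fragment that have an (ordered) index
    indexed_columns: set[str] = field(default_factory=set)


def can_push_order_by(order_var: str, fragments: list[SqlFragment]) -> bool:
    """Return True if ORDER BY ?order_var could be merged into the SQL query."""
    # c) the order variable must be bound by exactly one SQL query:
    #    either there is only one query (case i), or only one of the
    #    union branches mentions the variable (case ii).
    binding = [f for f in fragments if order_var in f.var_columns]
    if len(binding) != 1:
        return False
    columns = binding[0].var_columns[order_var]
    # a) the variable refers to a single DB attribute
    if len(columns) != 1:
        return False
    # b) there is an index on that attribute
    return columns[0] in binding[0].indexed_columns
```

In every other case the sort would stay at the ARQ level as it does now.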

Richard & Christian, once I've finished here, and if there is some time
left in autumn, I can take this on, along with the left-join optimizations.

Regards
Andy



>
> It will depend on how exactly SPARQL semantics are needed.  SQL  
> databases don't have exactly the same notion of ordering (they  
> expect typed columns, which RDF does not guarantee; also, it's XSD  
> dateTimes although here because all the timezones are the same  
> -04:00 it's easier).
>
>>
>> The data Niklas wants to sparql-query are native RDF. Thus, in order
>> to use D2R over RDBMS with indexes, it would require him to transform
>> all data back into SQL tables, how evil... ;-)
>>
>> The idea was: would it be possible to define partial indexes for
>> native RDF stores such as TDB?
>>
>> s   p   o
>> ------------
>>     :p       ^
>>     :p       |
>>     :p       | index over :p
>>     :p       v
>>     :q       ^
>>     :q       | index over :q (and same object ranges, e.g. xsd:dateTime)
>>     :q       v
>>     ...
>>
>> regards
>> AndyL
>
> In the particular case of xsd:dateTimes, that comes for free  
> (nearly).  The indexes store the binary value of a dateTime so the  
> index to cover :p is just a restriction view on the index for SPO.   
> What TDB does not do is track which variable comes from where and  
> which sort order it will naturally come out in.  If it did, and that  
> order aligned with the ORDER BY, then no real sort would be needed.
>
> Going one step further is what Sampo suggested by doing the query  
> "backwards".  Find a sorted sequence of xsd:dateTimes, and for each,  
> attempt the pattern.
>
> 	Andy
>
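
To make the "backwards" idea above concrete, here is a toy sketch. It is not how TDB is implemented: the sorted list of (p, o, s) tuples only stands in for an index restricted to one predicate, the function names are invented, and the dateTimes are plain strings that compare correctly only because they all share the same timezone offset.

```python
from bisect import bisect_left


def build_pos_index(triples):
    """Sort (s, p, o) triples as (p, o, s) tuples, mimicking a POS-ordered index."""
    return sorted((p, o, s) for (s, p, o) in triples)


def scan_predicate(pos_index, predicate):
    """Yield (object, subject) pairs for one predicate in ascending object order."""
    # (predicate,) sorts before every (predicate, o, s) entry, so this finds
    # the start of the range that covers the predicate.
    i = bisect_left(pos_index, (predicate,))
    while i < len(pos_index) and pos_index[i][0] == predicate:
        _, o, s = pos_index[i]
        yield o, s
        i += 1


def ordered_query(triples, order_pred, other_pred, limit=None):
    """Evaluate { ?s order_pred ?d . ?s other_pred ?v } ORDER BY ?d without sorting."""
    pos_index = build_pos_index(triples)
    by_subject_pred = {}
    for s, p, o in triples:
        by_subject_pred.setdefault((s, p), []).append(o)
    results = []
    # Walk the already sorted values and attempt the rest of the pattern for each.
    for d, s in scan_predicate(pos_index, order_pred):
        for v in by_subject_pred.get((s, other_pred), []):
            results.append((s, d, v))
            if limit is not None and len(results) >= limit:
                return results
    return results
```

With a LIMIT the walk can stop early, which is where this would pay off most on large datasets.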


http://www.langegger.at
----------------------------------------------------------------------
Dipl.-Ing.(FH) Andreas Langegger
FAW - Institute for Application-oriented Knowledge Processing
Johannes Kepler University Linz
A-4040 Linz, Altenberger Straße 69

Received on Friday, 28 August 2009 11:14:34 UTC