RE: SPARQL performance for ORDER BY on large datasets



> -----Original Message-----
> From: Andreas Langegger [mailto:al@jku.at]
> Sent: 27 August 2009 10:40
> To: Seaborne, Andy
> Cc: Semantic Web
> Subject: Re: SPARQL performance for ORDER BY on large datasets
> 
> yes, since D2R pushes down filters now it will use available indexes
> at the RDB level.

And the ORDER BY?

It will depend on how exactly SPARQL semantics are needed.  SQL databases don't have exactly the same notion of ordering: they expect typed columns, which RDF does not guarantee.  The values here are also XSD dateTimes, although this case is easier because all the timezones are the same (-04:00).

> 
> The data Niklas wants to sparql-query are native RDF. Thus, in order
> to use D2R over RDBMS with indexes, it would require him to transform
> all data back into SQL tables, how evil... ;-)
> 
> The idea was, would it be possible to define partial indexes for
> native RDF stores such as TDB?
> 
> s   p   o
> ------------
>      :p       ^
>      :p       |
>      :p       | index over :p
>      :p       v
>      :q       ^
>      :q       | index over :q (and same object ranges, e.g. xsd:dateTime)
>      :q       v
>      ...
> 
> regards
> AndyL

In the particular case of xsd:dateTimes, that comes for free (nearly).  The indexes store the binary value of a dateTime, so the index to cover :p is just a restriction view on the index for SPO.  What TDB does not do is track which variable comes from where and which sort order it will naturally come out in.  If it did, and that order aligned with the ORDER BY, then no real sort would be needed.
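
To illustrate the "restriction view" idea, here is a minimal sketch in Python. It is not TDB code. It assumes a hypothetical index of (predicate, object, subject) tuples kept in sorted order, with each dateTime stood in for by an integer (UTC epoch seconds) in place of the binary encoding. The name `scan_predicate` and the sample data are made up for the example.

```python
from bisect import bisect_left

# Hypothetical index: (predicate, object, subject) tuples kept sorted.
# Integers stand in for the binary dateTime values (assumption).
POS = sorted([
    (":p", 1251369600, ":a"),
    (":q", 1251300000, ":b"),
    (":p", 1251283200, ":c"),
    (":p", 1251456000, ":d"),
    (":q", 1251400000, ":a"),
])

def scan_predicate(index, pred):
    """Yield (object, subject) for one predicate, in index order."""
    # (pred,) sorts before every (pred, o, s), so this finds the first entry for pred.
    start = bisect_left(index, (pred,))
    for p, o, s in index[start:]:
        if p != pred:
            break  # left the contiguous range for this predicate
        yield o, s

# The range for :p is already ordered by object value; no sort step is run.
rows = list(scan_predicate(POS, ":p"))
```

The point is only that the rows for :p come out of the scan already ordered by the dateTime value, so an ORDER BY on that variable would have nothing left to do, provided the engine knew the order was there.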

Going one step further is what Sampo suggested by doing the query "backwards".  Find a sorted sequence of xsd:dateTimes, and for each, attempt the pattern.
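
A rough sketch of that "backwards" evaluation, in Python, under made-up assumptions: `by_date` stands for an ordered scan of (dateTime, subject) pairs, and `has_type_event` stands for the rest of the query pattern. Neither name comes from TDB or from Sampo's message.

```python
from itertools import islice

# Hypothetical ordered scan: (dateTime value, subject), already sorted by value.
by_date = [(1, ":c"), (2, ":a"), (3, ":d"), (4, ":e")]

# Hypothetical rest of the pattern: subjects that satisfy the other triple patterns.
has_type_event = {":a", ":d", ":e"}

def ordered_matches(sorted_pairs, matches_pattern):
    """Walk the sorted values and attempt the pattern for each one."""
    for value, subject in sorted_pairs:
        if matches_pattern(subject):
            yield value, subject

# With a LIMIT, the walk stops as soon as enough solutions are found.
first_two = list(islice(ordered_matches(by_date, has_type_event.__contains__), 2))
```

Results are produced in ORDER BY order as they are found, which pays off mainly when a LIMIT is present or when most of the sorted values match the pattern. If few of them match, the walk may touch far more index entries than evaluating the pattern first and sorting afterwards.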

 Andy

Received on Friday, 28 August 2009 10:32:45 UTC