Re: SPARQL performance for ORDER BY on large datasets

Hello,

it depends on the query processor and the sort algorithm used. The main 
bottleneck is probably memory causing the operating system to swap to 
disk and start trashing and as a consequence becoming even slower.

Jena ARQ uses Java's Arrays.sort() [1,2]: "The sorting algorithm is a 
modified mergesort [...] This algorithm offers guaranteed n*log(n) 
performance." - For really large datasets a sort will always be expensive.

Side note: it's the same with RDBMS - just do an SQL ORDER BY over an 
attribute which is not indexed!

Regards AndyL

[1] 
http://jena.svn.sourceforge.net/viewvc/jena/ARQ/trunk/src/com/hp/hpl/jena/sparql/engine/iterator/QueryIterSort.java?view=markup

[2] 
http://java.sun.com/j2se/1.5.0/docs/api/java/util/Arrays.html#sort(T[],%20java.util.Comparator)

Brian Stamper schrieb:
> 
> A curious student-coder-in-training asks:
> 
> How does the number of triples correlate to the processing time of ORDER 
> BY; e.g. is it logarithmic, quadratic, exponential, etc.? Is this an 
> issue of the sort algorithm, or the structure of the data, or..?
> 
> Brian Stamper
> 
> 

Received on Tuesday, 1 September 2009 21:20:51 UTC