Re: SPARQL performance for ORDER BY on large datasets from Niklas Lindström on 2009-09-01 (semantic-web@w3.org from September 2009)

From: Niklas Lindström <lindstream@gmail.com>
Date: Tue, 1 Sep 2009 20:47:54 +0200
To: Paul Gearon <gearon@ieee.org>, carmen <_@whats-your.name>, Peter Ansell <ansell.peter@gmail.com>, Sampo Syreeni <decoy@iki.fi>, tself@bbn.com, bnowack@semsol.com, Andreas Langegger <al@jku.at>, "Seaborne, Andy" <andy.seaborne@hp.com>, Bernhard Schandl <bernhard.schandl@univie.ac.at>
Cc: Semantic Web <semantic-web@w3.org>
Message-ID: <cf8107640909011147r43a0cb1fn32625fecf2933d82@mail.gmail.com>

Hi all,

Thanks for all your responses! I will have to follow the development
of property indexing for Jena, ARC, Mulgara, Parliament and more. It
seems this case isn't trivial for any triplestore right now though, so
I'll think about doing some kind of instrumental indexing for these
cases.

MongoDB recently reached 1.0 and seems well suited to this kind of
problem. While it's not as powerful as something SPARQL-enabled, it
sure beats hard-coding the model in SQL, lowering my RDF into it and
then lifting it again with e.g. D2R. (And I can see some interesting future
possibilities in using such a DB even as a backend of something
RDF-aware. We'll see.)

Still, I hope you will post *any* news here that may shed light on
solving the problem of sorting SELECT results on huge datasets with
SPARQL. It seems to me that this is a bottleneck in promoting
triplestores as a foundation for general service creation?

[Note 1: I did try something more cumbersome, like FILTERing on
date ranges, but that's only feasible when there is a very even
distribution of temporal values, which can seldom be relied on. And
notably this didn't generally perform well enough to be of use for
services either (when also matching on type).]
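
For readers unfamiliar with the date-range workaround in Note 1, here is
a minimal sketch of what such a query might look like. It is not the
query actually used in the tests above: the helper name windowed_query,
the example type and property URIs, and the use of xsd:date literals are
all assumptions made for illustration.

```python
from datetime import date, timedelta


def windowed_query(rdf_type: str, date_prop: str, start: date, days: int) -> str:
    """Build a SPARQL SELECT limited to dates in [start, start + days).

    Hypothetical helper: the idea is to shrink the set the store has to
    sort by filtering on a date window before ORDER BY is applied.
    """
    end = start + timedelta(days=days)
    return (
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
        "SELECT ?s ?d WHERE {\n"
        f"  ?s a <{rdf_type}> ;\n"
        f"     <{date_prop}> ?d .\n"
        f'  FILTER (?d >= "{start.isoformat()}"^^xsd:date'
        f' && ?d < "{end.isoformat()}"^^xsd:date)\n'
        "}\n"
        "ORDER BY DESC(?d)"
    )


# Example: one week of resources of an assumed type, newest first.
query = windowed_query(
    "http://example.org/Document",      # assumed type URI
    "http://purl.org/dc/terms/issued",  # assumed date property
    date(2009, 8, 25),
    7,
)
print(query)
```

With unevenly distributed dates, a fixed window like this can match
almost nothing or far too much, which is the limitation Note 1
describes.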

[Note 2: The machine I use is a MacBook Pro, 2.8 GHz Intel Core 2 Duo
with 4 GB of RAM (OS X 10.5.8).]

Best regards,
Niklas

Received on Tuesday, 1 September 2009 18:48:54 UTC