Re: Is it just me or does this seem incredibly slow?

From: Dave Beckett <dave@dajobe.org>
Date: Sun, 05 Feb 2006 15:14:52 -0800
Message-ID: <43E686EC.8060705@dajobe.org>
To: Garrett Wollman <wollman+semantic-web@bimajority.org>
CC: semantic-web@w3.org

Garrett Wollman wrote:
> In my continuing project to develop a search facility for my photo
> galleries using semweb technology, I've been having great difficulty
> finding a query mechanism that can answer simple queries about a small
> database in a reasonable length of time (i.e., seconds, not
> dekaseconds).  I have a small store of some 27,800 triples, containing
> depiction information about my photo galleries.  I'm trying to compute
> something similar to the following SPARQL query (but with more detail
> about each photo):
> 
> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> PREFIX photo: <http://www.holygoat.co.uk/owl/2005/05/photo/>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> 
> SELECT DISTINCT ?photo ?name
> WHERE
> {
>         ?photo rdf:type photo:ImageFile ;
>            foaf:depicts ?b .
>         ?b rdf:type foaf:Person ;
>            foaf:name ?name
> }
> 
> Executing this query on a Redland "hashes" triple store takes at least
> five CPU-minutes (that's the point at which I interrupted it) using
> rdfproc(1).  Strangely, executing it against pre-serialized RDF takes
> only 51 CPU-seconds using roqet(1).  Doing a similar query on a 50%
> faster machine using cwm's "--strings" option takes about the same
> time.
> 
> I see these demo pages on the Web and they don't take that long to
> compute a very similar query on much larger databases.  What are they
> doing that I'm not?

The web demo uses Rasqal 0.9.11 with Redland on a memory-based store
(with no indexing), so the speed difference is unlikely to come from
the store.  A five-minute query usually means something went wrong,
and since you didn't give the full query, I'm not sure what it could be.
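As a rough illustration of why an unindexed memory store can still answer this kind of query, here is a minimal Python sketch of evaluating its four triple patterns by nested-loop matching against a plain list of triples. It is not Rasqal's actual algorithm. The `match`/`evaluate` helpers and the `ex:` data are hypothetical, and terms are plain strings with `?`-prefixed strings treated as variables. Each pattern rescans the whole store once per partial solution, but because `?photo` and `?b` are shared between patterns, only consistent bindings survive each step.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sketch, not Rasqal's code: terms are plain strings and any
# term starting with "?" is a variable.
Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if extended.get(p, t) != t:
                return None  # variable already bound to a different term
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def evaluate(patterns: List[Triple], store: List[Triple]) -> List[Binding]:
    """Nested-loop join: each pattern rescans the whole (unindexed) store
    once per partial solution, keeping only consistent bindings."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in store:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical data in the shape of a photo-gallery store.
STORE: List[Triple] = [
    ("ex:img1", "rdf:type", "photo:ImageFile"),
    ("ex:img1", "foaf:depicts", "ex:alice"),
    ("ex:img1", "foaf:depicts", "ex:bob"),
    ("ex:img2", "rdf:type", "photo:ImageFile"),
    ("ex:img2", "foaf:depicts", "ex:alice"),
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
]

# The query's four triple patterns; ?photo and ?b connect them.
QUERY: List[Triple] = [
    ("?photo", "rdf:type", "photo:ImageFile"),
    ("?photo", "foaf:depicts", "?b"),
    ("?b", "rdf:type", "foaf:Person"),
    ("?b", "foaf:name", "?name"),
]

# SELECT DISTINCT ?photo ?name: project, then deduplicate with a set.
rows = sorted({(s["?photo"], s["?name"]) for s in evaluate(QUERY, STORE)})
print(rows)  # [('ex:img1', 'Alice'), ('ex:img1', 'Bob'), ('ex:img2', 'Alice')]
```

Real engines reorder patterns and use indexes to avoid most of these rescans. Even without those optimizations, connected patterns keep the intermediate solution lists roughly proportional to the answer size.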

One possibility, which Redland/Rasqal doesn't check for yet, is that
the triple patterns of the query don't connect up (they form two
separate graphs), so it scans the entire store multiple times in an
attempt to get the answer.  [In SQL this would be a join where none of
the variables are shared between tables.]
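The effect of disconnected patterns can be sketched in a few lines of Python. This is a simplification, not Redland's implementation, and the binding lists are hypothetical stand-ins for the solutions of two groups of triple patterns. When the groups share no variable, every solution on one side is compatible with every solution on the other, so the join degenerates into a cross product. The `join` helper here compares all pairs naively; the point is the size of the result, not the join strategy.

```python
from itertools import product
from typing import Dict, List

Binding = Dict[str, str]


def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """SPARQL-style join: merge each pair of bindings that agree on every
    variable they share.  With no shared variables, every pair agrees."""
    out: List[Binding] = []
    for left_b, right_b in product(left, right):
        shared = left_b.keys() & right_b.keys()
        if all(left_b[v] == right_b[v] for v in shared):
            out.append({**left_b, **right_b})
    return out


# Hypothetical solutions for the "?b rdf:type foaf:Person ; foaf:name ?name" group.
people = [{"?b": f"ex:p{j}", "?name": f"Name{j}"} for j in range(50)]

# Disconnected: "?photo rdf:type photo:ImageFile" alone shares no variable
# with the people group, so every photo pairs with every person.
photos = [{"?photo": f"ex:img{i}"} for i in range(100)]

# Connected: adding "?photo foaf:depicts ?b" links each photo to one person.
photos_linked = [{"?photo": f"ex:img{i}", "?b": f"ex:p{i % 50}"} for i in range(100)]

print(len(join(photos, people)))         # 5000
print(len(join(photos_linked, people)))  # 100
```

Here 100 photos and 50 people give 5,000 rows when the groups are disconnected, but only 100 when `?b` ties each photo to the person it depicts. On a store of tens of thousands of triples the disconnected case grows multiplicatively.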

Also, DISTINCT had bug fixes and improvements in Rasqal 0.9.11, so I
assume you are using that version.

If you can give more information, please send the full query and data
and/or use the issue tracker at http://bugs.librdf.org/

Dave
Received on Sunday, 5 February 2006 23:15:03 UTC