Re: rdf.rb/spira bulk read question

You can improve this by not querying for each property explicitly. As Ben mentioned, RDF::Query#execute works by running a set of patterns against a queryable object (e.g., a Repository), creating solutions with variables bound to specific values. By chaining these patterns, you can first select the subjects having rdf:type <class-uri>, and then use each matched subject to query for all of its predicates and objects. This is essentially what happens in a SPARQL BGP query, such as the following:

SELECT *
WHERE {
  ?s rdf:type <class-uri> .
  ?s ?p ?o
}

Using RDF::Query, this can be done as follows:

RDF::Query.new { pattern [:s, RDF.type, <class-uri>]; pattern [:s, :p, :o] }

or, using the hash syntax:

RDF::Query.new( {:s => { RDF.type => <class-uri>, :p => :o }})

This should return all predicates and objects for each subject having a specific rdf:type. It doesn't place each object into its own variable, but it does reduce the number of operations considerably.
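
Since each solution carries only one (s, p, o) binding, the results usually need to be regrouped by subject before export. The following is a minimal plain-Ruby sketch of that step, not part of RDF.rb itself: the rows array stands in for the solutions you would get from query.execute(repository), and the example URIs, values, and the group_by_subject helper are all hypothetical.

```ruby
# Stand-in rows for the solutions of the two-pattern query above.
# With RDF.rb these would come from query.execute(repository), where each
# solution can be read as solution[:s], solution[:p], solution[:o].
# All URIs and values below are made up for illustration.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

rows = [
  { s: "http://example.org/r1", p: RDF_TYPE, o: "http://example.org/Record" },
  { s: "http://example.org/r1", p: "http://example.org/title", o: "First" },
  { s: "http://example.org/r2", p: RDF_TYPE, o: "http://example.org/Record" },
  { s: "http://example.org/r2", p: "http://example.org/title", o: "Second" },
  { s: "http://example.org/r2", p: "http://example.org/tag", o: "a" },
  { s: "http://example.org/r2", p: "http://example.org/tag", o: "b" },
]

# Fold the flat rows into { subject => { predicate => [objects] } }.
# Objects are kept in arrays because a predicate may occur more than once.
def group_by_subject(rows)
  records = Hash.new { |h, s| h[s] = Hash.new { |props, p| props[p] = [] } }
  rows.each { |row| records[row[:s]][row[:p]] << row[:o] }
  records
end

records = group_by_subject(rows)

# Print one block per subject, one line per predicate.
records.each do |subject, properties|
  puts subject
  properties.each do |predicate, objects|
    puts "  #{predicate}: #{objects.join(', ')}"
  end
end
```

This trades the per-property variables of the hash query for a single pass over the solutions; the grouping itself happens in memory, so no further repository queries are needed.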

Gregg

On Mar 2, 2011, at 7:01 AM, Greg Lappen wrote:

Actually, after further testing, I can say that even graph queries do a query against the repository for each instance found.  For example, if I execute this query:

query = RDF::Query.new(
  {:record=>
    {RDF.type => <class uri>,
     <property1_uri> => :property1,
     <property2_uri> => :property2}})

I see it does one query against the repo to get all statements with predicate of <property1_uri>, then for each of those statements, it does another query to get the statements with predicate <property2_uri> that have the same subject.  So again, hundreds of queries....am I missing something, or is there really no way to get all properties for all instances of a type out of the repository efficiently?

Greg

---------- Forwarded message ----------
From: Greg Lappen <greg@lapcominc.com>
Date: Wed, Mar 2, 2011 at 9:20 AM
Subject: rdf.rb/spira bulk read question
To: public-rdf-ruby@w3.org


Hi all,

We are making good progress with our project, and I've gotten to the point where I am storing datasets in our rdf repository (rdf.rb based, implemented on couchdb).  Now I'm building a page that allows the data to be exported in various formats (xml, csv, etc), but when I iterate over all of the data, it is extremely slow.  I see Spira querying the repository once for each instance when I iterate using the model's "each" method.  I understand why, I'm just wondering if there's a faster way to query all of the instances of a Spira class.  One thought we had was to use a graph query instead, which would pull out all the properties in N queries (where N is the number of properties in the class).  In the example I'm trying, this would be 23 queries, which is better than hundreds or thousands of queries. Is this as good as it gets?  I'm accustomed to working with RDBMS and ActiveRecord, so I may just have to shift my expectations a bit, but thought I would ask the group if there's something I'm missing....thanks as always,

Greg

Received on Wednesday, 2 March 2011 17:37:01 UTC