Re: rdf.rb/spira bulk read question from Ben Lavender on 2011-03-02 (public-rdf-ruby@w3.org from March 2011)

From: Ben Lavender <blavender@gmail.com>
Date: Wed, 2 Mar 2011 09:25:00 -0600
To: Greg Lappen <greg@lapcominc.com>
Cc: public-rdf-ruby@w3.org
Message-ID: <AANLkTikyEv_CuJKQU6BA4KrROA3CTNH8-11SEdfJDvad@mail.gmail.com>

For the first question, RDF.rb is based on iterators. The default
implementation of query_pattern (see RDF::Queryable) iterates over
each_statement and checks every statement against the pattern. It's
painful. Have you re-implemented this for your backend?
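
The sketch below illustrates the difference with a hypothetical in-memory `ToyRepository`, not RDF.rb's own classes. `scan_pattern` mirrors the default behavior of checking every statement. `indexed_pattern` stands in for a backend-specific override that uses a predicate index. Statements are plain structs, and `nil` in a pattern acts as a wildcard.

```ruby
# Hypothetical store, not the RDF.rb API. nil in a pattern position is a wildcard.
Statement = Struct.new(:subject, :predicate, :object)

class ToyRepository
  attr_reader :examined # statements inspected by the most recent lookup

  def initialize(statements)
    @statements = statements
    # Stand-in for whatever index the real backend offers (here: by predicate).
    @by_predicate = statements.group_by(&:predicate)
    @examined = 0
  end

  # Like the default strategy: walk every statement and keep the matches.
  def scan_pattern(pattern)
    @examined = 0
    @statements.select do |st|
      @examined += 1
      matches?(pattern, st)
    end
  end

  # Backend-specific strategy: only look at statements with the bound predicate.
  def indexed_pattern(pattern)
    @examined = 0
    candidates = pattern.predicate ? @by_predicate.fetch(pattern.predicate, []) : @statements
    candidates.select do |st|
      @examined += 1
      matches?(pattern, st)
    end
  end

  private

  def matches?(pattern, st)
    %i[subject predicate object].all? { |k| pattern[k].nil? || pattern[k] == st[k] }
  end
end

stmts = (1..100).flat_map do |i|
  [Statement.new("ex:r#{i}", "rdf:type", "ex:Record"),
   Statement.new("ex:r#{i}", "ex:p1", "v#{i}")]
end
repo = ToyRepository.new(stmts)
pattern = Statement.new(nil, "ex:p1", nil)
repo.scan_pattern(pattern).size    # => 100
repo.examined                      # => 200
repo.indexed_pattern(pattern).size # => 100
repo.examined                      # => 100
```

Both strategies return the same matches. The scan always inspects every statement, so its cost grows with the size of the whole store, not with the size of the answer.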

For the second question, you'd need two things to get what you want
here. First, Spira would need a significant amount of work to take
advantage of basic graph patterns (BGPs); it was written before they
were available. Second, your repository would need to execute them
efficiently. RDF.rb's implementation is a pretty straightforward
application of the BGP algorithm: it iterates over intermediate
results, and in your case each of those iterations becomes another
query against the backend. There's no real way around this in the
general case; efficient BGP execution depends on the actual backend.
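
The following sketch makes those query counts concrete. It is a hypothetical nested-loop evaluation of a BGP, not RDF::Query#execute itself. `CountingBackend`, `bind` and `evaluate_bgp` are illustrative names, and each `match` call stands for one round trip to the store. In the pattern triples, Ruby symbols are variables and strings are constant terms.

```ruby
# Hypothetical sketch, not RDF.rb code. Simplification: a variable is assumed
# to appear at most once within a single pattern.
Triple = Struct.new(:subject, :predicate, :object)

class CountingBackend
  attr_reader :queries

  def initialize(triples)
    @triples = triples
    @queries = 0
  end

  # One call stands for one round trip to the store.
  def match(pattern)
    @queries += 1
    @triples.select do |t|
      pattern.to_a.zip(t.to_a).all? { |term, value| term.is_a?(Symbol) || term == value }
    end
  end
end

# Replace variables already bound in `solution` with their values.
def bind(pattern, solution)
  Triple.new(*pattern.to_a.map { |term| term.is_a?(Symbol) ? solution.fetch(term, term) : term })
end

# Start from one empty solution. For each pattern, query the backend once per
# existing solution and extend that solution with every matching triple.
def evaluate_bgp(backend, patterns)
  patterns.reduce([{}]) do |solutions, pattern|
    solutions.flat_map do |solution|
      bound = bind(pattern, solution)
      backend.match(bound).map do |t|
        new_bindings = bound.to_a.zip(t.to_a).select { |term, _| term.is_a?(Symbol) }.to_h
        solution.merge(new_bindings)
      end
    end
  end
end

triples = (1..50).flat_map do |i|
  s = "ex:r#{i}"
  [Triple.new(s, "rdf:type", "ex:Record"),
   Triple.new(s, "ex:p1", "a#{i}"),
   Triple.new(s, "ex:p2", "b#{i}")]
end
backend = CountingBackend.new(triples)
query = [Triple.new(:record, "rdf:type", "ex:Record"),
         Triple.new(:record, "ex:p1", :property1),
         Triple.new(:record, "ex:p2", :property2)]
solutions = evaluate_bgp(backend, query)
solutions.size  # => 50
backend.queries # => 101 (one for the first pattern, then one per record per later pattern)
```

The first pattern costs one request. Every later pattern costs one request per intermediate solution, which is the per-instance behavior described in the quoted message below.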

To solve that in general, you'd end up re-implementing some of the
hard parts of SPARQL: you'd have to re-implement RDF::Query#execute
with logic specific to your repository. Is there a reason you are not
using one of the many available RDF stores with SPARQL support?
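
This sketch shows what repository-specific logic might look like for the narrow case of "all properties of all instances of a type". It assumes the backend can cheaply return every triple with a given predicate, as a per-predicate index or view would. It issues one request for rdf:type and one per property, then joins the results by subject in memory. `PredicateStore` and `load_instances` are hypothetical names, not RDF.rb or Spira API.

```ruby
# Hypothetical sketch: per-predicate fetches plus an in-memory join on subject.
Triple = Struct.new(:subject, :predicate, :object)

class PredicateStore
  attr_reader :queries

  def initialize(triples)
    # Stand-in for a backend index keyed by predicate.
    @by_predicate = triples.group_by(&:predicate)
    @queries = 0
  end

  # One call stands for one round trip returning every triple with this predicate.
  def triples_with_predicate(predicate)
    @queries += 1
    @by_predicate.fetch(predicate, [])
  end
end

# Returns { subject => { predicate => [objects] } } for each instance of type_uri.
# Properties may be multi-valued, so objects are collected into arrays.
def load_instances(store, type_uri, predicates)
  rows = {}
  store.triples_with_predicate("rdf:type").each do |t|
    rows[t.subject] = {} if t.object == type_uri
  end
  predicates.each do |predicate|
    store.triples_with_predicate(predicate).each do |t|
      row = rows[t.subject]
      next unless row # subject is not an instance of type_uri
      (row[predicate] ||= []) << t.object
    end
  end
  rows
end

triples = (1..50).flat_map do |i|
  s = "ex:r#{i}"
  [Triple.new(s, "rdf:type", "ex:Record"),
   Triple.new(s, "ex:p1", "a#{i}"),
   Triple.new(s, "ex:p2", "b#{i}")]
end
store = PredicateStore.new(triples)
rows = load_instances(store, "ex:Record", ["ex:p1", "ex:p2"])
rows.size              # => 50
rows["ex:r7"]["ex:p2"] # => ["b7"]
store.queries          # => 3 (one for rdf:type, one per property)
```

The request count depends on the number of properties rather than the number of instances, roughly the N-query approach from the original message. The trade-off is that each per-predicate fetch also returns triples for subjects of other types, and those triples are then discarded in memory.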

Ben

On Wed, Mar 2, 2011 at 9:01 AM, Greg Lappen <greg@lapcominc.com> wrote:
> Actually, after further testing, I can say that even graph queries do a
> query against the repository for each instance found.  For example, if I
> execute this query:
> query = RDF::Query.new(
>   {:record=>
>     {RDF.type => <class uri>,
>      <property1_uri> => :property1,
>      <property2_uri> => :property2}})
> I see it does one query against the repo to get all statements with
> predicate of <property1_uri>, then for each of those statements, it does
> another query to get the statements with predicate <property2_uri> that
> have the same subject.  So again, hundreds of queries... Am I missing
> something, or is there really no way to get all properties for all instances
> of a type out of the repository efficiently?
> Greg
>
> ---------- Forwarded message ----------
> From: Greg Lappen <greg@lapcominc.com>
> Date: Wed, Mar 2, 2011 at 9:20 AM
> Subject: rdf.rb/spira bulk read question
> To: public-rdf-ruby@w3.org
>
>
> Hi all,
> We are making good progress with our project, and I've gotten to the point
> where I am storing datasets in our RDF repository (RDF.rb-based, implemented
> on CouchDB).  Now I'm building a page that allows the data to be exported in
> various formats (XML, CSV, etc.), but when I iterate over all of the data, it
> is extremely slow.  I see Spira querying the repository once for each
> instance when I iterate using the model's "each" method.  I understand why,
> I'm just wondering if there's a faster way to query all of the instances of
> a Spira class.  One thought we had was to use a graph query instead, which
> would pull out all the properties in N queries (where N is the number of
> properties in the class).  In the example I'm trying, this would be 23
> queries, which is better than hundreds or thousands of queries. Is this as
> good as it gets?  I'm accustomed to working with RDBMS and ActiveRecord, so
> I may just have to shift my expectations a bit, but thought I would ask the
> group if there's something I'm missing... Thanks as always,
> Greg
>

Received on Wednesday, 2 March 2011 15:25:52 UTC