Re: rdf.rb/spira bulk read question from Gregg Kellogg on 2011-03-02 (public-rdf-ruby@w3.org from March 2011)

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Wed, 2 Mar 2011 11:07:39 -0500
To: Ben Lavender <blavender@gmail.com>
CC: Greg Lappen <greg@lapcominc.com>, Gabor Ratky <gabor@secretsaucepartners.com>, "public-rdf-ruby@w3.org" <public-rdf-ruby@w3.org>
Message-ID: <D42D062C-85BC-4C43-A2E0-2DD70D9CA4AA@kellogg-assoc.com>

BTW, RDF::Query only uses the repository's query_pattern for the first filter. After that it uses the Queryable implementation, since it is operating on a solution set. Having the repository run the entire execute is the only way I can see to move the work into the repository.

Note that we're getting fairly close to a complete SPARQL 1.0 implementation for RDF.rb, which depends extensively on the performance of Query#execute.
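
To make the delegation idea from this thread concrete, here is a minimal sketch in plain Ruby. It is not RDF.rb code: SketchQuery, PlainStore, NativeStore and the execute_query hook are hypothetical names invented for illustration, and default_execute is only a stand-in for the real BGP logic. It shows the query checking whether the store can run the whole query itself and falling back to the default algorithm otherwise.

```ruby
# Hypothetical sketch; none of these classes or methods are RDF.rb's API.

class SketchQuery
  attr_reader :patterns

  # patterns: array of [subject, predicate, object], nil meaning "any"
  def initialize(patterns)
    @patterns = patterns
  end

  # Hand the whole query to the store if it offers its own executor,
  # otherwise fall back to the default algorithm.
  def execute(queryable)
    if queryable.respond_to?(:execute_query)
      queryable.execute_query(self)
    else
      default_execute(queryable)
    end
  end

  # Stand-in for the default logic: one store lookup per pattern,
  # results simply concatenated (no real join is performed here).
  def default_execute(queryable)
    patterns.flat_map { |pattern| queryable.query_pattern(pattern) }
  end
end

# A store that only knows how to match a single pattern.
class PlainStore
  def initialize(triples)
    @triples = triples
  end

  def query_pattern(pattern)
    @triples.select do |triple|
      pattern.zip(triple).all? { |want, have| want.nil? || want == have }
    end
  end
end

# A store that takes over the whole query. A real backend would
# translate the query into one native request at this point.
class NativeStore < PlainStore
  attr_reader :native_calls

  def initialize(triples)
    super
    @native_calls = 0
  end

  def execute_query(query)
    @native_calls += 1
    query.patterns.flat_map { |pattern| query_pattern(pattern) }
  end
end

data  = [[:a, :name, "Alice"], [:b, :name, "Bob"], [:a, :age, 30]]
query = SketchQuery.new([[nil, :name, nil]])
query.execute(PlainStore.new(data))   # goes through default_execute
query.execute(NativeStore.new(data))  # goes through NativeStore#execute_query
```

The respond_to? check mirrors the "use the store's own implementation if found, and the default if not" variant; the alternative raised in the thread, putting the default on the queryable side so stores can override it, would move default_execute into a shared module instead.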

Gregg Kellogg

On Mar 2, 2011, at 7:51 AM, "Ben Lavender" <blavender@gmail.com> wrote:

> Currently, that is a correct understanding. I think we'd be willing to
> accept a patch that checks if the given queryable has its own
> implementation of Query#execute and uses that if found, and the
> default if not. That should maybe even be the default, making
> Query#execute call out to some method on Queryable that holds the
> current BGP logic, which implementations can overwrite.
> 
> OTOH most implementations won't be able to do anything much more
> effectively than the default algorithm. It is what it is.
> 
> If replication is your main goal, I'd suggest that several stores,
> e.g. Sesame, can quite effectively use MySQL as a backend and you
> could rely on MySQL's replication.
> 
> Ben
> 
> On Wed, Mar 2, 2011 at 9:37 AM, Greg Lappen <greg@lapcominc.com> wrote:
>> Yes, not only am I using ipublic/rdf-couchdb, I WROTE it!  I'm pleasantly
>> surprised to find that someone else has tried to use it, ha!
>> I'd love input on how to make the implementation less naive...I have
>> implemented the query_pattern method to use couchdb views instead of
>> iterating over the entire repo, but is there more to it?  I think the
>> looping behavior on the graph queries is a consequence of the graph query
>> implementation, not the backend, right?
>> 
>> On Wed, Mar 2, 2011 at 10:31 AM, Gabor Ratky <gabor@secretsaucepartners.com>
>> wrote:
>>> 
>>> Are you using Dan Thomas' rdf-couchdb project?
>>> (https://github.com/ipublic/rdf-couchdb) I've found the project a naive
>>> RDF::Repository implementation on top of CouchDB in many ways. Great proof
>>> of concept with rdf-spec tests passing, but definitely needs work,
>>> especially in the 'efficient querying' space, IMHO.
>>> Are you taking a hard dependency on CouchDB in other parts of your
>>> architecture (like us), or just chose it as an RDF repository?
>>> Gabor
>>> On Mar 2, 2011, at 3:20 PM, Greg Lappen wrote:
>>> 
>>> Hi all,
>>> We are making good progress with our project, and I've gotten to the point
>>> where I am storing datasets in our rdf repository (rdf.rb based, implemented
>>> on couchdb).  Now I'm building a page that allows the data to be exported in
>>> various formats (xml, csv, etc), but when I iterate over all of the data, it
>>> is extremely slow.  I see Spira querying the repository once for each
>>> instance when I iterate using the model's "each" method.  I understand why,
>>> I'm just wondering if there's a faster way to query all of the instances of
>>> a Spira class.  One thought we had was to use a graph query instead, which
>>> would pull out all the properties in N queries (where N is the number of
>>> properties in the class).  In the example I'm trying, this would be 23
>>> queries, which is better than hundreds or thousands of queries. Is this as
>>> good as it gets?  I'm accustomed to working with RDBMS and ActiveRecord, so
>>> I may just have to shift my expectations a bit, but thought I would ask the
>>> group if there's something I'm missing....thanks as always,
>>> Greg
>> 
>> 
>

Received on Wednesday, 2 March 2011 16:07:06 UTC