
Re: rdf.rb/spira bulk read question

From: Greg Lappen <greg@lapcominc.com>
Date: Wed, 2 Mar 2011 10:57:59 -0500
Message-ID: <AANLkTim18ShcNXxDRZMxMeGXAVigsfBwjjXp=JGQLkge@mail.gmail.com>
To: Ben Lavender <blavender@gmail.com>
Cc: Gabor Ratky <gabor@secretsaucepartners.com>, public-rdf-ruby@w3.org
Hmm, I will have to look at Query#execute more closely and think about whether
something can be done in CouchDB to make graph queries more efficient.
But I was surprised by the iteration, because in my mind graph queries were
like a UNION: each pattern could be one query, and the union of the resulting
statements would provide the solutions, resulting in far fewer queries. I'm not
sure whether that's realistic without a closer look at
Query#execute.
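
A rough sketch of that idea in plain Ruby, to make it concrete. This is not
rdf.rb's actual code: Triple, TinyStore, default_execute, execute and the
execute_patterns hook are all hypothetical names invented for illustration.
It issues one backend query per pattern, joins the results on shared
variables in memory, and lets a store supply its own implementation if it
has one (the kind of hook Ben describes below).

```ruby
# Hypothetical stand-ins; none of these names come from rdf.rb.
Triple = Struct.new(:s, :p, :o)

class TinyStore
  attr_reader :query_count

  def initialize(triples)
    @triples = triples
    @query_count = 0
  end

  # Counts as one backend round trip; nil in a slot matches anything.
  def query_pattern(s, p, o)
    @query_count += 1
    @triples.select do |t|
      (s.nil? || t.s == s) && (p.nil? || t.p == p) && (o.nil? || t.o == o)
    end
  end
end

# Patterns are [s, p, o] arrays: Symbols are variables, Strings are constants.
def variable?(term)
  term.is_a?(Symbol)
end

# Default strategy: one query per pattern, then an in-memory join.
def default_execute(store, patterns)
  patterns.reduce([{}]) do |solutions, pattern|
    constants = pattern.map { |term| variable?(term) ? nil : term }
    matches = store.query_pattern(*constants)
    solutions.flat_map do |solution|
      matches.map do |t|
        bound = solution.dup
        ok = pattern.zip([t.s, t.p, t.o]).all? do |term, value|
          next true unless variable?(term)               # constant, already filtered
          next bound[term] == value if bound.key?(term)  # must agree with earlier binding
          bound[term] = value
          true
        end
        ok ? bound : nil
      end.compact
    end
  end
end

# Use the store's own implementation if it has one, else the default.
def execute(store, patterns)
  if store.respond_to?(:execute_patterns)
    store.execute_patterns(patterns)
  else
    default_execute(store, patterns)
  end
end

store = TinyStore.new([
  Triple.new("ex:a", "rdf:type", "ex:Person"),
  Triple.new("ex:a", "ex:name", "Alice"),
  Triple.new("ex:b", "rdf:type", "ex:Person"),
  Triple.new("ex:b", "ex:name", "Bob")
])
patterns = [[:person, "rdf:type", "ex:Person"], [:person, "ex:name", :name]]
solutions = execute(store, patterns)
puts store.query_count # prints 2, one query per pattern
```

The trade-off is that each per-pattern query can return many statements
that the join then throws away, so this saves round trips at the cost of
memory and transfer size.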

RE: Sesame vs. CouchDB, I have been actively investigating other storage
options, but CouchDB is the only one where the replication is a user-level,
runtime function. MySQL and MongoDB support master-slave replication, but
it's a static configuration.  Tokyo Tyrant supports master-master
replication, but again, at configuration time.

If we did want a separate SPARQL server, 4store seems to be more scalable,
although it is self-advertised that way and I haven't verified it.

On Wed, Mar 2, 2011 at 10:49 AM, Ben Lavender <blavender@gmail.com> wrote:

> Currently, that is a correct understanding. I think we'd be willing to
> accept a patch that checks if the given queryable has its own
> implementation of Query#execute and uses that if found, and the
> default if not. That should maybe even be the default, making
> Query#execute call out to some method on Queryable that holds the
> current BGP logic, which implementations can overwrite.
>
> OTOH most implementations won't be able to do anything much more
> effectively than the default algorithm. It is what it is.
>
> If replication is your main goal, I'd suggest that several stores,
> e.g. Sesame, can quite effectively use MySQL as a backend and you
> could use that replication.
>
> Ben
>
> On Wed, Mar 2, 2011 at 9:37 AM, Greg Lappen <greg@lapcominc.com> wrote:
> > Yes, not only am I using ipublic/rdf-couchdb, I WROTE it!  I'm pleasantly
> > surprised to find that someone else has tried to use it, ha!
> > I'd love input on how to make the implementation less naive...I have
> > implemented the query_pattern method to use couchdb views instead of
> > iterating over the entire repo, but is there more to it?  I think the
> > looping behavior on the graph queries is a consequence of the graph query
> > implementation, not the backend, right?
> >
> > On Wed, Mar 2, 2011 at 10:31 AM, Gabor Ratky <gabor@secretsaucepartners.com>
> > wrote:
> >>
> >> Are you using Dan Thomas' rdf-couchdb project?
> >> (https://github.com/ipublic/rdf-couchdb) I've found the project a naive
> >> RDF::Repository implementation on top of CouchDB in many ways. Great proof
> >> of concept with rdf-spec tests passing, but definitely needs work,
> >> especially in the 'efficient querying' space, IMHO.
> >> Are you taking a hard dependency on CouchDB in other parts of your
> >> architecture (like us), or just chose it as an RDF repository?
> >> Gabor
> >> On Mar 2, 2011, at 3:20 PM, Greg Lappen wrote:
> >>
> >> Hi all,
> >> We are making good progress with our project, and I've gotten to the point
> >> where I am storing datasets in our rdf repository (rdf.rb based, implemented
> >> on couchdb).  Now I'm building a page that allows the data to be exported in
> >> various formats (xml, csv, etc), but when I iterate over all of the data, it
> >> is extremely slow.  I see Spira querying the repository once for each
> >> instance when I iterate using the model's "each" method.  I understand why,
> >> I'm just wondering if there's a faster way to query all of the instances of
> >> a Spira class.  One thought we had was to use a graph query instead, which
> >> would pull out all the properties in N queries (where N is the number of
> >> properties in the class).  In the example I'm trying, this would be 23
> >> queries, which is better than hundreds or thousands of queries. Is this as
> >> good as it gets?  I'm accustomed to working with RDBMS and ActiveRecord, so
> >> I may just have to shift my expectations a bit, but thought I would ask the
> >> group if there's something I'm missing....thanks as always,
> >> Greg
> >
> >
>
Received on Wednesday, 2 March 2011 15:58:52 GMT
