Re: rdf.rb/spira bulk read question from Greg Lappen on 2011-03-02 (public-rdf-ruby@w3.org from March 2011)

From: Greg Lappen <greg@lapcominc.com>
Date: Wed, 2 Mar 2011 11:15:06 -0500
To: Ben Lavender <blavender@gmail.com>
Cc: Gabor Ratky <gabor@secretsaucepartners.com>, public-rdf-ruby@w3.org
Message-ID: <AANLkTimUkwOOeS8iC56OQrU1fMFvACKKW9KsM8CfXe7D@mail.gmail.com>
Thanks for all the info.  It doesn't sound like there's an easy way out of
this, unfortunately...

On Wed, Mar 2, 2011 at 11:08 AM, Ben Lavender <blavender@gmail.com> wrote:

> On Wed, Mar 2, 2011 at 9:57 AM, Greg Lappen <greg@lapcominc.com> wrote:
> > Hmm, I will have to look at Query#execute more closely and think about
> > whether something can be done in CouchDB to make graph queries more
> > efficient.  But I was surprised by the iteration, because in my mind graph
> > queries were like a UNION: each pattern could be one query, and the union
> > of the resulting statements would provide the solutions, resulting in far
> > fewer queries.  I'm not sure whether that's realistic without a closer
> > look at Query#execute.
>
> The algorithm is more subtle than that. The patterns return bindings,
> not statements, and you can't simply union each pattern, as you need
> to constrain later patterns or you'll end up doing intermediate
> patterns that return the entire repository. You could improve the
> performance over RDF.rb's by making each pattern take a list of
> existing bindings and checking them in couch, instead of having Ruby
> iterate over them and re-running the query, but at a Big O level,
> someone, somewhere, has to check all of the existing bindings against
> later patterns.
>
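
To make the point about bindings concrete, here is a toy sketch in plain Ruby. It is not RDF.rb's Query#execute; the triple data, `match_pattern` and `solve` are all made up for illustration. Triples are 3-element arrays, variables are Symbols, and constants are Strings. Each pattern is re-run once per existing binding, which is the iteration discussed above.

```ruby
# Toy model only (assumption): not RDF.rb code or its API.
TRIPLES = [
  ["ex:alice", "rdf:type", "ex:Person"],
  ["ex:alice", "ex:name",  "Alice"],
  ["ex:bob",   "rdf:type", "ex:Person"],
  ["ex:bob",   "ex:name",  "Bob"],
  ["ex:rex",   "rdf:type", "ex:Dog"],
  ["ex:rex",   "ex:name",  "Rex"]
].freeze

# Match one pattern against every triple, honouring the bindings made so far.
# Returns one extended binding Hash per matching triple.
def match_pattern(pattern, binding, triples)
  triples.map do |triple|
    extended = binding.dup
    ok = pattern.zip(triple).all? do |term, value|
      if term.is_a?(Symbol)            # variable
        if extended.key?(term)
          extended[term] == value      # must agree with the earlier binding
        else
          extended[term] = value       # bind it for later patterns
          true
        end
      else
        term == value                  # constant must match exactly
      end
    end
    ok ? extended : nil
  end.compact
end

# Evaluate patterns in order; later patterns only extend surviving bindings.
def solve(patterns, triples)
  patterns.reduce([{}]) do |bindings, pattern|
    bindings.flat_map { |b| match_pattern(pattern, b, triples) }
  end
end

patterns = [
  [:s, "rdf:type", "ex:Person"],
  [:s, "ex:name",  :name]
]

solutions     = solve(patterns, TRIPLES)
unconstrained = match_pattern(patterns[1], {}, TRIPLES)

puts solutions.inspect
puts unconstrained.size
```

Run on its own, the second pattern also matches the dog's name, so a plain union of per-pattern results would include rows that are not solutions. Passing the whole list of bindings to the store in one request would cut round trips, but the store would still have to check each binding.
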
> > RE: Sesame vs. CouchDB, I have been actively investigating other storage
> > options, but CouchDB is the only one where the replication is a user-level,
> > runtime function. MySQL and MongoDB support master-slave replication, but
> > it's a static configuration.  Tokyo Tyrant supports master-master
> > replication, but again, at configuration time.
> > If we did want a separate SPARQL server, 4store seems to be more scalable,
> > although it is self-advertised that way and I haven't verified it.
> >
> > On Wed, Mar 2, 2011 at 10:49 AM, Ben Lavender <blavender@gmail.com> wrote:
> >>
> >> Currently, that is a correct understanding. I think we'd be willing to
> >> accept a patch that checks if the given queryable has its own
> >> implementation of Query#execute and uses that if found, and the
> >> default if not. That should maybe even be the default, making
> >> Query#execute call out to some method on Queryable that holds the
> >> current BGP logic, which implementations can overwrite.
> >>
> >> OTOH most implementations won't be able to do anything much more
> >> effectively than the default algorithm. It is what it is.
> >>
> >> If replication is your main goal, I'd suggest that several stores,
> >> e.g. Sesame, can quite effectively use MySQL as a backend and you
> >> could use that replication.
> >>
> >> Ben
> >>
> >> On Wed, Mar 2, 2011 at 9:37 AM, Greg Lappen <greg@lapcominc.com> wrote:
> >> > Yes, not only am I using ipublic/rdf-couchdb, I WROTE it!  I'm pleasantly
> >> > surprised to find that someone else has tried to use it, ha!
> >> > I'd love input on how to make the implementation less naive...I have
> >> > implemented the query_pattern method to use couchdb views instead of
> >> > iterating over the entire repo, but is there more to it?  I think the
> >> > looping behavior on the graph queries is a consequence of the graph query
> >> > implementation, not the backend, right?
> >> >
> >> > On Wed, Mar 2, 2011 at 10:31 AM, Gabor Ratky
> >> > <gabor@secretsaucepartners.com> wrote:
> >> >>
> >> >> Are you using Dan Thomas' rdf-couchdb project?
> >> >> (https://github.com/ipublic/rdf-couchdb) I've found the project to be a
> >> >> naive RDF::Repository implementation on top of CouchDB in many ways.
> >> >> Great proof of concept with rdf-spec tests passing, but it definitely
> >> >> needs work, especially in the 'efficient querying' space, IMHO.
> >> >> Are you taking a hard dependency on CouchDB in other parts of your
> >> >> architecture (like us), or did you just choose it as an RDF repository?
> >> >> Gabor
> >> >> On Mar 2, 2011, at 3:20 PM, Greg Lappen wrote:
> >> >>
> >> >> Hi all,
> >> >> We are making good progress with our project, and I've gotten to the
> >> >> point where I am storing datasets in our rdf repository (rdf.rb based,
> >> >> implemented on couchdb).  Now I'm building a page that allows the data
> >> >> to be exported in various formats (xml, csv, etc), but when I iterate
> >> >> over all of the data, it is extremely slow.  I see Spira querying the
> >> >> repository once for each instance when I iterate using the model's
> >> >> "each" method.  I understand why, I'm just wondering if there's a
> >> >> faster way to query all of the instances of a Spira class.  One thought
> >> >> we had was to use a graph query instead, which would pull out all the
> >> >> properties in N queries (where N is the number of properties in the
> >> >> class).  In the example I'm trying, this would be 23 queries, which is
> >> >> better than hundreds or thousands of queries. Is this as good as it
> >> >> gets?  I'm accustomed to working with RDBMS and ActiveRecord, so I may
> >> >> just have to shift my expectations a bit, but thought I would ask the
> >> >> group if there's something I'm missing....thanks as always,
> >> >> Greg
> >> >
> >> >
> >
> >
>
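
For reference, the patch idea quoted above (use the queryable's own implementation of query execution if it has one, and the default otherwise) could look roughly like the sketch below. Every name in it is hypothetical: `execute_query`, `DefaultExecutor`, `run_query` and the two store classes are placeholders for illustration, not RDF.rb's actual API.

```ruby
# Hypothetical sketch; none of these names come from RDF.rb.
module DefaultExecutor
  # Stand-in for the generic pattern-by-pattern BGP logic.
  def self.run(queryable, patterns)
    "default BGP over #{patterns.size} pattern(s)"
  end
end

# Prefer a store-specific implementation when the queryable provides one.
def run_query(patterns, queryable)
  if queryable.respond_to?(:execute_query)
    queryable.execute_query(patterns)
  else
    DefaultExecutor.run(queryable, patterns)
  end
end

# A store with no special support falls through to the default.
class PlainStore; end

# A store that can evaluate the whole pattern list itself.
class CouchLikeStore
  def execute_query(patterns)
    "store-native evaluation of #{patterns.size} pattern(s)"
  end
end

patterns = [[:s, "rdf:type", "ex:Person"], [:s, "ex:name", :name]]
puts run_query(patterns, PlainStore.new)
puts run_query(patterns, CouchLikeStore.new)
```

The check is on the object rather than its class, so a repository can opt in simply by defining the method.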
Received on Wednesday, 2 March 2011 16:16:00 UTC