- From: Greg Lappen <greg@lapcominc.com>
- Date: Wed, 2 Mar 2011 11:15:06 -0500
- To: Ben Lavender <blavender@gmail.com>
- Cc: Gabor Ratky <gabor@secretsaucepartners.com>, public-rdf-ruby@w3.org
- Message-ID: <AANLkTimUkwOOeS8iC56OQrU1fMFvACKKW9KsM8CfXe7D@mail.gmail.com>
Thanks for all the info. It doesn't sound like there's an easy way out
of this unfortunately...

On Wed, Mar 2, 2011 at 11:08 AM, Ben Lavender <blavender@gmail.com> wrote:

> On Wed, Mar 2, 2011 at 9:57 AM, Greg Lappen <greg@lapcominc.com> wrote:
> > Hmm, I will have to look at Query#execute more closely and think if
> > there's something that can be done in CouchDB to make graph queries
> > more efficient. But I was surprised by the iteration, because in my
> > mind, graph queries were like a UNION - each pattern could be one
> > query, and the union of resulting statements would provide the
> > solutions, resulting in far fewer queries...not sure if that's
> > realistic or not though without a closer look at Query#execute.
>
> The algorithm is more subtle than that. The patterns return bindings,
> not statements, and you can't simply union each pattern, as you need
> to constrain later patterns or you'll end up doing intermediate
> patterns that return the entire repository. You could improve the
> performance over RDF.rb's by making each pattern take a list of
> existing bindings and checking them in couch, instead of having Ruby
> iterate over them and re-running the query, but at a Big O level,
> someone, somewhere, has to check all of the existing bindings against
> later patterns.
>
> > RE: Sesame vs. CouchDB, I have been actively investigating other
> > storage options, but CouchDB is the only one where the replication
> > is a user-level, runtime function. MySQL and MongoDB support
> > master-slave replication, but it's a static configuration. Tokyo
> > Tyrant supports master-master replication, but again, at
> > configuration time.
> > If we did want a separate SPARQL server, 4store seems to be more
> > scalable, although it is self-advertised that way and I haven't
> > verified it.
> >
> > On Wed, Mar 2, 2011 at 10:49 AM, Ben Lavender <blavender@gmail.com>
> > wrote:
> >>
> >> Currently, that is a correct understanding. I think we'd be willing
> >> to accept a patch that checks if the given queryable has its own
> >> implementation of Query#execute and uses that if found, and the
> >> default if not. That should maybe even be the default, making
> >> Query#execute call out to some method on Queryable that holds the
> >> current BGP logic, which implementations can overwrite.
> >>
> >> OTOH most implementations won't be able to do anything much more
> >> effectively than the default algorithm. It is what it is.
> >>
> >> If replication is your main goal, I'd suggest that several stores,
> >> e.g. Sesame, can quite effectively use MySQL as a backend and you
> >> could use that replication.
> >>
> >> Ben
> >>
> >> On Wed, Mar 2, 2011 at 9:37 AM, Greg Lappen <greg@lapcominc.com> wrote:
> >> > Yes, not only am I using ipublic/rdf-couchdb, I WROTE it! I'm
> >> > pleasantly surprised to find that someone else has tried to use
> >> > it, ha!
> >> > I'd love input on how to make the implementation less naive...I
> >> > have implemented the query_pattern method to use couchdb views
> >> > instead of iterating over the entire repo, but is there more to
> >> > it? I think the looping behavior on the graph queries is a
> >> > consequence of the graph query implementation, not the backend,
> >> > right?
> >> >
> >> > On Wed, Mar 2, 2011 at 10:31 AM, Gabor Ratky
> >> > <gabor@secretsaucepartners.com> wrote:
> >> >>
> >> >> Are you using Dan Thomas' rdf-couchdb project?
> >> >> (https://github.com/ipublic/rdf-couchdb) I've found the project
> >> >> a naive RDF::Repository implementation on top of CouchDB in many
> >> >> ways. Great proof of concept with rdf-spec tests passing, but
> >> >> definitely needs work, especially in the 'efficient querying'
> >> >> space, IMHO.
> >> >> Are you taking a hard dependency on CouchDB in other parts of
> >> >> your architecture (like us), or just chose it as an RDF
> >> >> repository?
> >> >> Gabor
> >> >> On Mar 2, 2011, at 3:20 PM, Greg Lappen wrote:
> >> >>
> >> >> Hi all,
> >> >> We are making good progress with our project, and I've gotten to
> >> >> the point where I am storing datasets in our rdf repository
> >> >> (rdf.rb based, implemented on couchdb). Now I'm building a page
> >> >> that allows the data to be exported in various formats (xml,
> >> >> csv, etc), but when I iterate over all of the data, it is
> >> >> extremely slow. I see Spira querying the repository once for
> >> >> each instance when I iterate using the model's "each" method. I
> >> >> understand why, I'm just wondering if there's a faster way to
> >> >> query all of the instances of a Spira class. One thought we had
> >> >> was to use a graph query instead, which would pull out all the
> >> >> properties in N queries (where N is the number of properties in
> >> >> the class). In the example I'm trying, this would be 23 queries,
> >> >> which is better than hundreds or thousands of queries. Is this
> >> >> as good as it gets? I'm accustomed to working with RDBMS and
> >> >> ActiveRecord, so I may just have to shift my expectations a bit,
> >> >> but thought I would ask the group if there's something I'm
> >> >> missing....thanks as always,
> >> >> Greg
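For readers following the thread, here is a minimal sketch of the point that patterns return bindings rather than statements, and that each later pattern has to be constrained by the bindings found so far. It is plain Ruby with no gems and is not RDF.rb's actual Query#execute code; ToyStore, its match method, the solve function, and the ex: data are all hypothetical names made up for illustration. Symbols stand for variables and nil is a wildcard.

```ruby
# Toy data: [subject, predicate, object] triples (illustrative only).
TRIPLES = [
  ["ex:alice", "rdf:type", "ex:Person"],
  ["ex:alice", "ex:name",  "Alice"],
  ["ex:bob",   "rdf:type", "ex:Person"],
  ["ex:bob",   "ex:name",  "Bob"],
  ["ex:rex",   "ex:name",  "Rex"] # has a name but is not a Person
].freeze

# Hypothetical stand-in for a backend's single-pattern lookup.
# It counts calls so the per-binding iteration is visible.
class ToyStore
  attr_reader :lookups

  def initialize(triples)
    @triples = triples
    @lookups = 0
  end

  # Returns every triple matching the pattern; nil matches anything.
  def match(pattern)
    @lookups += 1
    @triples.select do |triple|
      pattern.zip(triple).all? { |want, have| want.nil? || want == have }
    end
  end
end

# Evaluate patterns one at a time, carrying bindings forward so that
# each later pattern is constrained by the earlier ones.
def solve(store, patterns)
  solutions = [{}]
  patterns.each do |pattern|
    solutions = solutions.flat_map do |binding|
      # Substitute already-bound variables; unbound ones become nil.
      bound = pattern.map { |term| term.is_a?(Symbol) ? binding[term] : term }
      store.match(bound).map do |triple|
        extended = binding.dup
        consistent = pattern.zip(triple).all? do |term, value|
          next true unless term.is_a?(Symbol)
          extended[term] ||= value
          extended[term] == value
        end
        extended if consistent
      end.compact
    end
  end
  solutions
end

store = ToyStore.new(TRIPLES)
patterns = [
  [:person, "rdf:type", "ex:Person"],
  [:person, "ex:name",  :name]
]
results = solve(store, patterns)
results.each { |row| puts "#{row[:person]} -> #{row[:name]}" }
puts "backend lookups: #{store.lookups}"
```

In this toy run the second pattern is looked up once per binding produced by the first, which is the iteration described in the thread. Running each pattern unconstrained and taking a union would also pull in the name of ex:rex, which is not a Person, so the results would still need to be joined against the earlier bindings somewhere.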
Received on Wednesday, 2 March 2011 16:16:00 UTC