- From: Arto Bendiken <arto@datagraph.org>
- Date: Sat, 30 Oct 2010 15:58:00 +0200
- To: Riccardo Giomi <giomi@netseven.it>
- Cc: public-rdf-ruby@w3.org
Hi Riccardo,

On Sat, Oct 30, 2010 at 1:23 PM, Riccardo Giomi <giomi@netseven.it> wrote:
> Hi,
> I have been toying with the idea of transitioning from activeRDF to
> RDF.rb (using Sesame) in a project I'm working on with my company. I'm
> positively impressed by both code and community, so far. I'd have one
> question though:
>
> in rdf.rubyforge.org/sesame, under limitations, it says: "not yet
> optimized for RDF.rb 0.2.x's bulk-operation APIs". I could not find
> anything about such an API in the RDF.rb code, though. What does "bulk
> operations" mean? I was looking for optimized operations, mostly to
> write and delete big graphs, and considering how slow Sesame usually
> is.

It just means that the current RDF::Sesame::Repository [1] implementation inserts statements into Sesame one at a time. This is suboptimal for loading a large dataset into Sesame using RDF.rb, hence the warning in the README.

The RDF::Sesame implementation as it is today works fine for querying Sesame once you have already imported your dataset into Sesame by other means. But loading, say, 100K triples into Sesame using RDF::Sesame::Repository.load(file) would currently make 100K separate requests to Sesame: not something you want to do unless you have a coffee break coming up.

Now, this could be significantly improved by having RDF::Sesame implement RDF.rb's bulk-operations API, which means having the RDF::Sesame::Repository class override and implement the RDF::Repository#insert_statements method instead of just #insert_statement (notice the plural in the former).
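To illustrate why overriding the plural method matters, here is a minimal plain-Ruby sketch that does not use RDF.rb itself. It assumes, in simplified form, that the base class's default #insert_statements falls back to calling #insert_statement once per statement, as described above. All class names (SketchRepository, OneAtATimeRepository, BulkRepository) and the Triple struct are hypothetical stand-ins for RDF::Repository, RDF::Sesame::Repository and RDF::Statement, and "requests" are only counted, not sent. The naive bulk override sends everything in one request; for arbitrary-length input you would want to buffer in fixed-size batches instead.

```ruby
# Hypothetical stand-in for RDF::Statement.
Triple = Struct.new(:subject, :predicate, :object)

# Simplified base class (assumption): the plural method defaults to
# calling the singular method once per statement.
class SketchRepository
  def insert_statements(statements)
    statements.each { |statement| insert_statement(statement) }
  end

  def insert_statement(_statement)
    raise NotImplementedError, "subclasses must implement #insert_statement"
  end
end

# Hypothetical Sesame-like backend that only counts HTTP requests.
class OneAtATimeRepository < SketchRepository
  attr_reader :requests

  def initialize
    @requests = 0
  end

  def insert_statement(_statement)
    @requests += 1 # one request per statement
  end
end

# Same backend, but overriding the plural method: one request per call.
class BulkRepository < OneAtATimeRepository
  def insert_statements(statements)
    batch = statements.to_a
    @requests += 1 unless batch.empty? # whole batch in a single request
  end
end

triples = Array.new(1_000) { |i| Triple.new("s#{i}", "p", "o") }

slow = OneAtATimeRepository.new
slow.insert_statements(triples)
slow.requests # => 1000

fast = BulkRepository.new
fast.insert_statements(triples)
fast.requests # => 1
```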

The implementation of #insert_statements [2] should accept an arbitrary-length RDF::Enumerable as its argument, iterate through the given statements, and buffer up a reasonable number of them before issuing each Sesame request. For instance, it could insert 5,000 statements at a time, so inserting 100K statements would take only 20 requests to Sesame instead of the 100K requests currently required.

We're not ourselves actively using or developing RDF::Sesame much at present, as we ended up developing our own custom RDF storage solution instead. But if you'd like to improve RDF::Sesame on this front, contributed features and bug fixes are certainly very welcome, particularly in the form of easy-to-merge GitHub pull requests.

Best regards,
Arto

[1] http://rdf.rubyforge.org/sesame/RDF/Sesame/Repository.html
[2] http://rdf.rubyforge.org/RDF/Writable.html#insert_statements-instance_method

PS. For RDF.rb-related work published as open source, I'd also be happy to provide you with a discounted consulting rate in case you need any RDF::Sesame particulars improved.

--
Arto Bendiken | @bendiken @datagraph
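The buffering approach for #insert_statements, with the 5,000-statement batch size suggested in the message, can be sketched in plain Ruby with Enumerable#each_slice, which never holds more than one batch in memory, so it works for streamed input of any length. This is an illustration under assumptions, not RDF::Sesame code: BufferedSesameSketch, StatementStub and post_batch are hypothetical names, and post_batch merely records each batch size where a real implementation would POST the batch to the Sesame server.

```ruby
# Hypothetical stand-in for RDF::Statement.
StatementStub = Struct.new(:subject, :predicate, :object)

class BufferedSesameSketch
  DEFAULT_BATCH_SIZE = 5_000

  # Sizes of the batches "sent", one entry per simulated request.
  attr_reader :batch_sizes

  def initialize(batch_size: DEFAULT_BATCH_SIZE)
    raise ArgumentError, "batch_size must be positive" unless batch_size.positive?

    @batch_size = batch_size
    @batch_sizes = []
  end

  # Accepts any Enumerable (Array, Enumerator, ...) and issues one
  # request per batch; the final batch may be smaller than @batch_size.
  def insert_statements(statements)
    statements.each_slice(@batch_size) { |batch| post_batch(batch) }
  end

  private

  # Hypothetical placeholder for a single HTTP request to Sesame.
  def post_batch(batch)
    @batch_sizes << batch.size
  end
end

# 100K statements generated on the fly rather than held in an array.
stream = Enumerator.new do |yielder|
  100_000.times { |i| yielder << StatementStub.new("s#{i}", "p", "o#{i}") }
end

repo = BufferedSesameSketch.new
repo.insert_statements(stream)
repo.batch_sizes.size # => 20
```

With 100,001 statements the same code would issue 21 requests, the last one carrying a single statement.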
Received on Saturday, 30 October 2010 15:51:13 UTC