Re: RDF::Graph performance

On Apr 28, 2014, at 9:12 AM, Joachim Baran <joachim.baran@gmail.com> wrote:

> Hello,
> 
>   I am trying to load an 800MB N-Quads file via:
> 
>     graph = RDF::Graph.load('myfile.nq', :format => format)
> 
>   That process has not finished yet and I am wondering if there are performance optimization parameters that I can provide to speed up the loading process.

That's a very large file to load into an in-memory store. RDF::Graph is structured as a recursive hash, and each insert can take longer as the number of triples grows. In any case, if you're loading quads, you probably want RDF::Repository rather than RDF::Graph.

I'd suggest importing into a repository based on an external store, such as RDF::Mongo, RDF::Sesame, or RDF::Virtuoso, although the specifics may vary depending on the particular external repo.
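As a rough illustration of importing into such a repository without holding the whole file in memory, here is a minimal sketch. The load_in_batches helper and its batch size are hypothetical, not part of RDF.rb; it only assumes the target store responds to insert(*statements), as RDF.rb's mutable repositories do. Whether batching actually helps will depend on the particular store.

```ruby
# Hypothetical helper (not part of RDF.rb): drain any enumerable of
# statements into a store in fixed-size batches, so only one batch is
# held in memory at a time.
def load_in_batches(statements, store, batch_size: 1_000)
  total = 0
  statements.each_slice(batch_size) do |batch|
    store.insert(*batch) # assumes the store responds to insert(*statements)
    total += batch.size
  end
  total # number of statements handed to the store
end

# Intended use with RDF.rb (assumes the rdf gem is installed and that
# `repository` is an already configured external-store repository):
#
#   require 'rdf'
#   require 'rdf/nquads'
#   RDF::Reader.open('myfile.nq') do |reader|
#     load_in_batches(reader.each_statement, repository)
#   end
```

The helper returns the number of statements it passed on, which gives a quick sanity check against the expected size of the input.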

The parser runs at about 10K quads/second, so extracting the quads from the input should take only a couple of minutes; any remaining time is being spent on inserts into the hash structure.
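To put a number on the parse-only part for a given file, a back-of-the-envelope sketch along these lines may help. The helper names are hypothetical, and the default rate is just the 10K quads/second figure above; it relies on N-Quads being line-based, with one statement per line.

```ruby
# Count statements in line-based N-Quads input: one statement per line,
# skipping blank lines and whole-line comments.
def count_statements(lines)
  lines.count do |line|
    stripped = line.strip
    !stripped.empty? && !stripped.start_with?('#')
  end
end

# Estimated seconds spent in the parser alone, at `rate` quads per second
# (the default is the approximate figure quoted above).
def estimated_parse_seconds(statement_count, rate = 10_000.0)
  statement_count / rate
end

# Usage (File.foreach streams the file line by line):
#
#   quads = count_statements(File.foreach('myfile.nq'))
#   puts estimated_parse_seconds(quads)
```

If the observed load time is far above this estimate, the difference is most likely insert time rather than parsing.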

Gregg

> Thank you,
> 
> Kim

Received on Monday, 28 April 2014 16:26:50 UTC