Re: RDF::Graph performance from Arto Bendiken on 2014-04-28 (public-rdf-ruby@w3.org from April 2014)

From: Arto Bendiken <arto@bendiken.net>
Date: Mon, 28 Apr 2014 18:26:12 +0200
To: Joachim Baran <joachim.baran@gmail.com>
Cc: W3C Ruby RDF mailing list <public-rdf-ruby@w3.org>
Message-ID: <CAE7aNuRne4FeOToFWDuU=7jyLfc_m9LFDFEg6dcnGOovejBKKA@mail.gmail.com>

Hi Joachim,

On Mon, Apr 28, 2014 at 6:12 PM, Joachim Baran <joachim.baran@gmail.com> wrote:
> Hello,
>
>   I am trying to load an 800MB N-Quads file via:
>
>     graph = RDF::Graph.load('myfile.nq', :format => format)
>
>   That process has not finished yet and I am wondering if there are
> performance optimization parameters that I can provide to speed up the
> loading process.

I'm afraid the RDF::Graph implementation wasn't really designed for
inputs of that size.

Depending on what access patterns you need on the resulting in-memory
object, it'd likely be faster to convert the N-Quads file to a more
compact format and then simply use a file-backed RDF::Reader on top of
it.

Both RDF::Reader and RDF::Graph mix in RDF::Enumerable, which is the
basis for most higher-level retrieval operations in RDF.rb, so you
should be able to perform quad-pattern matches and the like, with the
proviso that each one turns into a rewind and a full scan of the file.
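As a rough illustration of what a file-backed reader with quad-pattern matching amounts to, here is a plain-Ruby sketch that does not use RDF.rb at all. FileBackedQuads, its match method and the TERM regex are hypothetical stand-ins, and the regex is only a simplified approximation of N-Quads, not a full parser. The point is that an each that reopens the file, combined with Enumerable, makes every match a rewind plus a full scan, with nothing kept in memory between calls.

```ruby
require 'tempfile'

# Hypothetical stand-in for a file-backed reader; NOT RDF.rb's RDF::Reader.
# No statements are kept in memory: every call to #each reopens the file and
# streams it line by line, so every Enumerable method (select, count, ...)
# costs a rewind plus a full scan.
class FileBackedQuads
  include Enumerable

  # Simplified N-Quads term pattern (IRIs, blank nodes, literals with an
  # optional language tag or datatype). An assumption for illustration only.
  TERM = /<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?/

  def initialize(path)
    @path = path
  end

  # Yields [subject, predicate, object, graph_name]; graph_name is nil
  # for lines that carry only three terms (the default graph).
  def each
    return enum_for(:each) unless block_given?
    File.foreach(@path) do |line|
      line = line.strip
      next if line.empty? || line.start_with?('#')
      s, p, o, g = line.scan(TERM)
      yield [s, p, o, g]
    end
  end

  # Quad-pattern match where nil is a wildcard. Built on Enumerable#select,
  # so each call re-reads the whole file.
  def match(subject: nil, predicate: nil, object: nil, graph_name: nil)
    pattern = [subject, predicate, object, graph_name]
    select do |quad|
      pattern.each_with_index.all? { |term, i| term.nil? || term == quad[i] }
    end
  end
end

# Usage on a tiny hypothetical N-Quads file.
file = Tempfile.new(['example', '.nq'])
file.write(<<~NQ)
  <http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" <http://example.org/g1> .
  <http://example.org/b> <http://xmlns.com/foaf/0.1/name> "Bob Smith"@en .
  <http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> <http://example.org/g1> .
NQ
file.close

quads = FileBackedQuads.new(file.path)
names = quads.match(predicate: '<http://xmlns.com/foaf/0.1/name>')
puts names.map { |q| q[2] }.inspect
```

With RDF.rb itself, RDF::Reader plays this role; the sketch only makes the cost model visible.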

With the operating system caching the file contents in memory for you,
you'll be limited chiefly by parser speed, so the RDF::Raptor gem, which
hands parsing off to the native Raptor library, might be the place to
start.
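To check the parser-speed claim on your own machine, one rough approach is to time a pass that only reads lines against a pass that also tokenizes them, after a warm-up pass so the file is likely already in the OS page cache. This is a sketch under assumptions: the synthetic file and the regex tokenizer are hypothetical stand-ins for a real N-Quads parser such as the one behind RDF::Raptor, so the absolute numbers say nothing about that gem; only the gap between the two passes is of interest.

```ruby
require 'tempfile'

# Simplified N-Quads term pattern; a stand-in for a real parser (assumption).
TERM = /<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?/

# Returns [block_result, elapsed_seconds] using a monotonic clock.
def elapsed
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0]
end

# Build a synthetic N-Quads file (hypothetical data) to measure against.
file = Tempfile.new(['bench', '.nq'])
20_000.times do |i|
  file.puts %(<http://example.org/s#{i}> <http://example.org/p> "value #{i}" <http://example.org/g> .)
end
file.close

# Warm-up pass so both timed passes likely read from the OS page cache.
File.foreach(file.path) { }

# Pass 1: raw I/O only (read and discard each line).
lines, io_time = elapsed { File.foreach(file.path).count }

# Pass 2: the same I/O plus tokenizing every line.
terms, parse_time = elapsed do
  File.foreach(file.path).sum { |line| line.scan(TERM).size }
end

printf("lines=%d terms=%d io=%.3fs io+parse=%.3fs\n", lines, terms, io_time, parse_time)
```

If the second figure is much larger than the first, parsing rather than disk access is where the time goes, which is the situation a faster parser would help with.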

Kind regards,
Arto

-- 
Arto Bendiken | @bendiken | http://ar.to

Received on Monday, 28 April 2014 16:27:26 UTC