
Re: RDF::Graph performance

From: Joachim Baran <joachim.baran@gmail.com>
Date: Mon, 28 Apr 2014 10:24:43 -0700
Message-ID: <CAObSwHVyht3LSwf21VRAC83R0C34xt5szHDd+X8cHr72BZm6Qw@mail.gmail.com>
To: Arto Bendiken <arto@bendiken.net>
Cc: W3C Ruby RDF mailing list <public-rdf-ruby@w3.org>

Hello,

  Thank you for your suggestions!

  I am using RDF::Reader now, since it appeared to be the fastest option.
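
  To illustrate the proviso from the quoted reply below (each query on a
file-backed reader becomes a rewind plus a full scan), here is a minimal
pure-Ruby sketch. It does not use RDF.rb: the class name
FileBackedStatements, the scans counter, and the sample data are all
hypothetical, and lines are matched as plain strings rather than parsed
into statements.

```ruby
require 'tempfile'

# Hypothetical stand-in for a file-backed reader (not RDF.rb code).
class FileBackedStatements
  include Enumerable

  attr_reader :scans

  def initialize(path)
    @io = File.open(path, 'r')
    @scans = 0
  end

  # Every enumeration rewinds the file and reads it from the start.
  def each
    return enum_for(:each) unless block_given?

    @scans += 1
    @io.rewind
    @io.each_line do |line|
      line = line.strip
      next if line.empty? || line.start_with?('#')

      yield line
    end
  end

  def close
    @io.close
  end
end

# Tiny sample input standing in for the real N-Quads file.
sample = Tempfile.new(['sample', '.nq'])
sample.write(<<~NQ)
  <http://example.org/a> <http://example.org/p> "one" <http://example.org/g> .
  <http://example.org/b> <http://example.org/p> "two" <http://example.org/g> .
  <http://example.org/a> <http://example.org/q> "three" <http://example.org/g> .
NQ
sample.flush

statements = FileBackedStatements.new(sample.path)

# Each select call below triggers one complete pass over the file.
about_a = statements.select { |line| line.start_with?('<http://example.org/a> ') }
with_q  = statements.select { |line| line.include?(' <http://example.org/q> ') }

puts about_a.size      # lines whose subject is <http://example.org/a>
puts with_q.size       # lines whose predicate is <http://example.org/q>
puts statements.scans  # full passes made so far

statements.close
sample.close!
```

  Because Enumerable methods are built on each, memory use stays flat
regardless of file size, but the cost of every query grows with the
length of the file.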

  It works well, but I had to substitute \' with just ' in my input. Not
sure what the RDF standard says about this -- I could not quickly find a
complete list of valid escape sequences.
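
  For anyone needing the same workaround, here is a minimal sketch of the
substitution. The helper name unescape_single_quotes is hypothetical and
the sample line is made up. It consumes escapes one pair at a time, so an
escaped backslash followed by a quote (\\') is left untouched rather than
being mistaken for \'.

```ruby
# Hypothetical helper: rewrite \' to ' and leave every other escape as-is.
def unescape_single_quotes(line)
  # Matching "backslash + next character" as a unit means the second
  # backslash of \\ is never treated as the start of \'.
  line.gsub(/\\(.)/) do
    Regexp.last_match(1) == "'" ? "'" : Regexp.last_match(0)
  end
end

# A raw heredoc keeps the backslashes exactly as they appear in the file.
raw = <<~'NQ'.chomp
  <http://example.org/s> <http://example.org/p> "it\'s a test\n" <http://example.org/g> .
NQ

puts unescape_single_quotes(raw)
```

  Applied line by line while streaming, this avoids loading the whole
file into memory just to preprocess it.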

Best wishes,

Kim


On 28 April 2014 09:26, Arto Bendiken <arto@bendiken.net> wrote:

> Hi Joachim,
>
> On Mon, Apr 28, 2014 at 6:12 PM, Joachim Baran <joachim.baran@gmail.com>
> wrote:
> > Hello,
> >
> >   I am trying to load an 800MB N-Quads file via:
> >
> >     graph = RDF::Graph.load('myfile.nq', :format => format)
> >
> >   That process has not finished yet and I am wondering if there are
> > performance optimization parameters that I can provide to speed up the
> > loading process.
>
> I'm afraid the RDF::Graph implementation wasn't really designed for
> inputs of that size.
>
> Depending on what access patterns you need on the resulting in-memory
> object, it'd likely be faster to convert the N-Quads file to a more
> compact format and then simply use a file-backed RDF::Reader on top of
> it.
>
> Both RDF::Reader and RDF::Graph mix in RDF::Enumerable, which is the
> basis for most higher-level retrieval operations in RDF.rb, so you
> should be able to perform quad-pattern matches and the like, with the
> proviso that each one does turn into a rewind + full scan of the file.
>
> With the operating system caching the file contents in memory for you,
> you'll be limited chiefly by the parser speed (hence the RDF::Raptor
> gem might be the place to start).
>
> Kind regards,
> Arto
>
> --
> Arto Bendiken | @bendiken | http://ar.to
>
Received on Monday, 28 April 2014 17:25:11 UTC
