
Re: Fwd: Light-weight streaming PHP library for RDF serialization?

From: Michel Dumontier <michel.dumontier@gmail.com>
Date: Tue, 29 Jan 2013 18:01:45 -0500
Message-ID: <CALcEXf4CAMvuDmVJXfSQWFc6ihxG9tU5wii6WMFC8HuoXy_6GA@mail.gmail.com>
To: Nicholas J Humfrey <njh@aelius.com>
Cc: semantic-web@w3.org
On Tue, Jan 29, 2013 at 5:43 PM, Nicholas J Humfrey <njh@aelius.com> wrote:

> Hello Michel,
>
> Hopefully we can work something out.
>
> Which output formats do you require streaming for?
> If you were able to stream n-triples/n-quads would that be enough?
>
It is for me.


> Would an API like this work for you?
> $stream->writeTriple($subject, $predicate, $object, $graph);
>
>
$stream->writeQuad($s,$p,$o,$g)

perfect.
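
A minimal sketch of what such a streaming writer could look like, assuming PHP 7.1 or later. Only the writeQuad($s, $p, $o, $g) call shape comes from this thread; the class name NQuadsStreamWriter and the iri()/literal() helpers are hypothetical and are not part of EasyRdf. Each statement is written to the output stream as soon as it is produced, so no graph has to be held in memory, and leaving out the graph term yields plain N-Triples lines.

```php
<?php
// Hypothetical sketch: names are illustrative, not an existing library API.
final class NQuadsStreamWriter
{
    /** @var resource writable stream, e.g. fopen('out.nq', 'w') */
    private $handle;

    public function __construct($handle)
    {
        if (!is_resource($handle)) {
            throw new InvalidArgumentException('Expected a writable stream resource');
        }
        $this->handle = $handle;
    }

    /** Wrap an IRI in angle brackets (assumes the IRI is already valid). */
    public static function iri(string $iri): string
    {
        return '<' . $iri . '>';
    }

    /** Quote a plain literal, escaping backslash, double quote and line breaks. */
    public static function literal(string $value): string
    {
        $escaped = strtr($value, [
            '\\' => '\\\\',
            '"'  => '\\"',
            "\n" => '\\n',
            "\r" => '\\r',
        ]);
        return '"' . $escaped . '"';
    }

    /** Write one statement immediately; omit $g to emit an N-Triples line. */
    public function writeQuad(string $s, string $p, string $o, ?string $g = null): void
    {
        $terms = [$s, $p, $o];
        if ($g !== null) {
            $terms[] = $g;
        }
        fwrite($this->handle, implode(' ', $terms) . " .\n");
    }
}

// Usage: php://memory stands in for a real output file here.
$out = fopen('php://memory', 'w+');
$writer = new NQuadsStreamWriter($out);
$writer->writeQuad(
    NQuadsStreamWriter::iri('http://example.org/s'),
    NQuadsStreamWriter::iri('http://example.org/p'),
    NQuadsStreamWriter::literal('hello "world"'),
    NQuadsStreamWriter::iri('http://example.org/g')
);
```

The sketch does not handle typed or language-tagged literals or blank nodes; those would need further helpers.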

m.



>
> nick.
>
>
> > Nicholas,
> >  Bio2RDF scripts [1] process mixed media (flat files, tab files, XML
> > files) into RDF [2], and this is coordinated with a common API [3]. Some
> > of these files are rather large, so it's important that we periodically
> > serialize the contents; otherwise we'll run out of space. Since we only
> > rarely want to query the model, we don't need it in memory and can just
> > keep a simple index if necessary. We also need support for generating
> > N-Quads. Although I've been thinking of redeveloping my API with EasyRdf
> > (while also taking advantage of Composer for dependency management and
> > Sami for documentation), streaming and N-Quads support are essential
> > requirements.
> >
> > m.
> >
> > [1] https://github.com/bio2rdf/bio2rdf-scripts
> > [2] https://github.com/bio2rdf/bio2rdf-scripts/wiki
> > [3] https://github.com/micheldumontier/php-lib
> >
> >
> >
> >
> > On Tue, Jan 29, 2013 at 3:42 PM, Nicholas J Humfrey <njh@aelius.com>
> > wrote:
> >
> >> Hello Denny,
> >>
> >> Sorry, I am not on the semantic-web mailing list - too many emails for
> >> me to take in and not enough time. Stéphane kindly forwarded your email
> >> to me.
> >>
> >>
> >> No, it is not currently possible to serialise a triple stream with
> >> EasyRdf. I took this decision for a number of reasons:
> >>
> >> 1) EasyRdf was designed with the BBC's web platform in mind. This
> >> typically uses Java (and others) as a 'heavy lifting' service layer and
> >> PHP as a lightweight presentation layer. As such, PHP should only have a
> >> single page's worth of data to process at a time, so streaming was not
> >> an important requirement.
> >>
> >> 2) At the core of EasyRdf is a graph model object. EasyRdf started off
> >> as an object model layer on top of ARC2 (and others). Since ARC2 has had
> >> less development work done on it, I have been expanding the number of
> >> native parsers and serialisers in EasyRdf. I want to avoid making it
> >> overly complex with multiple APIs for doing similar things (!)
> >>
> >> 3) The HTTP client API that I have been using (based on
> >> Zend_HTTP_Client, which is again what the BBC uses) doesn't support
> >> streaming - it loads the full response into memory. Therefore there are
> >> fewer benefits in EasyRdf being able to stream triples.
> >>
> >> 4) I have worked hard to make the RDF/XML and Turtle serialisations as
> >> pretty as possible - this involves collecting and sorting all the same
> >> resources and properties together, so that the document reads well.
> >> Otherwise you just end up with a triple-oriented document that reads
> >> like N-Triples or TriX. Some implementations (such as Redland) do this
> >> within the serialiser itself, but that seemed like an extra overhead
> >> when I already had the data organised like that inside the EasyRdf
> >> graph object.
> >>
> >>
> >> Having said all of that, some of the serialisers would be fairly easy to
> >> convert and I would be willing to look at changing the API in order to
> >> help you with your requirements (I am a big fan of WikiData!). It would
> >> also make sense to not have multiple PHP libraries for serialising RDF,
> >> with varying quality and features - I think this is one of the reasons
> >> why the semantic web hasn't taken off faster.
> >>
> >>
> >> What is your streaming source of triples?
> >> Are you serialising direct from the database?
> >> Can the database pre-sort subjects and properties, so they are ready to
> >> be serialised?
> >> Is this for a bulk-export or individual API queries?
> >>
> >>
> >> nick.
> >>
> >>
> >> > ---------- Forwarded message ----------
> >> > From: Denny Vrandečić <denny.vrandecic@wikimedia.de>
> >> > Date: Tue, Jan 29, 2013 at 11:54 AM
> >> > Subject: Light-weight streaming PHP library for RDF serialization?
> >> > To: SW-forum <semantic-web@w3.org>
> >> >
> >> >
> >> > Hi,
> >> >
> >> > Is there an actively maintained, open source, pure PHP library that
> >> > can be used to create an RDF serialization from a model?
> >> >
> >> > It should be able to stream a large number of triples.
> >> >
> >> > Bonus points if it has no parser or SPARQL processing library as a
> >> > dependency, in order to decrease the size of the library (smaller
> >> > library = happier code reviewer, lower maintenance costs).
> >> >
> >> > Cheers,
> >> > Denny
> >> >
> >> > --
> >> > Project director Wikidata
> >> > Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
> >> > Tel. +49-30-219 158 26-0 | http://wikimedia.de
> >> >
> >> > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> >> > Registered in the register of associations of the Berlin-Charlottenburg
> >> > district court under number 23855 B. Recognised as a charitable
> >> > organisation by the Finanzamt für Körperschaften I Berlin, tax number
> >> > 27/681/51985.
> >> >
> >> >
> >> >
> >> > --
> >> > Steph.
> >> >
> >>
> >>
> >>
> >>
> >
> >
> > --
> > Michel Dumontier
> > Associate Professor of Bioinformatics, Carleton University
> > Chair, W3C Semantic Web for Health Care and the Life Sciences Interest
> > Group
> > http://dumontierlab.com
> >
>
>
>


-- 
Michel Dumontier
Associate Professor of Bioinformatics, Carleton University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com
Received on Tuesday, 29 January 2013 23:02:35 GMT
