Re: Fwd: Light-weight streaming PHP library for RDF serialization? from Nicholas J Humfrey on 2013-01-29 (semantic-web@w3.org from January 2013)

From: Nicholas J Humfrey <njh@aelius.com>
Date: Tue, 29 Jan 2013 20:42:38 -0000
To: denny.vrandecic@wikimedia.de
Cc: scorlosquet@gmail.com, semantic-web@w3.org
Message-ID: <2061a281bb0ed362e43a6972a676176e.squirrel@www.aelius.com>

Hello Denny,

Sorry, I am not on the semantic-web mailing list - too many emails for me
to take in and not enough time. Stéphane kindly forwarded your email to
me.


No, it is not currently possible to serialise a triple stream with EasyRdf.
I took this decision for a number of reasons:

1) EasyRdf was designed with the BBC's web platform in mind. This
typically uses Java (and others) as a 'heavy lifting' service layer and
PHP as a lightweight presentation layer. As such, PHP should only have a
single page's worth of data to process at a time - thus streaming was not
an important requirement.

2) At the core of EasyRdf is a graph model object. EasyRdf started off as
an object model layer on top of ARC2 (and others). Since ARC2 has had less
development work done on it, I have been expanding the number of native
parsers and serialisers in EasyRdf. I want to avoid making it overly
complex with multiple APIs for doing similar things (!)

3) The HTTP client API that I have been using (based on Zend_HTTP_Client,
which is again what the BBC uses) doesn't support streaming - it loads the
full response into memory. Therefore there are fewer benefits in EasyRdf
being able to stream triples.

4) I have worked hard to try and make the RDF/XML and Turtle
serialisations as pretty as possible - this involves collecting/sorting
all the same resources and properties together, so that the document reads
well. Otherwise you just end up with a triple-oriented document that reads
like N-Triples or TriX. Some implementations (such as Redland) do this
within the serialiser itself but that seemed like an extra overhead, when
I already had the data organised like that inside the EasyRdf graph
object.
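
As a rough illustration of point 4, here is a minimal sketch of the
"collect, sort, then serialise" approach. It is written in Python for
brevity and is not EasyRdf code: the function names are hypothetical, and
terms are assumed to be already-formatted Turtle strings such as "<a>".

```python
from collections import defaultdict


def group_triples(triples):
    """Collect (subject, predicate, object) tuples into a nested mapping:
    subject -> predicate -> list of objects."""
    graph = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        graph[s][p].append(o)
    return graph


def to_pretty_turtle(triples):
    """Serialise triples as Turtle-style blocks, one block per subject.

    Assumes each term is already a valid Turtle token (hypothetical
    simplification: no prefixes, no escaping).
    """
    graph = group_triples(triples)
    blocks = []
    for s in sorted(graph):
        lines = []
        for p in sorted(graph[s]):
            # Objects sharing a subject and predicate are comma-separated.
            lines.append(f"    {p} " + ", ".join(graph[s][p]))
        # Predicates of one subject are joined with ';' and closed with '.'.
        blocks.append(f"{s}\n" + " ;\n".join(lines) + " .")
    return "\n\n".join(blocks)
```

Note that the grouping step has to see every triple before the first line
can be written, which is why this style of output does not combine easily
with streaming from an unsorted source.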


Having said all of that, some of the serialisers would be fairly easy to
convert and I would be willing to look at changing the API in order to
help you with your requirements (I am a big fan of Wikidata!). It would
also make sense to not have multiple PHP libraries for serialising RDF,
with varying quality and features - I think this is one of the reasons why
the semantic web hasn't taken off faster.


What is your streaming source of triples?
Are you serialising direct from the database?
Can the database pre-sort subjects and properties, so they are ready to be
serialised?
Is this for a bulk-export or individual API queries?
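
To show why the pre-sorting question matters, here is a minimal sketch of
a streaming serialiser. It assumes the source already delivers triples
ordered by subject and then by predicate. It is Python rather than PHP and
not part of EasyRdf: stream_turtle is a hypothetical name, and terms are
assumed to be ready-made Turtle tokens.

```python
from itertools import groupby
from operator import itemgetter


def stream_turtle(sorted_triples):
    """Yield one Turtle-style block per subject.

    Assumes the input iterable is pre-sorted by (subject, predicate), so
    only the triples of the current subject are held in memory.
    """
    for subject, subject_group in groupby(sorted_triples, key=itemgetter(0)):
        lines = []
        for predicate, pred_group in groupby(subject_group, key=itemgetter(1)):
            # Consume the inner group fully before advancing the outer one.
            objects = ", ".join(o for _, _, o in pred_group)
            lines.append(f"    {predicate} {objects}")
        yield f"{subject}\n" + " ;\n".join(lines) + " .\n"
```

Each yielded block could be written to the output as soon as it is ready.
If the source is not sorted, the same subject would simply appear in
several blocks, which is still valid but less readable.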


nick.


> ---------- Forwarded message ----------
> From: Denny Vrandečić <denny.vrandecic@wikimedia.de>
> Date: Tue, Jan 29, 2013 at 11:54 AM
> Subject: Light-weight streaming PHP library for RDF serialization?
> To: SW-forum <semantic-web@w3.org>
>
>
> Hi,
>
> is there an actively maintained open source pure PHP library that can be
> used to create RDF serialization from a model?
>
> It should be able to stream a big number of triples.
>
> Bonus points if it has no parser or SPARQL processing library as a
> dependency, in order to decrease the size of the library (smaller library
> = happier code reviewer, lower maintenance costs).
>
> Cheers,
> Denny
>
> --
> Project director Wikidata
> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
> Tel. +49-30-219 158 26-0 | http://wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Registered in the register of associations of the Amtsgericht
> Berlin-Charlottenburg under number 23855 B. Recognised as charitable by
> the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
>
>
>
> --
> Steph.
>

Received on Tuesday, 29 January 2013 20:43:01 UTC