- From: Michel Dumontier <michel.dumontier@gmail.com>
- Date: Tue, 29 Jan 2013 18:01:45 -0500
- To: Nicholas J Humfrey <njh@aelius.com>
- Cc: semantic-web@w3.org
- Message-ID: <CALcEXf4CAMvuDmVJXfSQWFc6ihxG9tU5wii6WMFC8HuoXy_6GA@mail.gmail.com>
On Tue, Jan 29, 2013 at 5:43 PM, Nicholas J Humfrey <njh@aelius.com> wrote:

> Hello Michel,
>
> Hopefully we can work something out.
>
> Which output formats do you require streaming for?
> If you were able to stream n-triples/n-quads would that be enough?

it is for me.

> Would an API like this work for you?
> $stream->writeTriple($subject, $predicate, $object, $graph);

$stream->writeQuad($s,$p,$o,$g)
perfect.

m.

> nick.
>
> > Nicholas,
> >
> > Bio2RDF scripts [1] process mixed media - flat files, tab files, xml files
> > - into RDF [2], and this is coordinated with a common API [3]. Some of
> > these files are rather large, so it's important that we occasionally
> > serialize the contents, otherwise we'll run out of space. Since only very
> > rarely do we actually want to query the model, we don't need it in
> > memory, but just keep a simple index if necessary. We also need support
> > for generating n-quads. Although I've been thinking of redeveloping my api
> > with easyRDF (while also taking advantage of composer for dependency
> > management and sami for documentation), streaming and nquad support are
> > essential requirements.
> >
> > m.
> >
> > [1] https://github.com/bio2rdf/bio2rdf-scripts
> > [2] https://github.com/bio2rdf/bio2rdf-scripts/wiki
> > [3] https://github.com/micheldumontier/php-lib
> >
> > On Tue, Jan 29, 2013 at 3:42 PM, Nicholas J Humfrey <njh@aelius.com> wrote:
> >
> >> Hello Denny,
> >>
> >> Sorry, I am not on the semantic-web mailing list - too many emails for me
> >> to take in and not enough time. Stéphane kindly forwarded your email to me.
> >>
> >> No, it is not currently possible to serialise a triple stream with EasyRdf.
> >> I took this decision for a number of reasons:
> >>
> >> 1) EasyRdf was designed with the BBC's web platform in mind. This
> >> typically uses Java (and others) as a 'heavy lifting' service layer and
> >> PHP as a lightweight presentation layer. As such PHP should only have a
> >> single page worth of data to process at a time - thus streaming was not an
> >> important requirement.
> >>
> >> 2) At the core of EasyRdf is a graph model object. EasyRdf started
> >> off as an object model layer on top of ARC2 (and others). Since ARC2 has
> >> had less development work done on it, I have been expanding the number of
> >> native parsers and serialisers in it. I want to avoid making it overly
> >> complex with multiple APIs for doing similar things (!)
> >>
> >> 3) The HTTP client API that I have been using (based on Zend_HTTP_Client,
> >> which is again what the BBC uses) doesn't support streaming - it loads the
> >> full response into memory. Therefore there are fewer benefits in EasyRdf
> >> being able to stream triples.
> >>
> >> 4) I have worked hard to try and make the RDF/XML and Turtle
> >> serialisations as pretty as possible - this involves collecting/sorting
> >> all the same resources and properties together, so that the document reads
> >> well. Otherwise you just end up with a triple oriented document that reads
> >> like N-Triples or Trix. Some implementations (such as Redland) do this
> >> within the serialiser itself but that seemed like an extra overhead, when
> >> I already had the data organised like that inside the EasyRdf graph
> >> object.
> >>
> >> Having said all of that, some of the serialisers would be fairly easy to
> >> convert and I would be willing to look at changing the API in order to
> >> help you with your requirements (I am a big fan of WikiData!). It would
> >> also make sense to not have multiple PHP libraries for serialising RDF,
> >> with varying quality and features - I think this is one of the reasons why
> >> the semantic web hasn't taken off faster.
> >>
> >> What is your streaming source of triples?
> >> Are you serialising direct from the database?
> >> Can the database pre-sort subjects and properties, so they are ready to be
> >> serialised?
> >> Is this for a bulk-export or individual API queries?
> >>
> >> nick.
> >>
> >> > ---------- Forwarded message ----------
> >> > From: Denny Vrandečić <denny.vrandecic@wikimedia.de>
> >> > Date: Tue, Jan 29, 2013 at 11:54 AM
> >> > Subject: Light-weight streaming PHP library for RDF serialization?
> >> > To: SW-forum <semantic-web@w3.org>
> >> >
> >> > Hi,
> >> >
> >> > is there an actively maintained open source pure PHP library that can be
> >> > used to create RDF serialization from a model?
> >> >
> >> > It should be able to stream a big number of triples.
> >> >
> >> > Pluspoints if it has no Parser or SPARQL processing library as a
> >> > dependency, in order to decrease the size of the library (smaller library
> >> > = happier code reviewer, less maintenance costs).
> >> >
> >> > Cheers,
> >> > Denny
> >> >
> >> > --
> >> > Project director Wikidata
> >> > Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
> >> > Tel. +49-30-219 158 26-0 | http://wikimedia.de
> >> >
> >> > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> >> > Registered in the register of associations of the Amtsgericht
> >> > Berlin-Charlottenburg under number 23855 B. Recognised as a non-profit
> >> > organisation by the Finanzamt für Körperschaften I Berlin, tax number
> >> > 27/681/51985.
> >> >
> >> > --
> >> > Steph.
> >
> > --
> > Michel Dumontier
> > Associate Professor of Bioinformatics, Carleton University
> > Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
> > http://dumontierlab.com

--
Michel Dumontier
Associate Professor of Bioinformatics, Carleton University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com
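
For illustration, the streaming API proposed in the thread (writeTriple / writeQuad on a $stream object) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not EasyRdf code: the class name NQuadsStreamWriter, the array convention for literals, and the choice to treat every non-blank-node string as an IRI are all hypothetical, and IRIs are not validated or escaped. Each statement is written to the stream as soon as it is produced, so no graph is held in memory.

```php
<?php
// Hypothetical sketch: NQuadsStreamWriter is an illustrative name,
// not a class that exists in EasyRdf or any other library.
final class NQuadsStreamWriter
{
    /** @var resource Open, writable stream (a file, php://output, ...). */
    private $handle;

    public function __construct($handle)
    {
        if (!is_resource($handle)) {
            throw new InvalidArgumentException('Expected an open stream resource.');
        }
        $this->handle = $handle;
    }

    /**
     * Write one statement immediately; nothing is buffered in a graph.
     *
     * $s, $p and $g are IRIs ("_:label" blank nodes are passed through).
     * $o is an IRI or blank node string, or an array describing a literal:
     * ['value' => ..., 'lang' => ...] or ['value' => ..., 'datatype' => ...].
     */
    public function writeQuad($s, $p, $o, $g = null)
    {
        $parts = array($this->formatResource($s), $this->formatResource($p));
        $parts[] = is_array($o) ? $this->formatLiteral($o) : $this->formatResource($o);
        if ($g !== null) {
            $parts[] = $this->formatResource($g);
        }
        fwrite($this->handle, implode(' ', $parts) . " .\n");
    }

    /** A triple is written as a quad without a graph label. */
    public function writeTriple($s, $p, $o)
    {
        $this->writeQuad($s, $p, $o, null);
    }

    private function formatResource($term)
    {
        // Blank nodes pass through unchanged; anything else is assumed to be an IRI.
        return strncmp($term, '_:', 2) === 0 ? $term : '<' . $term . '>';
    }

    private function formatLiteral(array $o)
    {
        // Escape backslash, double quote, and the common control characters.
        $escaped = strtr((string) $o['value'], array(
            "\\" => "\\\\", '"' => '\\"', "\n" => "\\n", "\r" => "\\r", "\t" => "\\t",
        ));
        $out = '"' . $escaped . '"';
        if (isset($o['lang'])) {
            return $out . '@' . $o['lang'];
        }
        if (isset($o['datatype'])) {
            return $out . '^^<' . $o['datatype'] . '>';
        }
        return $out;
    }
}

// Usage: stream statements to standard output as they are produced.
$stream = new NQuadsStreamWriter(fopen('php://output', 'w'));
$stream->writeQuad(
    'http://example.org/s',
    'http://example.org/p',
    array('value' => 'hello', 'lang' => 'en'),
    'http://example.org/g'
);
$stream->writeTriple('_:b0', 'http://example.org/p', 'http://example.org/o');
```

Because the writer only needs an open stream handle, the same calls work against a file opened with fopen() for bulk exports. The trade-off is the one raised in the thread: output is statement-oriented, so there is no grouping or sorting of subjects as in the prettier RDF/XML and Turtle serialisations.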
Received on Tuesday, 29 January 2013 23:02:35 UTC