- From: Nicholas J Humfrey <njh@aelius.com>
- Date: Tue, 29 Jan 2013 22:43:15 -0000
- To: "Michel Dumontier" <michel.dumontier@gmail.com>
- Cc: semantic-web@w3.org
Hello Michel,

Hopefully we can work something out.

Which output formats do you require streaming for?
If you were able to stream N-Triples/N-Quads, would that be enough?

Would an API like this work for you?

    $stream->writeTriple($subject, $predicate, $object, $graph);


nick.


> Nicholas,
>
> Bio2RDF scripts [1] process mixed media - flat files, tab files, xml files
> - into RDF [2], and this is coordinated with a common API [3]. Some of
> these files are rather large, so it's important that we occasionally
> serialize the contents, otherwise we'll run out of space. Since we only
> very rarely want to query the model, we don't need it in memory, but just
> keep a simple index if necessary. We also need support for generating
> n-quads. Although I've been thinking of redeveloping my API with easyRDF
> (while also taking advantage of composer for dependency management and
> sami for documentation), streaming and n-quad support are essential
> requirements.
>
> m.
>
> [1] https://github.com/bio2rdf/bio2rdf-scripts
> [2] https://github.com/bio2rdf/bio2rdf-scripts/wiki
> [3] https://github.com/micheldumontier/php-lib
>
>
> On Tue, Jan 29, 2013 at 3:42 PM, Nicholas J Humfrey <njh@aelius.com> wrote:
>
>> Hello Denny,
>>
>> Sorry, I am not on the semantic-web mailing list - too many emails for me
>> to take in and not enough time. Stéphane kindly forwarded your email to me.
>>
>>
>> No, it is not currently possible to serialise a triple stream with EasyRdf.
>> I took this decision for a number of reasons:
>>
>> 1) EasyRdf was designed with the BBC's web platform in mind. This
>> typically uses Java (and others) as a 'heavy lifting' service layer and
>> PHP as a lightweight presentation layer. As such PHP should only have a
>> single page worth of data to process at a time - thus streaming was not an
>> important requirement.
>>
>> 2) At the core of EasyRdf is a graph model object.
>> EasyRdf started off as an object model layer on top of ARC2 (and others).
>> Since ARC2 has had less development work done on it, I have been expanding
>> the number of native parsers and serialisers in it. I want to avoid making
>> it overly complex with multiple APIs for doing similar things (!)
>>
>> 3) The HTTP client API that I have been using (based on Zend_HTTP_Client,
>> which is again what the BBC uses) doesn't support streaming - it loads the
>> full response into memory. Therefore there are fewer benefits in EasyRdf
>> being able to stream triples.
>>
>> 4) I have worked hard to try and make the RDF/XML and Turtle
>> serialisations as pretty as possible - this involves collecting/sorting
>> all the same resources and properties together, so that the document reads
>> well. Otherwise you just end up with a triple-oriented document that reads
>> like N-Triples or TriX. Some implementations (such as Redland) do this
>> within the serialiser itself, but that seemed like an extra overhead when
>> I already had the data organised like that inside the EasyRdf graph object.
>>
>>
>> Having said all of that, some of the serialisers would be fairly easy to
>> convert and I would be willing to look at changing the API in order to
>> help you with your requirements (I am a big fan of WikiData!). It would
>> also make sense to not have multiple PHP libraries for serialising RDF,
>> with varying quality and features - I think this is one of the reasons why
>> the semantic web hasn't taken off faster.
>>
>>
>> What is your streaming source of triples?
>> Are you serialising direct from the database?
>> Can the database pre-sort subjects and properties, so they are ready to be
>> serialised?
>> Is this for a bulk-export or individual API queries?
>>
>>
>> nick.
>>
>>
>> > ---------- Forwarded message ----------
>> > From: Denny Vrandečić <denny.vrandecic@wikimedia.de>
>> > Date: Tue, Jan 29, 2013 at 11:54 AM
>> > Subject: Light-weight streaming PHP library for RDF serialization?
>> > To: SW-forum <semantic-web@w3.org>
>> >
>> >
>> > Hi,
>> >
>> > is there an actively maintained open source pure PHP library that can be
>> > used to create RDF serialization from a model?
>> >
>> > It should be able to stream a big number of triples.
>> >
>> > Plus points if it has no parser or SPARQL processing library as a
>> > dependency, in order to decrease the size of the library (smaller library
>> > = happier code reviewer, less maintenance costs).
>> >
>> > Cheers,
>> > Denny
>> >
>> > --
>> > Project director Wikidata
>> > Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
>> > Tel. +49-30-219 158 26-0 | http://wikimedia.de
>> >
>> > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
>> > Registered in the register of associations of the Berlin-Charlottenburg
>> > district court under number 23855 B. Recognised as a charitable
>> > organisation by the tax office for corporations I Berlin, tax number
>> > 27/681/51985.
>> >
>> >
>> > --
>> > Steph.
>> >
>
>
> --
> Michel Dumontier
> Associate Professor of Bioinformatics, Carleton University
> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
> http://dumontierlab.com
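
For illustration only, the writeTriple() call proposed at the top of this message could be backed by something like the sketch below, which writes one N-Quads line per call and keeps nothing in memory. This is not EasyRdf code: the class name NQuadsStreamWriter, the constructor taking an open stream resource, and the term conventions (IRIs as plain strings, blank nodes as "_:label" strings, literals as arrays with 'value' and optional 'lang' or 'datatype' keys) are all assumptions made for the example. It also does not validate IRIs or language tags.

```php
<?php
// Hypothetical sketch of a streaming N-Quads writer; not part of EasyRdf.
final class NQuadsStreamWriter
{
    /** @var resource Open, writable stream (file, php://output, ...). */
    private $handle;

    public function __construct($handle)
    {
        if (!is_resource($handle)) {
            throw new InvalidArgumentException('Expected an open stream resource');
        }
        $this->handle = $handle;
    }

    // Writes one statement immediately; $graph is optional, so the same
    // call also produces plain N-Triples lines when it is omitted.
    public function writeTriple($subject, $predicate, $object, $graph = null)
    {
        $line = $this->term($subject) . ' '
              . $this->term($predicate) . ' '
              . $this->term($object);
        if ($graph !== null) {
            $line .= ' ' . $this->term($graph);
        }
        fwrite($this->handle, $line . " .\n");
    }

    // Assumed term conventions: array = literal, "_:x" = blank node,
    // any other string = IRI.
    private function term($term)
    {
        if (is_array($term)) {
            $out = '"' . $this->escape($term['value']) . '"';
            if (isset($term['lang'])) {
                return $out . '@' . $term['lang'];
            }
            if (isset($term['datatype'])) {
                return $out . '^^<' . $term['datatype'] . '>';
            }
            return $out;
        }
        if (strpos($term, '_:') === 0) {
            return $term;
        }
        return '<' . $term . '>';
    }

    // Escapes backslash, double quote and the common control characters
    // inside a literal's lexical form.
    private function escape($value)
    {
        return strtr($value, array(
            "\\" => "\\\\",
            '"'  => '\\"',
            "\n" => '\\n',
            "\r" => '\\r',
            "\t" => '\\t',
        ));
    }
}
```

A caller would open the target once, for example with fopen('php://output', 'w'), pass the handle to the constructor, and then call writeTriple() for each statement as it is produced. Because every line is independent, no sorting or grouping of subjects is needed, which is the trade-off against the "pretty" RDF/XML and Turtle output described in point 4 of the quoted message.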
Received on Tuesday, 29 January 2013 22:43:38 UTC