- From: Nicholas J Humfrey <njh@aelius.com>
- Date: Fri, 12 Oct 2012 12:43:56 +0100
- To: "Keith Alexander" <k.j.w.alexander@gmail.com>
- Cc: easyrdf@googlegroups.com, public-lod@w3.org, semantic-web@w3.org
Hello,

No, it isn't possible to stream large documents. The core aim for
EasyRdf is to help people take content modelled using RDF and produce
HTML views containing that data. Part of the way it does that is to
build up a graph in memory. If you want to process lots of data, I would
recommend doing that in an external triplestore and then having EasyRdf
consume smaller datasets as the results of a SPARQL query.

Having said that, all the parsers internally call an addTriple()
function, so it shouldn't be too difficult to make them stream. But I
would personally look at using a different programming language
(C / Java) if the goal is processing large documents fast.

I have a spreadsheet of parser performance results here:
https://docs.google.com/spreadsheet/ccc?key=0AnPv4eYSOwL7dGc0UjFhLUdWOG44Z3hsTXJ0cFRrc2c

The built-in RDF/XML parser is actually just the ARC2 parser. Other
parsers are a bit faster than ARC2's, but it isn't that fair a
comparison because all the tests are using the EasyRdf API. It would be
difficult to compare like with like.

The fastest parser is the JSON one. For larger documents in other
formats, the fastest method is to use rapper.

nick.

> Hi Nicholas,
>
> Nice work.
> With regards to your parsers, is it possible to stream parse large
> documents? (e.g., by passing a callback function to the parser that
> will be called with a batch of triples, as they are parsed)
>
> How do the parsers compare with ARC2's parsers?
>
> Cheers
>
> Keith
>
> On Fri, Oct 12, 2012 at 11:09 AM, Nicholas J Humfrey <njh@aelius.com>
> wrote:
>> Hello,
>>
>> Today I released version 0.7.0 of EasyRdf - a PHP library designed to
>> make it easy to consume and produce RDF. It is licensed under a BSD-3
>> clause license. It has been tested against PHP 5.2, 5.3 and 5.4.
>>
>> Homepage: http://www.aelius.com/njh/easyrdf/
>> Download: http://github.com/downloads/njh/easyrdf/easyrdf-0.7.0.tar.gz
>> API Docs: http://www.aelius.com/njh/easyrdf/docs/
>>
>>
>> Major new features
>> ------------------
>> * Added a new pure-PHP Turtle parser
>> * Added basic property-path support for traversing graphs
>> * Added support for serialising to the GraphViz dot format (and
>>   generating images)
>> * Added a new class EasyRdf_ParsedUri - a RFC3986 compliant URI parser
>>
>> Enhancements
>> ------------
>> * The load() function in EasyRdf_Graph no-longer takes a $data argument
>> * The parse() and load() methods, now return the number of triples
>>   parsed
>> * Added count() method to EasyRdf_Resource and EasyRdf_Graph
>> * Added localName() method to EasyRdf_Resource
>> * Added htmlLink() method to EasyRdf_Resource
>> * Added methods deleteResource() and deleteLiteral() to EasyRdf_Graph
>> * Added support for guessing the file format based on the file extension
>> * Performance improvements to built-in serialisers
>>
>> Environment changes
>> -------------------
>> * Added PHP Composer description to the project
>> * Now properly PSR-0 autoloader compatible
>> * New minimum version of PHP is 5.2.8
>> * Changed test suite to require PHPUnit 3.6
>> * Changed from Phing to GNU Make based build system
>> * Added automated testing of the examples
>>
>> The full ChangeLog is available here:
>> https://github.com/njh/easyrdf/blob/0.7.0/CHANGELOG.md
>>
>>
>> On the backlog of features for future releases:
>> - Change code style from Zend to PSR-2
>> - Built-in caching support
>> - Add an RDFa parser
>> - Support for SPARQL 1.0 Update
>> - SPARQL Query construction API
>> - Replace the built-in RDF/XML parser
>>
>>
>> Please let me know if you have any problems or questions.
>>
>>
>> nick.
>>
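
For readers following the thread, the streaming idea raised in the question (a callback invoked with a batch of triples as they are parsed) might look roughly like the sketch below. It is written in Python rather than PHP, it is not EasyRdf code, and it does not reflect how EasyRdf's parsers are implemented. The names iter_triples, stream_parse, on_batch and batch_size are hypothetical, and the input is assumed to be a simplified N-Triples file with one complete triple per line.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# A triple is kept as three raw strings (terms are not decoded further).
Triple = Tuple[str, str, str]


def iter_triples(lines: Iterable[str]) -> Iterator[Triple]:
    """Yield (subject, predicate, object) from simplified N-Triples lines.

    Assumption: one triple per line, terminated by '.', with no whitespace
    inside the subject or predicate terms.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.endswith("."):
            raise ValueError("missing terminating '.': %r" % line)
        body = line[:-1].strip()
        # Split off subject and predicate; the remainder is the object,
        # which may contain spaces if it is a literal.
        parts = body.split(None, 2)
        if len(parts) != 3:
            raise ValueError("expected three terms: %r" % line)
        yield (parts[0], parts[1], parts[2])


def stream_parse(
    lines: Iterable[str],
    on_batch: Callable[[List[Triple]], None],
    batch_size: int = 1000,
) -> int:
    """Call on_batch with lists of at most batch_size triples.

    Only one batch is held in memory at a time. Returns the total number
    of triples parsed.
    """
    batch: List[Triple] = []
    total = 0
    for triple in iter_triples(lines):
        batch.append(triple)
        if len(batch) >= batch_size:
            on_batch(batch)
            total += len(batch)
            batch = []  # start a fresh list; the callback keeps the old one
    if batch:
        on_batch(batch)  # flush the final partial batch
        total += len(batch)
    return total
```

Because an open file is itself an iterable of lines, passing a file object to stream_parse() would process a large file without loading it whole. In the spirit of the reply above, the callback could be the point where triples are handed off to an external triplestore.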
Received on Friday, 12 October 2012 11:44:25 UTC