- From: William Waites <ww@styx.org>
- Date: Mon, 4 Apr 2011 15:11:43 +0200
- To: Peter Frederick Patel-Schneider <pfps@research.bell-labs.com>
- Cc: steve.harris@garlik.com, eric@w3.org, andy.seaborne@epimorphics.com, nathan@webr3.org, alexhall@revelytix.com, richard@cyganiak.de, public-rdf-wg@w3.org
* [2011-04-04 08:17:10 -0400] Peter Frederick Patel-Schneider <pfps@research.bell-labs.com> writes:

] I would like to see this sort of argument backed up with numbers
] including all costs, such as I/O. Ideally, such arguments should come
] with code, so that the quality of the implementation can be checked.

Ok, a quick test, using lcsh-20110104.nt, which you can get from the
LOC website and which contains 4256000 statements. The tests below were
done with rapper, part of the raptor package: written in C, publicly
available free software, probably one of the better implementations
out there, and it does only serialisation and parsing.

    time rapper -i ntriples -o ntriples lcsh-20110104.nt > /dev/null

    real    1m2.613s
    user    1m0.105s
    sys     0m1.509s

    time rapper -i ntriples -o turtle lcsh-20110104.nt > /dev/null

    real    3m16.250s
    user    2m34.078s
    sys     0m18.509s

    time rapper -i turtle -o ntriples lcsh-20110104.ttl > /dev/null

    real    1m50.161s
    user    1m34.954s
    sys     0m13.135s

    time rapper -i turtle -o turtle lcsh-20110104.ttl > /dev/null

    (memory exhausted, sorry)

When working with turtle, the size of the process gets quite large,
which suggests that a significant part, perhaps all, of the file is
held in RAM. In either direction, serialising or parsing, the process
ends up taking about 800Mb, and doing both would likely mean double
that, which is more free memory than my computer has.

There is probably room for improvement in the turtle parser /
serialiser, but quite obviously it is easy to make a streaming
ntriples serialiser and parser and harder to make one for turtle,
otherwise it would have been done. We can't count on their existence
for turtle, but it seems reasonable to expect them for ntriples. That
is probably the main reason why ntriples is preferred for dumps of
large datasets.

Now it is hard to imagine that a turtle parser optimised for lower
memory use would be slower than one that just reads everything into
RAM and munges it.
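To make the streaming point concrete, here is a rough Python sketch
(hypothetical, nothing to do with rapper's actual code): because
ntriples is strictly line-oriented, one statement per line, a parser
can emit each triple as it reads it and never hold more than one line
in memory. The crude regex below is for illustration only; a real
parser must also handle escapes and edge cases.

```python
import re
import sys

# One N-Triples term: an IRI, a blank node, or a literal with an
# optional datatype or language tag. All groups are non-capturing so
# findall() returns whole terms. (Illustrative only; ignores escapes
# inside IRIs and other corner cases.)
TERM = re.compile(
    r'<[^>]*>'                        # IRI
    r'|_:[A-Za-z0-9]+'                # blank node
    r'|"(?:[^"\\]|\\.)*"'             # literal...
    r'(?:\^\^<[^>]*>|@[A-Za-z-]+)?'   # ...with optional ^^type or @lang
)

def stream_ntriples(lines):
    """Yield (subject, predicate, object) one line at a time.

    Memory use stays flat no matter how large the input is, which is
    exactly what makes ntriples attractive for large dumps.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        terms = TERM.findall(line)
        if len(terms) == 3:
            yield tuple(terms)

if __name__ == '__main__':
    # Round-trip stdin to stdout, roughly what
    # `rapper -i ntriples -o ntriples` does, in constant memory.
    for s, p, o in stream_ntriples(sys.stdin):
        print(f'{s} {p} {o} .')
```

Turtle, by contrast, has prefix declarations and multi-line statements
with `;` and `,` abbreviations, so a serialiser that wants compact
output is tempted to keep the whole graph around before writing
anything, and that is where the 800Mb goes.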
The rough measurements above show the turtle parser to be almost twice
as slow as the ntriples one, despite it not being optimised for memory
at all.

Cheers,
-w

--
William Waites <mailto:ww@styx.org>
http://river.styx.org/ww/ <sip:ww@styx.org>
F4B3 39BF E775 CF42 0BAB 3DF0 BE40 A6DF B06F FD45
Received on Monday, 4 April 2011 13:12:16 UTC