- From: Andrew Newman <andrew@tucanatech.com>
- Date: Fri, 26 Mar 2004 09:35:59 +1000
- To: RDF Interest <www-rdf-interest@w3.org>
Charles McCathieNevile wrote:
> Hi folks,
>
> on another list someone asked what tools would be good for handling an
> OWL ontology of about 25,000 terms, with around 20 million triples. There
> were a handful of ideas about how to build specialised SQL systems or
> similar, but Danny Ayers pointed out that there are systems capable of
> handling RDF and a lot of triples (which by lucky chance happens to be a
> way of storing OWL).
>
> So I wondered if anyone on this list had experience of tools working with
> this size dataset. (I will read Dave Beckett's report done for SWAD-Europe
> on the topic, but I suspect that there is already new information
> available, and would like to be up to date).
>
> Cheers
>
> Chaals

I would be remiss in my duties not to mention our Java triple stores, Kowari and TKS. Our current single-system store has been tested to handle around 215 million triples, so that gives you plenty of room to grow. The iTQL query layer in TKS can also query multiple data sources at once, so you could scale up that way too (there's a rough iTQL sketch at the end of this message).

The currently available CVS version of Kowari (http://sf.net/projects/kowari) can load 20 million triples in about 1 hour 10 minutes on an Opteron 240 (1.4 GHz). We use mapped I/O on 64-bit systems like the Opteron or Sun hardware. On 32-bit systems (like a Pentium 4) there are some limitations we've been working on: with mapped I/O you soon reach a limit, at about 3-4 million triples, and explicit I/O only loads at about 800 triples/second. Explicit I/O has no practical limit (other than time and the number of longs) on the number of triples you can add. The current CVS version also has Jena and RDQL support and JRDF interfaces (the second sketch at the end of this message shows what the RDQL side looks like).

Our current internal development version loads 20 million triples in an hour on the same Opteron system, and on that system we're loading 200 million triples at a rate of 2,100 triples/second. The 32-bit load path is now around 3 times faster, giving it about 2,500 triples/second; at that rate your 20 million triples would load in around 2 hours (20,000,000 / 2,500 is roughly 8,000 seconds). We've also got some other changes that may give us further speed improvements, especially over large data sets. We're now seeing a nice mix of I/O-bound and CPU-bound behaviour, and at times we're bound by things outside our system, like the ARP parser. We plan to release that version in the next few weeks or so.
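
For anyone wondering what querying more than one data source at once looks like, here's a rough Java sketch using the ItqlInterpreterBean helper. I'm writing this from memory rather than against the CVS tree, so the package name, the executeQueryToString() method and the model URIs should all be treated as assumptions, not copy-and-paste code.

    import org.kowari.itql.ItqlInterpreterBean;

    public class MultiModelQuery {
        public static void main(String[] args) throws Exception {
            // Helper that parses and executes iTQL commands.
            ItqlInterpreterBean itql = new ItqlInterpreterBean();

            // The "or" in the from clause unions two models, so one query
            // can span more than one data source.  The host, server and
            // model names below are made up for the example.
            String query =
                "select $subject $predicate $object " +
                "from <rmi://host1/server1#model1> or <rmi://host2/server1#model2> " +
                "where $subject $predicate $object ;";

            // Returns the result set formatted as a string.
            System.out.println(itql.executeQueryToString(query));
        }
    }

The same query works against a single model by dropping the "or" and the second model URI.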
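
And here's what the Jena/RDQL side looks like. Again this is a sketch from memory: the classes are the standard Jena 2 RDQL API, the data file and query are invented, and with Kowari you would obtain the Model from our Jena layer rather than the plain in-memory ModelFactory used here to keep the example self-contained.

    import java.util.Iterator;

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.rdql.Query;
    import com.hp.hpl.jena.rdql.QueryEngine;
    import com.hp.hpl.jena.rdql.QueryResults;
    import com.hp.hpl.jena.rdql.ResultBinding;

    public class RdqlExample {
        public static void main(String[] args) {
            // Plain in-memory model so the example stands alone.
            Model model = ModelFactory.createDefaultModel();
            model.read("file:data.rdf");

            // Find everything declared as an owl:Class.
            String rdql =
                "SELECT ?x " +
                "WHERE (?x, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, " +
                "<http://www.w3.org/2002/07/owl#Class>)";

            Query query = new Query(rdql);
            query.setSource(model);
            QueryResults results = new QueryEngine(query).exec();

            // QueryResults iterates over variable bindings.
            for (Iterator iter = results; iter.hasNext(); ) {
                ResultBinding binding = (ResultBinding) iter.next();
                System.out.println(binding.get("x"));
            }
            results.close();
        }
    }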
Received on Thursday, 25 March 2004 18:36:05 UTC