Re: Tools for 20 million triples? from Phil Dawes on 2004-03-26 (www-rdf-interest@w3.org from March 2004)

From: Phil Dawes <pdawes@users.sf.net>
Date: Fri, 26 Mar 2004 15:34:21 +0000
To: Andrew Newman <andrew@tucanatech.com>
Cc: RDF Interest <www-rdf-interest@w3.org>
Message-ID: <16484.19837.45650.443587@gargle.gargle.HOWL>

Hi Andrew,

Is it possible to do optional query clauses in Kowari?  
E.g. match a resource by a set of criteria, and also return the
rdfs:label of the resource if it exists. 
(it wasn't obvious to me from the documentation)
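
To make clear what I mean by "optional", here is a rough Python sketch of
the behaviour I am after, run over a small in-memory list of triples. It is
not iTQL and says nothing about how Kowari works; the sample data, the ex:
names and the function match_with_optional_label are all made up for
illustration.

```python
# Hypothetical sample data; the ex: resources are illustrative only.
RDF_TYPE = "rdf:type"
RDFS_LABEL = "rdfs:label"

triples = [
    ("ex:a", RDF_TYPE, "ex:Person"),
    ("ex:a", RDFS_LABEL, "Alice"),
    ("ex:b", RDF_TYPE, "ex:Person"),   # matches, but has no label
    ("ex:c", RDF_TYPE, "ex:Place"),
    ("ex:c", RDFS_LABEL, "Cardiff"),
]


def match_with_optional_label(triples, required):
    """Return (resource, label-or-None) for every resource that satisfies
    all required (predicate, object) pairs. The label is optional: a
    resource without one is still returned, with None in its place."""
    facts = set(triples)
    subjects = sorted({s for s, _, _ in triples})

    # Keep the first rdfs:label seen for each subject.
    labels = {}
    for s, p, o in triples:
        if p == RDFS_LABEL:
            labels.setdefault(s, o)

    results = []
    for s in subjects:
        if all((s, p, o) in facts for p, o in required):
            results.append((s, labels.get(s)))
    return results


print(match_with_optional_label(triples, [(RDF_TYPE, "ex:Person")]))
# prints [('ex:a', 'Alice'), ('ex:b', None)]
```

The point is that ex:b still comes back even though it has no label, which
a plain conjunctive query would drop.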

Many thanks,

Phil


Andrew Newman writes:
 > 
 > I would be remiss in my duties not to mention our Java triple stores 
 > Kowari and TKS.  Our current single system has been tested to handle 
 > around 215 million triples so that gives you plenty of room to grow. 
 > The iTQL query layer in TKS also has the feature to query multiple data 
 > sources at once so you could scale up that way too.
 > 
 > The currently available CVS version of Kowari 
 > (http://sf.net/projects/kowari) can load 20 million triples in about 1 
 > hour 10 minutes on an Opteron 240 (1.4 GHz).  We use mapped I/O for 64 
 > bit systems like Opteron or Sun systems.
 > 
 > For 32 bit systems (like a Pentium 4) there are some limitations which 
 > we've been working on.  With mapped I/O you soon reach a limit, at about 
 > 3-4 million triples, and explicit I/O only loads at about 800 
 > triples/second.  Explicit I/O has no practical limit (except for time 
 > and the number of longs) to the number of triples you can add.
 > 
 > The current CVS version also has Jena, RDQL support and JRDF interfaces.
 > 
 > Our current internal development version loads 20 million triples in an 
 > hour on the same Opteron system.  On the same system, we're loading 200 
 > million triples at a rate of 2,100 triples/second.  The 32 bit system 
 > load time is now around 3 times faster giving it about 2,500 
 > triples/second.  That would load your data in around 2 hours.
 > 
 > We've also got some other changes that may give us more speed 
 > improvements - especially over large data sets.
 > 
 > We're now getting a nice mix of I/O and CPU bound behaviour and being 
 > bound to things outside our system like the ARP parser.
 > 
 > We plan to release that in the next few weeks or so.
 > 
 > 
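
The load figures quoted above mix total times and rates, so here is a small
Python sketch that puts them on the same footing. It only redoes the
arithmetic on the numbers in the message; the helper names are made up, and
it assumes "your data" means the 20 million triples in the subject line.

```python
def triples_per_second(triples, seconds):
    """Average load rate for a given triple count and elapsed time."""
    return triples / seconds


def hours_to_load(triples, rate):
    """Elapsed hours to load `triples` at `rate` triples/second."""
    return triples / rate / 3600


DATASET = 20_000_000  # assumption: the 20 million triples in the subject

# CVS version on the Opteron: 20 million triples in 1 hour 10 minutes.
cvs_rate = triples_per_second(DATASET, 70 * 60)

# Internal development version on the same system: 20 million in an hour.
dev_rate = triples_per_second(DATASET, 60 * 60)

# 32 bit system at the quoted 2,500 triples/second.
hours_32bit = hours_to_load(DATASET, 2500)

# 200 million triples at the quoted 2,100 triples/second.
hours_200m = hours_to_load(200_000_000, 2100)

print(f"CVS rate:        {cvs_rate:.0f} triples/second")
print(f"Dev rate:        {dev_rate:.0f} triples/second")
print(f"32 bit, 20M:     {hours_32bit:.1f} hours")
print(f"Opteron, 200M:   {hours_200m:.1f} hours")
```

On these numbers the 32 bit load comes out at a little over 2 hours, which
agrees with the "around 2 hours" estimate, and the 200 million triple load
at roughly 26 hours.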

--

Received on Friday, 26 March 2004 10:36:09 UTC