Re: [BioRDF] Scalability from Gary Schiltz on 2006-04-05 (public-semweb-lifesci@w3.org from April 2006)

From: Gary Schiltz <gss@ncgr.org>
Date: Wed, 05 Apr 2006 15:10:55 -0600
To: public-semweb-lifesci@w3.org
Message-ID: <4434325F.1010604@ncgr.org>

I haven't used it for RDF storage, but the page for SWI-Prolog's 
Semantic Web library (www.swi-prolog.org/packages/semweb.html) claims to 
have been "actively used with up to 10 million triples, using 
approximately 1GB of memory." I wonder whether RAM is getting faster and 
cheaper quickly enough to keep up with, or outpace, the growth of our 
databases of RDF triples. I suspect not.
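
As a rough back-of-envelope check on that quoted figure, here is a small 
sketch. It assumes "1GB" means 1 GiB and that memory grows linearly with 
the number of triples; the SWI-Prolog page states neither, and the larger 
dataset sizes are hypothetical.

```python
# Back-of-envelope memory estimate from the figure quoted on the SWI-Prolog page.
# Assumptions (not from the page): 1 GB is read as 1 GiB, and memory use
# scales linearly with the number of triples.

GIB = 1024 ** 3

TRIPLES_QUOTED = 10_000_000      # "up to 10 million triples"
MEMORY_QUOTED_BYTES = 1 * GIB    # "approximately 1GB of memory"


def bytes_per_triple(triples: int, memory_bytes: int) -> float:
    """Average memory cost of one triple, in bytes."""
    return memory_bytes / triples


def memory_needed_gib(triples: int, per_triple_bytes: float) -> float:
    """Projected memory in GiB for a store of the given size."""
    return triples * per_triple_bytes / GIB


per_triple = bytes_per_triple(TRIPLES_QUOTED, MEMORY_QUOTED_BYTES)
print(f"about {per_triple:.0f} bytes per triple")

# Hypothetical dataset sizes, only to show how the estimate scales.
for n in (10_000_000, 100_000_000, 1_000_000_000):
    print(f"{n:>13,} triples -> {memory_needed_gib(n, per_triple):7.1f} GiB")
```

Under those assumptions the cost works out to a little over 100 bytes per 
triple, so every tenfold growth in triples needs roughly ten times the RAM.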

Ora Lassila wrote:
> Matt,
>
> what kind of an in-memory database do you use? I have done some preliminary
> experiments with UniProt etc. data with about 2 million triples using our
> OINK browser (built using the Wilbur toolkit). Performance was very
> "interactive" (i.e., "snappy", notice my highly precise metrics here ;-) on
> a 1.67 GHz PowerBook w/ 1 GB RAM.
>
> I don't think 2M triples is a limit on the above configuration, I just
> happened to use a dataset of such size. I will run bigger tests soon.
>
> One should also take into account that in my experiments I was running our
> RDF(S) reasoner also. It computes everything on-demand. Effectively there
> were therefore more than 2M triples. One observation is that RDF graphs
> often tend to have a higher fan-out going "backwards" than "forwards" (i.e.,
> when traversing arcs in the inverse direction); typical examples of such
> relations are rdf:type and rdfs:subClassOf. OINK supports inverse traversal.
>
> I'd like to know what kinds of datasets people are using, what kinds of
> RDF triple store implementations they are using, and what they have
> observed about performance.
>
> Regards,
>
>     - Ora
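
Ora's observation about higher fan-out when traversing arcs "backwards" 
can be illustrated with a toy in-memory store that indexes each triple in 
both directions. This is only a sketch: the class, its method names, and 
the example data are all made up for illustration, and it says nothing 
about how OINK or Wilbur actually implement inverse traversal.

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy in-memory triple store (hypothetical, for illustration only)."""

    def __init__(self):
        self.forward = defaultdict(set)  # (subject, predicate) -> objects
        self.inverse = defaultdict(set)  # (object, predicate) -> subjects

    def add(self, s, p, o):
        # Index every triple in both directions so either traversal is a lookup.
        self.forward[(s, p)].add(o)
        self.inverse[(o, p)].add(s)

    def objects(self, s, p):
        """Forward traversal: follow predicate p from subject s."""
        return self.forward.get((s, p), set())

    def subjects(self, p, o):
        """Inverse traversal: find subjects that reach o via predicate p."""
        return self.inverse.get((o, p), set())

    def mean_fan_out(self, predicate):
        """Return (forward, inverse) average fan-out for one predicate."""
        fwd = [len(v) for (_, p), v in self.forward.items() if p == predicate]
        inv = [len(v) for (_, p), v in self.inverse.items() if p == predicate]

        def mean(xs):
            return sum(xs) / len(xs) if xs else 0.0

        return mean(fwd), mean(inv)


# Made-up example data: five resources that share one rdf:type.
store = TinyTripleStore()
for i in range(5):
    store.add(f"ex:protein{i}", "rdf:type", "ex:Protein")

forward_avg, inverse_avg = store.mean_fan_out("rdf:type")
print(f"rdf:type fan-out: forward {forward_avg}, inverse {inverse_avg}")
```

Each resource here has a single type going forward, while the one class 
node has every instance hanging off it in the inverse direction, which is 
the asymmetry described above for rdf:type and rdfs:subClassOf.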

Received on Wednesday, 5 April 2006 21:11:06 UTC