RE: RDF tools as workhorse from Chris Wilper on 2005-09-14 (semantic-web@w3.org from September 2005)

From: Chris Wilper <cwilper@cs.cornell.edu>
Date: Wed, 14 Sep 2005 03:13:12 -0400
To: "Mailing Lists" <list@thirdstation.com>, <semantic-web@w3.org>
Message-ID: <772EF7E386FEDF4FA6E11A9DA703A4D24E01D9@EXCHVS1.cs.cornell.edu>

Hi Mark,

I'd suggest you take a good look at Kowari.  It really excels
at query performance and scalability compared to anything
else I've seen in this space.  My own testing has been in the
10-20M triple range.  I've heard that Kowari can easily 
handle ten times that, but haven't tested the assertion for 
myself.  Shoot me an email if you're interested and I'd be 
glad to share some more concrete data.

I think you are right on (if a bit bleeding edge) with your 
approach of moving metadata into a triplestore.  Even without 
inferencing, there are some really nice advantages to
going this route.  As more people realize these advantages, 
I think this space will see an increased focus on achieving
the kind of scale that the high-end relational databases
have seen for years.

Cheers,
Chris Wilper

-----Original Message-----
From: semantic-web-request@w3.org on behalf of Mailing Lists
Sent: Tue 9/13/2005 4:46 PM
To: semantic-web@w3.org
Subject: RDF tools as workhorse
 

Hi all,

Does anyone on the list have some real-world stories to share about 
using RDF and its tools as a backend technology?  The company I work 
for maintains a database of metadata.  I'd like to explore using RDF 
instead of our current schemas.

For example:   I have a lot of data about books.  I'd like to translate 
the data into RDF/XML and dump it into an RDF database.  Then, taking a 
particular book, I'd like to query the database to extract related 
information like: other books by the same author, other books with the 
same subject code, etc.
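
As a rough illustration of the book example above, here is a minimal Python sketch. The identifiers, predicate names, and sample data are all made up for illustration, and plain Python sets stand in for the RDF database; a real triplestore would be queried through its own query language.

```python
# Hypothetical sample data: (subject, predicate, object) triples.
# None of these identifiers come from a real catalogue.
TRIPLES = {
    ("book:1", "dc:creator", "author:a"),
    ("book:1", "dc:subject", "subject:x"),
    ("book:2", "dc:creator", "author:a"),
    ("book:2", "dc:subject", "subject:y"),
    ("book:3", "dc:creator", "author:b"),
    ("book:3", "dc:subject", "subject:x"),
}


def related_books(book, predicate, triples=TRIPLES):
    """Return the other books that share a value of `predicate` with `book`."""
    # Step 1: collect the values (authors, subject codes, ...) of the given book.
    values = {o for s, p, o in triples if s == book and p == predicate}
    # Step 2: find every other subject that has one of those values.
    others = {
        s for s, p, o in triples
        if p == predicate and o in values and s != book
    }
    return sorted(others)


if __name__ == "__main__":
    print(related_books("book:1", "dc:creator"))  # other books by the same author
    print(related_books("book:1", "dc:subject"))  # other books with the same subject code
```

Both lookups use the same two-step scan; only the predicate changes.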

My concerns relate to:
1) Performance -- Right now we query the database using SQL.  Sometimes 
it is _very_ slow.  That's mainly because the data is distributed 
across tables and there are a lot of joins going on.  It seems like 
using RDF would allow us to use simple queries.

2) Scalability -- Our triplestore would be HUGE.  I'd estimate 10-20 
million triples.  Is that small or large in RDF circles?

3) Productivity -- It's usually easier for me to envision creating RDF 
from our source data than massaging the data to fit into our database 
schema.  The same goes for when I'm extracting data - it seems like it 
would be much easier to express my query as a triple using wildcards 
for the data I want.
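
The "triple with wildcards" idea in point 3 can be sketched in a few lines of Python. This is only an assumption about what such a query might look like: `match` is a hypothetical helper, `None` is chosen here as the wildcard, and the data is invented. Actual RDF stores expose their own pattern syntax.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical sample data, for illustration only.
TRIPLES = [
    ("book:1", "dc:creator", "author:a"),
    ("book:1", "dc:subject", "subject:x"),
    ("book:2", "dc:creator", "author:a"),
    ("book:3", "dc:creator", "author:b"),
]


def match(pattern: Pattern, triples: Iterable[Triple]) -> Iterator[Triple]:
    """Yield each triple that agrees with `pattern`; None matches anything."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


if __name__ == "__main__":
    # Everything known about book:1 (predicate and object are wildcards).
    print(list(match(("book:1", None, None), TRIPLES)))
    # Every book by author:a (subject is the wildcard).
    print(list(match((None, "dc:creator", "author:a"), TRIPLES)))
```

This linear scan only shows the shape of the query; it says nothing about how a real store would perform at 10-20 million triples.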

Any information will be helpful.  I'm interested in learning from other 
people's experiences.

Thanks,
Mark

..oO  Mark Donoghue
..oO  e: mark@ThirdStation.com
..oO  doi: http://dx.doi.org/10.1570/m.donoghue

Received on Wednesday, 14 September 2005 07:14:45 UTC