RE: RDF tools as workhorse from Geoff Chappell on 2005-09-14 (semantic-web@w3.org from September 2005)

From: Geoff Chappell <geoff@sover.net>
Date: Wed, 14 Sep 2005 07:18:53 -0400
To: "'Mailing Lists'" <list@thirdstation.com>, <semantic-web@w3.org>
Message-ID: <037301c5b91e$1ab95c40$6401a8c0@gsclaptop>

Hi Mark,

> -----Original Message-----
> From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On
> Behalf Of Mailing Lists
> Sent: Tuesday, September 13, 2005 4:47 PM
> To: semantic-web@w3.org
> Subject: RDF tools as workhorse
> 
> 
> Hi all,
> 
> Does anyone on the list have some real-world stories to share about
> using RDF and its tools as a backend technology?  The company I work
> for maintains a database of metadata.  I'd like to explore using RDF
> instead of our current schemas.
> 
> For example:   I have a lot of data about books.  I'd like to translate
> the data into RDF/XML and dump it into an RDF database.  Then, taking a
> particular book, I'd like to query the database to extract related
> information like: other books by the same author, other books with the
> same subject code, etc.
> 
> My concerns relate to:
> 1) Performance -- Right now we query the database using SQL.  Sometimes
> it is _very_ slow.  That's mainly because the data is distributed
> across tables and there are a lot of joins going on.  It seems like
> using RDF would allow us to use simple queries.
> 
> 2) Scalability -- Our triplestore would be HUGE.  I'd estimate 10-20
> Million triples.  Is that small or large in RDF circles?

As a real-world example of performance and scalability, you might be
interested in some work we did with the Uniprot protein database
(262 million triples) and RDF Gateway. See:

http://labs.intellidimension.com/uniprot/default.rsp

for a description of the effort and some live example queries (including a
link to an experimental SPARQL query interface).

> 3) Productivity -- It's usually easier for me to envision creating RDF
> from our source data than massaging the data to fit into our database
> schema.  The same goes for when I'm extracting data - it seems like it
> would be much easier to express my query as a triple using wildcards
> for the data I want.

One of the big benefits I find from working with RDF is the ability to
evolve/adapt your data as your project changes. A relational schema is
something you're really forced to get right the first time, because it's
typically pretty inflexible to change once an app is built around it. An RDF
schema is much more fluid and seems to allow for a more iterative
development model (particularly if your store supports inference that lets
you easily map values as a schema changes).
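
To make the schema-evolution point a bit more concrete, here is a minimal
sketch in Python. It is not RDF Gateway's API or any real store's API: the
TinyStore class, the ex: property names and the book data are all made up for
illustration. It assumes the only inference rule is rdfs:subPropertyOf, and it
treats None in a pattern as a wildcard.

```python
# Hypothetical sketch: a toy in-memory triple store, not a real product's API.
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
SUBPROP = "rdfs:subPropertyOf"


class TinyStore:
    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def _subproperties(self, prop: str) -> Set[str]:
        """Return prop plus every property declared (transitively) beneath it."""
        found = {prop}
        frontier = [prop]
        while frontier:
            current = frontier.pop()
            for s, p, o in self.triples:
                if p == SUBPROP and o == current and s not in found:
                    found.add(s)
                    frontier.append(s)
        return found

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
        infer: bool = True,
    ) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        props = None
        if p is not None:
            props = self._subproperties(p) if infer else {p}
        for ts, tp, to in sorted(self.triples):
            if s is not None and ts != s:
                continue
            if props is not None and tp not in props:
                continue
            if o is not None and to != o:
                continue
            # Report inferred matches under the property that was asked for.
            yield (ts, p if p is not None else tp, to)


store = TinyStore()
store.add("book:1", "ex:author", "Jane Doe")    # loaded under the old schema
store.add("book:2", "ex:creator", "Jane Doe")   # loaded under the new schema
store.add("ex:author", SUBPROP, "ex:creator")   # one triple bridges the two

# "Other books by the same author": subject is the wildcard.
with_inference = [s for s, _, _ in store.match(p="ex:creator", o="Jane Doe")]
without_inference = [
    s for s, _, _ in store.match(p="ex:creator", o="Jane Doe", infer=False)
]
```

With inference on, the query against the new property also picks up the book
that was loaded under the old one, so nothing has to be rewritten when the
schema changes. A real store would index the triples rather than scan them,
and would usually support more rules than this one.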

> Any information will be helpful.  I'm interested in learning from other
> peoples' experiences.
> 
> Thanks,
> Mark
> 
> ..oO  Mark Donoghue
> ..oO  e: mark@ThirdStation.com
> ..oO  doi: http://dx.doi.org/10.1570/m.donoghue

Best,

Geoff Chappell

Received on Wednesday, 14 September 2005 11:19:19 UTC