Re: RDF tools as workhorse from Richard Newman on 2005-09-14 (semantic-web@w3.org from September 2005)

From: Richard Newman <holygoat@gmail.com>
Date: Tue, 13 Sep 2005 23:31:17 -0700
To: Mailing Lists <list@thirdstation.com>
Cc: semantic-web@w3.org
Message-Id: <D7D6C30E-2E38-481C-A7E8-A8F5B2638323@gmail.com>

> Does anyone on the list have some real-world stories to share about  
> using RDF and its tools as a backend technology?  The company I  
> work for maintains a database of metadata.  I'd like to explore  
> using RDF instead of our current schemas.

I do, though I'm not sure I can talk about them :D

I will say I'm having great success.

> For example:   I have a lot of data about books.  I'd like to  
> translate the data into RDF/XML and dump it into an RDF database.   
> Then, taking a particular book, I'd like to query the database to  
> extract related information like: other books by the same author,  
> other books with the same subject code, etc.
>
> My concerns relate to:
> 1) Performance -- Right now we query the database using SQL.   
> Sometimes it is _very_ slow.  That's mainly because the data is  
> distributed across tables and there are a lot of joins going on.   
> It seems like using RDF would allow us to use simple queries.

Possibly... the queries might be simple, but that doesn't necessarily  
mean they'll be performant.

> 2) Scalability -- Our triplestore would be HUGE.  I'd estimate  
> 10-20 Million triples.  Is that small or large in RDF circles?

That's fairly large. Leigh Dodds will have something to say on this  
topic, having used very large sets of triples.

The largest I've considered doing in my current work has been around  
70 million records (IIRC), each expanding to probably 20-30 triples.  
I have yet to actually _try_ it, though -- especially with an in-memory  
store :)

(A reason to do so is that it might be the biggest ever!)
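
For a sense of scale, here is a back-of-the-envelope sketch of what those figures imply. It uses only the estimates quoted above (roughly 70 million records at 20-30 triples each); the variable names are just for illustration.

```python
# Back-of-the-envelope triple count from the rough estimates quoted above.
RECORDS = 70_000_000            # "around 70 million records (IIRC)"
TRIPLES_PER_RECORD = (20, 30)   # "probably 20-30 triples" per record

# Multiply the record count by the low and high per-record estimates.
low, high = (RECORDS * n for n in TRIPLES_PER_RECORD)

print(f"{low:,} to {high:,} triples")
# prints "1,400,000,000 to 2,100,000,000 triples"
```

That is on the order of a hundred times the 10-20 million triples estimated in the question.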

> 3) Productivity -- It's usually easier for me to envision creating  
> RDF from our source data than massaging the data to fit into our  
> database schema.  The same goes for when I'm extracting data - it  
> seems like it would be much easier to express my query as a triple  
> using wildcards for the data I want.

The development cycle is vastly more flexible -- I'm able to do  
things with RDF that would take a lot of effort using an RDB. Likewise  
for queries -- I use Wilbur's path expressions extensively, and I  
have SPARQL and standard triple-pattern queries to fall back on.

Having done a lot of development this way, I would now find it hard  
to ever use a relational database again. RDF wins for flexibility of  
development, and it's also hands-down the best choice for modelling  
and extending a domain, integrating data, and running complex queries.

-R

Received on Wednesday, 14 September 2005 06:31:36 UTC