Re: RDF tools as workhorse from Jan Algermissen on 2005-09-14 (semantic-web@w3.org from September 2005)

From: Jan Algermissen <jalgermissen@topicmapping.com>
Date: Wed, 14 Sep 2005 08:13:09 +0200
To: Mailing Lists <list@thirdstation.com>
Cc: semantic-web@w3.org
Message-Id: <B00D374D-A748-4C4B-9BFD-6B0D020716F5@topicmapping.com>

Hi Mark,

On Sep 13, 2005, at 10:46 PM, Mailing Lists wrote:

>
> Hi all,
>
> Does anyone on the list have some real-world stories to share about  
> using RDF and its tools as a backend technology?  The company I  
> work for maintains a database of metadata.  I'd like to explore  
> using RDF instead of our current schemas.

I use redland+MySQL as the backend for a mid-size configuration
management database. It works well (though I am still missing
transactions) and is supposed to scale. I also store HTML-page-sized
literals in the database without problems.

>
> For example:   I have a lot of data about books.  I'd like to  
> translate the data into RDF/XML and dump it into an RDF database.   
> Then, taking a particular book, I'd like to query the database to  
> extract related information like: other books by the same author,  
> other books with the same subject code, etc.
>

All that should work fine, as long as you do not perform searches  
over the literals other than by exact string match. These will  
usually result in a full scan of the literals table giving you bad  
performance for reasonably large data sets.
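
To illustrate the difference between exact-match and substring lookups on
literals, here is a minimal sketch. It uses Python's built-in sqlite3 module
as a stand-in for MySQL, and the `literals` table and index names are
hypothetical (this is not Redland's actual schema). The point is only that an
index on the literal value helps an exact match, while a pattern with a
leading wildcard generally forces a scan.

```python
import sqlite3

# In-memory SQLite database standing in for the real backend (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified literals table; not Redland's real layout.
conn.execute("CREATE TABLE literals (id INTEGER PRIMARY KEY, value TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_literals_value ON literals (value)")
conn.executemany(
    "INSERT INTO literals (value) VALUES (?)",
    [(f"Title number {i}",) for i in range(1000)],
)

EXACT_SQL = "SELECT id FROM literals WHERE value = ?"
SUBSTRING_SQL = "SELECT id FROM literals WHERE value LIKE ?"


def plan(sql, params):
    """Return the query planner's description of how it would run sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


exact_plan = plan(EXACT_SQL, ("Title number 42",))
substring_plan = plan(SUBSTRING_SQL, ("%number 42%",))

exact_hits = conn.execute(EXACT_SQL, ("Title number 42",)).fetchall()
substring_hits = conn.execute(SUBSTRING_SQL, ("%number 42%",)).fetchall()

print("exact:    ", exact_plan, "->", len(exact_hits), "row(s)")
print("substring:", substring_plan, "->", len(substring_hits), "row(s)")
```

The exact wording of the plans depends on the SQLite version, but the exact
match should be reported as an index search and the substring query as a
scan. I would expect a leading-wildcard LIKE on a MySQL B-tree index to
behave in a similar way.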

> My concerns relate to:
> 1) Performance -- Right now we query the database using SQL.   
> Sometimes it is _very_ slow.  That's mainly because the data is  
> distributed across tables and there are a lot of joins going on.   
> It seems like using RDF would allow us to use simple queries.

That would be an interesting case for RDF - can you expand on that a
bit?
>
> 2) Scalability -- Our triplestore would be HUGE.  I'd estimate  
> 10-20 Million triples.  Is that small or large in RDF circles?

redland+MySQL is said to scale to millions of triples.
>
> 3) Productivity -- It's usually easier for me to envision creating  
> RDF from our source data than massaging the data to fit into our  
> database schema.  The same goes for when I'm extracting data - it  
> seems like it would be much easier to express my query as a triple  
> using wildcards for the data I want.
>
> Any information will be helpful.  I'm interested in learning from  
> other peoples' experiences.
>
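
The wildcard-style triple query described in point 3, applied to the book
example, can be sketched roughly as follows. This is plain Python over an
in-memory list, not the API of Redland or any other RDF toolkit; the
identifiers, predicates and sample data are made up for illustration, and
`None` plays the role of the wildcard.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample data: three books, two authors, two subject codes.
TRIPLES: List[Triple] = [
    ("book:1", "dc:creator", "author:eco"),
    ("book:1", "dc:subject", "subject:FIC014000"),
    ("book:2", "dc:creator", "author:eco"),
    ("book:2", "dc:subject", "subject:LIT000000"),
    ("book:3", "dc:creator", "author:calvino"),
    ("book:3", "dc:subject", "subject:FIC014000"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None matches anything."""
    for subj, pred, obj in triples:
        if (s is None or s == subj) and (p is None or p == pred) and (o is None or o == obj):
            yield (subj, pred, obj)


def related_books(triples: List[Triple], book: str, predicate: str) -> List[str]:
    """Find other books sharing a value of `predicate` with `book`."""
    related = set()
    # Step 1: (book, predicate, ?value)
    for _, _, value in match(triples, s=book, p=predicate):
        # Step 2: (?other, predicate, value)
        for other, _, _ in match(triples, p=predicate, o=value):
            if other != book:
                related.add(other)
    return sorted(related)


print(related_books(TRIPLES, "book:1", "dc:creator"))
print(related_books(TRIPLES, "book:1", "dc:subject"))
```

Both lookups are exact matches on resources, so a real store can answer them
from its indexes instead of scanning every statement as this sketch does.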

IMHO, the issue of search in RDF databases is not so critical,
because if you consistently apply Web technologies to enterprise IT,
you very likely end up in a situation where you have multiple triple
databases anyhow. Instead of a distributed search over all the
databases, the Web-style solution would be a central crawling+search
service (a search engine). These are optimized for the kinds of
queries that are poorly served by RDF triple stores.

IOW, the whole issue of performance suddenly disappears.
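
As a rough illustration of the crawl-then-search idea, here is a toy sketch.
The store names, predicates and data are invented, and a real setup would
crawl the stores over HTTP and use a proper search engine; this only shows
the shape of it: walk every store once, build an inverted index over the
literal text, and answer keyword queries from that index rather than from
the triple stores.

```python
import re
from collections import defaultdict

# Hypothetical triple stores, each a list of (subject, predicate, literal).
STORES = {
    "catalog": [
        ("book:1", "dc:title", "The Name of the Rose"),
        ("book:2", "dc:title", "Invisible Cities"),
    ],
    "reviews": [
        ("review:9", "rev:text", "A rose by any other name"),
    ],
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crawl(stores):
    """Build an inverted index: word -> set of (store name, subject)."""
    index = defaultdict(set)
    for store_name, triples in stores.items():
        for subject, _predicate, literal in triples:
            for token in tokenize(literal):
                index[token].add((store_name, subject))
    return index


def search(index, query):
    """Return the resources whose literals contain every query word."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


INDEX = crawl(STORES)
print(search(INDEX, "rose"))
print(search(INDEX, "cities"))
```

The triple stores then only have to serve the exact-match lookups they are
good at, and the text search cost is paid once at crawl time.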

HTH,

Jan

> Thanks,
> Mark
>
> ..oO  Mark Donoghue
> ..oO  e: mark@ThirdStation.com
> ..oO  doi: http://dx.doi.org/10.1570/m.donoghue
>
>
>
>

_______________________________________________________________________________________
Jan Algermissen, Consultant & Programmer                         http://jalgermissen.com
Tugboat Consulting, 'Applying Web technology to enterprise IT'   http://www.tugboat.de

Received on Wednesday, 14 September 2005 06:13:23 UTC