Re: RDF tools as workhorse

Hi Mark.  I have been looking at the same approach with the same type 
of data.  RDF triple stores are generally fairly slow compared to SQL 
(and with a large graph, just getting it into memory to query it can 
itself be a problem).  The approach I am working on to address this 
stores the triples in an SQL database as hashes and related tables.  I 
can then either query the tables directly with SQL (faster queries) or 
run SPARQL against the triple store (which can be pretty slow on 
literals).  My project won't be ready for a bit, but it will use a 
combination of these strategies, at about the same scale you are 
considering.  I would say 10-20 million triples is large but not huge.
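For what it's worth, here is a minimal sketch of the layout I described, 
using SQLite in place of Postgres so it is self-contained, and with 
hypothetical table and column names: each distinct term (URI or literal) 
is hashed into a terms table, triples are stored as three hash keys, and 
a "same author" lookup becomes a plain SQL self-join.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One row per distinct term (URI or literal), keyed by a hash of its text.
cur.execute("CREATE TABLE terms (id TEXT PRIMARY KEY, text TEXT)")
# One row per triple; s, p, o each reference terms.id.
cur.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")

def term_id(text):
    """Hash a term so the triples table stores short fixed-width keys."""
    h = hashlib.sha1(text.encode("utf-8")).hexdigest()
    cur.execute("INSERT OR IGNORE INTO terms VALUES (?, ?)", (h, text))
    return h

def add(s, p, o):
    cur.execute("INSERT INTO triples VALUES (?, ?, ?)",
                (term_id(s), term_id(p), term_id(o)))

add("book:1", "dc:creator", "Jane Doe")
add("book:2", "dc:creator", "Jane Doe")
add("book:3", "dc:creator", "John Smith")

# "Other books by the same author as book:1" as a SQL self-join on the
# (predicate, object) pair.
cur.execute("""
    SELECT t.text FROM triples a
    JOIN triples b ON a.p = b.p AND a.o = b.o
    JOIN terms t   ON t.id = b.s
    WHERE a.s = ? AND a.p = ? AND b.s != a.s
""", (term_id("book:1"), term_id("dc:creator")))
print([row[0] for row in cur.fetchall()])  # -> ['book:2']
```

In a real system the terms table would also record whether a term is a 
URI or a literal, but the join pattern is the same.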

Overall, I think SQL databases have been around for a long time, and 
tools for working on metadata need to evolve a bit before they get 
close to that speed.  I am working in Python, where there are a number 
of possible backends, but so far my choice has been SQL, since it lets 
me take advantage of SQL's power and the database's full-text 
searching.  I am using Postgres.  MySQL might be a bit faster, but in 
my view it does not offer the same functionality as a database.
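The full-text searching is the part I would hate to give up.  In 
Postgres I mean the built-in tsvector/tsquery machinery with a GIN 
index; the rough sketch below uses SQLite's FTS5 module instead, purely 
so it runs standalone, and the table and column names are made up for 
illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Full-text index over the literal values attached to subjects.
# (In Postgres this would be a tsvector column with a GIN index.)
cur.execute("CREATE VIRTUAL TABLE literals USING fts5(subject, value)")
cur.executemany("INSERT INTO literals VALUES (?, ?)", [
    ("book:1", "A field guide to semantic web tools"),
    ("book:2", "Relational databases in practice"),
    ("book:3", "Indexing and searching large text corpora"),
])

# An indexed word match, rather than a LIKE '%...%' scan over every literal.
cur.execute("SELECT subject FROM literals WHERE literals MATCH 'databases'")
print([row[0] for row in cur.fetchall()])  # -> ['book:2']
```

Doing the same search over literals with SPARQL regex filters is 
exactly where I have seen triple stores get slow.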

My understanding of pure triple stores is that the only other way to 
obtain reasonable performance is to break a graph into subgraphs.  It 
is interesting to me that the search engines which scrape the web for 
information and collect it up as RDF use more traditional tools that 
index the text, rather than querying the rich information from triples 
in a triple store.  I don't know if this helps, but I am working 
through the same issues and learning by doing and by experimenting.

Regards,
David

On Tuesday, September 13, 2005, at 05:46 PM, Mailing Lists wrote:

>
> Hi all,
>
> Does anyone on the list have some real-world stories to share about 
> using RDF and its tools as a backend technology?  The company I work 
> for maintains a database of metadata.  I'd like to explore using RDF 
> instead of our current schemas.
>
> For example:   I have a lot of data about books.  I'd like to 
> translate the data into RDF/XML and dump it into an RDF database.  
> Then, taking a particular book, I'd like to query the database to 
> extract related information like: other books by the same author, 
> other books with the same subject code, etc.
>
> My concerns relate to:
> 1) Performance -- Right now we query the database using SQL.  
> Sometimes it is _very_ slow.  That's mainly because the data is 
> distributed across tables and there are a lot of joins going on.  It 
> seems like using RDF would allow us to use simple queries.
>
> 2) Scalability -- Our triplestore would be HUGE.  I'd estimate 10-20 
> Million triples.  Is that small or large in RDF circles?
>
> 3) Productivity -- It's usually easier for me to envision creating RDF 
> from our source data than massaging the data to fit into our database 
> schema.  The same goes for when I'm extracting data - it seems like it 
> would be much easier to express my query as a triple using wildcards 
> for the data I want.
>
> Any information will be helpful.  I'm interested in learning from 
> other peoples' experiences.
>
> Thanks,
> Mark
>
> ..oO  Mark Donoghue
> ..oO  e: mark@ThirdStation.com
> ..oO  doi: http://dx.doi.org/10.1570/m.donoghue
>
>

Received on Wednesday, 14 September 2005 06:31:24 UTC