Re: Are current RDF tools ready for this use case? from Chris Bizer on 2007-06-27 (semantic-web@w3.org from June 2007)

From: Chris Bizer <chris@bizer.de>
Date: Wed, 27 Jun 2007 20:25:07 +0200
To: "Mark Kennedy" <MKennedy@fool.com>
Cc: <semantic-web@w3.org>
Message-ID: <001201c7b8e8$7d2c09b0$c4e84d57@named4gc1asnuj>

Hi Mark,

> What I'm unsure about, however, is if there are any RDF tools/frameworks/stores that are
> robust enough to handle a high level of concurrent querying that would result from a high traffic,
> publicly available web site.

We use OpenLink Virtuoso (http://virtuoso.openlinksw.com/wiki/main/) within the DBpedia project to handle concurrent queries from the Web against a dataset of 90 million triples, and it works fine.

You can test the store with the SPARQL query builders provided on the DBpedia project page http://dbpedia.org/docs/
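
For anyone who would rather test from a script than from the query builders, here is a minimal sketch of a SPARQL Protocol request over HTTP. The endpoint URL http://dbpedia.org/sparql, the sample query, and the helper names build_request and run_query are assumptions for illustration and are not taken from this message; only the Python standard library is used.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia SPARQL endpoint; the message does not name it.
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: fetch a handful of arbitrary triples.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint, query, timeout=30):
    """Send the query and return the result bindings (needs network access)."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Calling run_query(DBPEDIA_ENDPOINT, SAMPLE_QUERY) should return a list of dictionaries, one per result row, keyed by variable name. Running many such calls in parallel would be one rough way to get a feel for concurrent query behaviour, though a real load test needs more care than this.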

Cheers

Chris

--
Chris Bizer
Freie Universität Berlin
+49 30 838 54057
chris@bizer.de
www.bizer.de
----- Original Message -----
From: Mark Kennedy
To: semantic-web@w3.org
Sent: Tuesday, June 26, 2007 7:28 PM
Subject: Are current RDF tools ready for this use case?

Hello, all:

I'm hoping to get some feedback on the appropriateness of using RDF as a solution to a challenge that I'm facing. Here's the scenario:

I have an environment that includes several disparate sets of content, all maintained and stored by separate systems. For example, we have a CMS for our editorial content, a 3rd party blogging tool, a message board application, a 3rd party video management system, and perhaps a 3rd party wiki application. Each of those systems has its own schema and storage for metadata about the content it manages. In the future, new systems and content types will be added to our environment as needed.

Our vision is to build a common metadata store that each separate system would feed into. This common store would enable us to reconcile the metadata from each system into a common schema, e.g. allow us to map the author information from each separate system onto a common set of authors, etc.

Our goal is to be able to query the common metadata store to do things like find all of the content created by a single author regardless of the system, or find all content related to a particular topic, or some similar combination of query criteria.
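
To make the idea concrete, here is a toy sketch of a common store held in memory as plain (subject, predicate, object) triples. Every identifier in it (the cms:, blog:, forum:, video: and author: names, and the ex:createdBy and ex:sameAuthorAs predicates) is hypothetical, and a real deployment would use a triple store and SPARQL rather than Python lists. It only shows how mapping triples let one query span all systems.

```python
# Each triple is (subject, predicate, object). All identifiers are hypothetical.
TRIPLES = {
    # Content metadata as fed in by each separate system.
    ("cms:article/42", "ex:createdBy", "cms:user/jdoe"),
    ("blog:post/7", "ex:createdBy", "blog:author/jane-doe"),
    ("forum:thread/99", "ex:createdBy", "forum:member/1234"),
    ("video:clip/3", "ex:createdBy", "video:uploader/bob"),
    # Mapping triples: tie each system-local author to one common author.
    ("cms:user/jdoe", "ex:sameAuthorAs", "author:jane-doe"),
    ("blog:author/jane-doe", "ex:sameAuthorAs", "author:jane-doe"),
    ("forum:member/1234", "ex:sameAuthorAs", "author:jane-doe"),
    ("video:uploader/bob", "ex:sameAuthorAs", "author:bob-smith"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def content_by_author(triples, common_author):
    """Find content from every system whose local author maps to common_author."""
    local_ids = {s for s, _, _ in match(triples, p="ex:sameAuthorAs", o=common_author)}
    return sorted(
        s for s, _, o in match(triples, p="ex:createdBy") if o in local_ids
    )
```

With this data, content_by_author(TRIPLES, "author:jane-doe") returns ['blog:post/7', 'cms:article/42', 'forum:thread/99'], i.e. content from three different systems found through a single common author.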

Based on our requirements, RDF seems like an ideal solution. What I'm unsure about, however, is if there are any RDF tools/frameworks/stores that are robust enough to handle a high level of concurrent querying that would result from a high traffic, publicly available web site.

I'm just starting the process of researching tools and triple stores now, but I guess I'm looking for a gut check on the readiness or appropriateness of RDF to serve the needs I describe. Are RDF and the current tools that enable/support it ready for prime-time consideration? If so, which ones make the most sense to research first?

In my mind, the ideal system would support:

a. Storage of large numbers of triples, scalable to hundreds of millions.
b. Clustering for redundancy.
c. Access via HTTP for easy integration into a variety of platforms.
d. High-performance querying.

Any feedback would be appreciated. And if you think this query might make more sense in another forum, please let me know.

Thanks!

Mark Kennedy

mkennedy@fool.com

Received on Wednesday, 27 June 2007 18:25:22 UTC