Are current RDF tools ready for this use case? from Mark Kennedy on 2007-06-26 (semantic-web@w3.org from June 2007)

From: Mark Kennedy <MKennedy@fool.com>
Date: Tue, 26 Jun 2007 13:28:11 -0400
To: <semantic-web@w3.org>
Message-ID: <389C731CCF50114BACE90427AF7D7B070107AC38@HQPOST.hq.fool.net>

Hello, all:

 

I'm hoping to get some feedback on the appropriateness of using RDF as
a solution to a challenge that I'm facing. Here's the scenario:

 

I have an environment that includes several disparate sets of content,
all maintained and stored by separate systems. For example, we have a
CMS for our editorial content, a 3rd party blogging tool, a message
board application, a 3rd party video management system, and perhaps a
3rd party wiki application. Each of those systems has its own schema
and storage for metadata about the content they manage. In the future,
new systems and content types will be added to our environment as
needed.

 

Our vision is to build a common metadata store that each separate system
would feed into. This common store would enable us to reconcile the
metadata from each system into a common schema, e.g. allow us to map the
author information from each separate system onto a common set of
authors, etc.
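
To make the mapping idea concrete, here is a minimal sketch in plain Python, not tied to any particular RDF toolkit. Everything named in it is an assumption made for illustration: the system names, the per-system author fields, the AUTHOR_ALIASES table, and the http://example.org/ URIs. A real store would use an RDF library and an agreed vocabulary instead.

```python
# Hypothetical sketch: map per-system author fields onto one common author URI.
BASE = "http://example.org/"  # assumed namespace, for illustration only

# Assumed alias table: (system, local author id) -> common author URI
AUTHOR_ALIASES = {
    ("cms", "mkennedy"): BASE + "author/mark-kennedy",
    ("blog", "Mark K."): BASE + "author/mark-kennedy",
    ("boards", "user-1042"): BASE + "author/mark-kennedy",
}

# Assumed name of the author field in each system's own schema
AUTHOR_FIELD = {"cms": "byline", "blog": "posted_by", "boards": "user"}


def to_triples(system, record):
    """Turn one system-specific record into (subject, predicate, object) triples."""
    subject = f"{BASE}{system}/{record['id']}"
    local_author = record[AUTHOR_FIELD[system]]
    # Fall back to a system-scoped URI when no common author is known yet
    author = AUTHOR_ALIASES.get(
        (system, local_author), f"{BASE}{system}/author/{local_author}"
    )
    return [
        (subject, BASE + "schema/author", author),
        (subject, BASE + "schema/sourceSystem", system),
    ]


triples = to_triples("cms", {"id": "a1", "byline": "mkennedy"})
triples += to_triples("blog", {"id": "p7", "posted_by": "Mark K."})
```

Both records end up pointing at the same author URI even though each system names and stores the author differently, which is the kind of reconciliation described above.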

 

Our goal is to be able to query the common metadata store to do things
like find all of the content created by a single author regardless of
the system, or find all content related to a particular topic, or some
similar combination of query criteria.
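
The kind of query meant here can be sketched as wildcard pattern matching over (subject, predicate, object) triples. This is only an in-memory illustration in plain Python with made-up identifiers and sample data; it says nothing about how a real triple store would index or scale such queries.

```python
# Hypothetical sketch: wildcard pattern matching over (subject, predicate, object) triples.
# All identifiers and sample data below are invented for illustration.
TRIPLES = [
    ("cms/a1", "author", "author/mark-kennedy"),
    ("cms/a1", "topic", "topic/investing"),
    ("blog/p7", "author", "author/mark-kennedy"),
    ("blog/p7", "topic", "topic/retirement"),
    ("video/v3", "author", "author/jane-doe"),
    ("video/v3", "topic", "topic/investing"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def subjects(triples, p, o):
    """Set of subjects that have predicate p with object o."""
    return {t[0] for t in match(triples, p=p, o=o)}


# All content by one author, regardless of the originating system
by_author = subjects(TRIPLES, "author", "author/mark-kennedy")
# All content on one topic
on_topic = subjects(TRIPLES, "topic", "topic/investing")
# Combined criteria: same author AND same topic
both = by_author & on_topic
```

Here by_author spans the CMS and the blog, on_topic spans the CMS and the video system, and both narrows the result to the single CMS item.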

 

Based on our requirements, RDF seems like an ideal solution. What I'm
unsure about, however, is whether there are any RDF tools/frameworks/stores
that are robust enough to handle the high level of concurrent querying
that would result from a high-traffic, publicly available web site.

 

I'm just starting the process of researching tools and triple stores
now, but I guess I'm looking for a gut check on the readiness or
appropriateness of RDF to serve the needs I describe. Are RDF and the
current tools that enable/support it ready for prime-time consideration?
If so, which ones make the most sense to research first?

 

In my mind, the ideal system would support:

*	The ability to store large numbers of triples, scalable to
hundreds of millions.
*	Clustering for redundancy.
*	Access via HTTP for easy integration into a variety
of platforms.
*	High query performance.
 

Any feedback would be appreciated. And if you think this query might
make more sense in another forum, please let me know.

 

Thanks!

 

Mark Kennedy

mkennedy@fool.com

Received on Wednesday, 27 June 2007 18:12:14 UTC