Are current RDF tools ready for this use case? from Mark Kennedy on 2007-06-27 (semantic-web@w3.org from June 2007)

From: Mark Kennedy <MKennedy@fool.com>
Date: Wed, 27 Jun 2007 09:53:52 -0400
To: <semantic-web@w3.org>
Message-ID: <389C731CCF50114BACE90427AF7D7B070107AC62@HQPOST.hq.fool.net>

Hello, all:

I'm hoping to get some feedback on the appropriateness of using RDF as
a solution to a challenge that I'm facing. Here's the scenario:

I have an environment that includes several disparate sets of content,
all maintained and stored by separate systems. For example, we have a
CMS for our editorial content, a third-party blogging tool, a message
board application, a third-party video management system, and perhaps a
third-party wiki application. Each of those systems has its own schema
and storage for metadata about the content it manages. In the future,
new systems and content types will be added to our environment as
needed.

Our vision is to build a common metadata store that each separate system
would feed into. This common store would enable us to add new metadata
to content and reconcile the metadata from each system into a common
schema, e.g. allow us to map the author information from each separate
system onto a common set of authors, map separate categorization schemes
to a common taxonomy, etc.

Our goal is to be able to query the common metadata store to do things
like find all of the content created by a single author regardless of
the system, or find all content related to a particular topic, or some
similar combination of query criteria.
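
As a rough illustration of the mapping and cross-system query described
above, here is a minimal in-memory sketch in Python. It is not tied to
any RDF library or triple store; the identifiers (`cms:`, `blog:`,
`boards:`, `common:`), the `same_author` table, and the
`content_by_author` helper are all hypothetical names chosen for the
example.

```python
from collections import namedtuple

# A triple is just (subject, predicate, object); all IDs here are made up.
Triple = namedtuple("Triple", "subject predicate obj")

# Metadata as each source system might report it, with system-local author IDs.
triples = [
    Triple("cms:article/101", "dc:creator", "cms:user/mkennedy"),
    Triple("blog:post/7", "dc:creator", "blog:author/mark-k"),
    Triple("boards:msg/5512", "dc:creator", "boards:member/4821"),
    Triple("cms:article/102", "dc:creator", "cms:user/jsmith"),
]

# Hypothetical mapping from system-local author IDs onto a common set of authors.
same_author = {
    "cms:user/mkennedy": "common:author/mark-kennedy",
    "blog:author/mark-k": "common:author/mark-kennedy",
    "boards:member/4821": "common:author/mark-kennedy",
    "cms:user/jsmith": "common:author/jane-smith",
}


def content_by_author(common_author):
    """Return every content item whose local creator maps to common_author."""
    return sorted(
        t.subject
        for t in triples
        if t.predicate == "dc:creator"
        and same_author.get(t.obj) == common_author
    )


print(content_by_author("common:author/mark-kennedy"))
```

In a real store the mapping itself would presumably be stored as
triples too, so that the same query language covers both the content
metadata and the author reconciliation.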

Based on our requirements, RDF seems like an ideal solution. What I'm
unsure about, however, is whether any RDF tools/frameworks/stores are
robust enough to handle the high level of concurrent querying that
would result from a high-traffic, publicly available web site.

I'm just starting the process of researching tools and triple stores
now, but I guess I'm looking for a gut check on the readiness or
appropriateness of RDF to serve the needs I describe. Are RDF and the
current tools that enable/support it ready for prime-time consideration?
If so, which ones make the most sense to research first?

In my mind, the ideal system would:
 * Store large numbers of triples, scaling to hundreds of millions.
 * Be clusterable for redundancy.
 * Be accessible via HTTP for easy integration into a variety of
platforms.
 * Deliver high query performance.
Any feedback would be appreciated. And if you think this query might
make more sense in another forum, please let me know.

Thanks!

Mark Kennedy
mkennedy@fool.com

Received on Wednesday, 27 June 2007 18:12:14 UTC