Are current RDF tools ready for this use case? from Mark Kennedy on 2007-06-27 (semantic-web@w3.org from June 2007)

From: Mark Kennedy <mark.kennedy@gmail.com>
Date: Wed, 27 Jun 2007 13:59:45 -0400
To: semantic-web@w3.org
Message-ID: <eea61f00706271059n26c5437cqd104728771f9bffb@mail.gmail.com>

Hello, all:

I'm hoping to get some feedback on the appropriateness of using RDF as a
solution to a challenge that I'm facing. Here's the scenario:

I have an environment that includes several disparate sets of content, all
maintained and stored by separate systems. For example, we have a CMS for
our editorial content, a third party blogging tool, a message board
application, a third party video management system, and perhaps a third
party wiki application. Each of those systems has its own schema and
storage for metadata about the content they manage. In the future, new
systems and content types will be added to our environment as needed.

Our vision is to build a common metadata store that each separate system
would feed into. This common store would enable us to add new metadata to
content and reconcile the metadata from each system into a common schema, e.g.
allow us to map the author information from each separate system onto a
common set of authors, map separate categorization schemes to a common
taxonomy, etc.

Our goal is to be able to query the common metadata store to do things like
find all of the content created by a single author regardless of the system,
or find all content related to a particular topic, or some similar
combination of query criteria.
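
To make the goal concrete, here is a minimal sketch in plain Python (no RDF
library) of the kind of mapping and query I have in mind. The system names,
identifiers, and the predicate names `localAuthor` and `sameAuthorAs` are made
up for illustration; a real store would presumably use proper URIs and a query
language rather than set comprehensions.

```python
# Each system contributes (subject, predicate, object) triples.
# All identifiers below are hypothetical examples.
triples = [
    ("cms:article/42", "localAuthor", "cms:user/jdoe"),
    ("blog:post/7", "localAuthor", "blog:author/jane-doe"),
    ("forum:thread/311", "localAuthor", "forum:member/9001"),
    ("video:clip/5", "localAuthor", "video:uploader/bsmith"),
    # Mapping triples link each system-local author to a common author.
    ("cms:user/jdoe", "sameAuthorAs", "common:author/jane-doe"),
    ("blog:author/jane-doe", "sameAuthorAs", "common:author/jane-doe"),
    ("forum:member/9001", "sameAuthorAs", "common:author/jane-doe"),
    ("video:uploader/bsmith", "sameAuthorAs", "common:author/bob-smith"),
]


def content_by_author(store, common_author):
    """Return all content whose local author maps to the given common author."""
    # Step 1: collect every system-local identity mapped to the common author.
    local_ids = {s for (s, p, o) in store
                 if p == "sameAuthorAs" and o == common_author}
    # Step 2: collect every piece of content authored by one of those identities.
    return sorted(s for (s, p, o) in store
                  if p == "localAuthor" and o in local_ids)


jane_content = content_by_author(triples, "common:author/jane-doe")
print(jane_content)
```

The point of the sketch is that the source systems keep their own author
identifiers, and only the mapping triples tie them to the common set.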

Based on our requirements, RDF seems like an ideal solution. What I'm unsure
about, however, is whether any RDF tools/frameworks/stores are robust enough
to handle the high level of concurrent querying that would result from a
high-traffic, publicly available web site.

I'm just starting the process of researching tools and triple stores now,
but I guess I'm looking for a gut check on the readiness or appropriateness
of RDF to serve the needs I describe. Are RDF and the current tools that
enable/support it ready for prime-time consideration? If so, which ones make
the most sense to research first?

In my mind, the ideal system would:
 * Store large numbers of triples, scaling to hundreds of millions.
 * Be clusterable for redundancy.
 * Be accessible via HTTP for easy integration into a variety of
platforms.
 * Be highly performant with regard to querying.
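
For the HTTP requirement, I picture something like the sketch below. It only
builds the request URL and does not contact any server. It assumes the store
exposes a SPARQL-style endpoint that accepts the query text in a `query`
parameter of a GET request; the endpoint URL and the `ex:` vocabulary are
hypothetical placeholders, not part of any particular product.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute whatever the chosen store exposes.
ENDPOINT = "http://metadata.example.org/sparql"


def build_query_url(author_uri):
    """Build a GET URL asking for all content by one common author."""
    # The ex: prefix and ex:author property are illustrative assumptions.
    query = (
        "PREFIX ex: <http://example.org/schema#>\n"
        "SELECT ?content WHERE { ?content ex:author <%s> }" % author_uri
    )
    # urlencode percent-encodes the query text so it is safe in a URL.
    return ENDPOINT + "?" + urlencode({"query": query})


url = build_query_url("http://example.org/authors/jane-doe")
print(url)
```

Any platform that can issue an HTTP GET could then consume the results,
whatever language it is written in.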

Any feedback would be appreciated. And if you think this query might make
more sense in another forum, please let me know.

Thanks!

-- 
Mark Kennedy

Received on Wednesday, 27 June 2007 18:12:14 UTC