Re: Looking for the right RDF store(s)

I'll put in a vote for Virtuoso [1].  Its open source edition is truly
free software (GPL), and it's a top performer in most publicly released
benchmarks that I've seen [2].

Since you mention wanting to support applications that are on the order of
DBpedia, it seems only natural to give strong consideration to Virtuoso
(as the triplestore that powers DBpedia).  I can also say that in our
project (OpenEI.org [3]), Virtuoso has proven to be a nice complement to
Semantic MediaWiki.  We currently accomplish this by importing a nightly
RDF dump of the SMW data into Virtuoso (which then powers SPARQL queries
and LOD views of that data).  We'd love to switch to more of a real-time
flow of data from SMW into our triplestore, though.  So we're very
interested in your work.
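
As a rough illustration of what a per-page, real-time push over SPARQL
Update might look like (in place of a nightly dump), here is a minimal
Python sketch. It is not how SMW or OpenEI actually does this. The graph
URI, page URI, endpoint URL, and helper names are all hypothetical, and it
assumes the store accepts SPARQL 1.1 Update requests over HTTP POST with
the application/sparql-update content type. Authentication is omitted.

```python
import urllib.request


def escape_literal(text):
    """Escape a Python string for use inside a double-quoted SPARQL literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def build_page_update(graph_uri, page_uri, properties):
    """Build a SPARQL Update that replaces all triples about one wiki page.

    `properties` is a list of (predicate IRI, plain string value) pairs.
    Illustrative only: IRIs are not validated, and typed literals and
    IRI-valued objects are not handled.
    """
    # Step 1: drop whatever the store currently holds for this page.
    parts = [
        f"WITH <{graph_uri}>\n"
        f"DELETE {{ <{page_uri}> ?p ?o }}\n"
        f"WHERE  {{ <{page_uri}> ?p ?o }}"
    ]
    # Step 2: insert the page's current property values, if there are any.
    if properties:
        triples = "\n".join(
            f'    <{page_uri}> <{pred}> "{escape_literal(value)}" .'
            for pred, value in properties
        )
        parts.append(
            f"INSERT DATA {{\n  GRAPH <{graph_uri}> {{\n{triples}\n  }}\n}}"
        )
    # Operations in one update request are separated by semicolons.
    return " ;\n".join(parts)


def post_update(endpoint_url, update):
    """POST the update to a SPARQL Update endpoint (assumed URL, no auth)."""
    request = urllib.request.Request(
        endpoint_url,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling build_page_update() whenever a page is saved and handing the
result to post_update() would be one way to keep the store roughly in
step with the wiki. Whether a given store handles a steady stream of such
small updates well is exactly the kind of thing requirement (4) in the
quoted message is asking about.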

In my understanding, a couple of the soft requirements you note (like
clustering and geographic queries) would really be handled by the
commercial edition of Virtuoso.  But at least that exists as an option for
those who need it.  Hopefully someone from the Virtuoso team can jump in
with more of a point-by-point response to your requirements.  In general,
I'm just piping up to say that our project is making extensive use of SMW
and Virtuoso together as a solution already, and we'd love to see that
pairing become stronger.

--Jamey

1: http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/
2: http://www.w3.org/wiki/LargeTripleStores#Benchmarks_data_sources
3: http://en.openei.org/

On 2/24/11 9:28 AM, "Markus Krötzsch" <markus.kroetzsch@comlab.ox.ac.uk>
wrote:

>Hi,
>
>Some installations of Semantic MediaWiki have become quite big, and our
>users are looking for faster storage backends (than MySQL!) for query
>answering. We will provide RDF store bindings via SPARQL Update, but we
>are unsure which RDF stores to recommend to our users. Input is
>appreciated (also let me know if I should rather take this to a more
>specific list).
>
>SPARQL Update is mandatory. The following are softer requirements:
>
>(1) Free software version available (crucial to many of our users,
>essential if we are to make it our default store),
>(2) Robust & reliable for medium to large datasets
>(3) Good handling of many concurrent queries (of controlled complexity)
>(4) Good handling of continuous updates (some update lag is acceptable,
>but the store should not impose update scheduling on the user)
>(5) Good support for datatype-related queries (numerical and
>lexicographic sorting, ideally also distance queries for geographic
>data, ideally some forms of string pattern matching)
>(6) Options for scaling out; but without being obliged to start with a
>multi-node server cluster.
>
>I am aware that the above requirements are not specific -- indeed the
>details vary widely across our applications (from 10 users and millions
>of pages to tens of thousands of users and pages). Single-user query
>performance and top speed for highly complex queries are not of much
>interest, but robustness and reliability are. We consider some
>applications on the order of DBpedia En, but this is not typical. But
>cases with some 10 million triples should be covered. Of course,
>well-equipped servers (RAID, SSD-based storage, loads of RAM, etc.) can
>be assumed.
>
>What we are looking for are good candidates to recommend in general,
>knowing that users will still need to pick the optimal solution for
>their individual data sets. What we can offer to RDF store suppliers is
>significant visibility in various user communities (e.g., our biggest
>web user is Wikia, hosting about 30,000 wiki communities; SMW users in
>industry would also appreciate more sophisticated storage solutions).
>
>Thanks,
>
>Markus
>
>-- 
>Dr. Markus Krötzsch
>Oxford  University  Computing  Laboratory
>Room 306, Parks Road, Oxford, OX1 3QD, UK
>+44 (0)1865 283529    http://korrekt.org/
>

Received on Friday, 25 February 2011 17:43:34 UTC