Looking for the right RDF store(s)


Some installations of Semantic MediaWiki have become quite big, and our 
users are looking for storage backends that answer queries faster than 
MySQL. We will provide RDF store bindings via SPARQL Update, but we 
are unsure which RDF stores to recommend to our users. Any input is 
appreciated (please also let me know if I should take this to a more 
specific list).

SPARQL Update is mandatory. The following are softer requirements:

(1) Free software version available (crucial to many of our users, and 
essential if we are to make it our default store).
(2) Robust and reliable for medium to large datasets.
(3) Good handling of many concurrent queries (of controlled complexity).
(4) Good handling of continuous updates (some update lag is acceptable, 
but the store should not impose update scheduling on the user).
(5) Good support for datatype-related queries (numerical and 
lexicographic sorting, ideally also distance queries for geographic 
data, and ideally some forms of string pattern matching).
(6) Options for scaling out, but without being obliged to start with a 
multi-node server cluster.

I am aware that the above requirements are not specific; indeed, the 
details vary widely across our applications (from 10 users and millions 
of pages to tens of thousands of users and pages). Single-user query 
performance and top speed for highly complex queries are not of much 
interest, but robustness and reliability are. Some applications we 
consider are on the order of the English DBpedia, but this is not 
typical. Cases with around 10 million triples, however, should be 
covered. Of course, well-equipped servers (RAID, SSD-based storage, 
lots of RAM, etc.) can be assumed.

What we are looking for are good candidates to recommend in general, 
knowing that users will still need to pick the optimal solution for 
their individual datasets. What we can offer to RDF store suppliers is 
significant visibility in various user communities (e.g., our biggest 
web user is Wikia, which hosts about 30,000 wiki communities; SMW users 
in industry would also appreciate more sophisticated storage solutions).



Dr. Markus Krötzsch
Oxford University Computing Laboratory
Room 306, Parks Road, Oxford, OX1 3QD, UK
+44 (0)1865 283529    http://korrekt.org/

Received on Thursday, 24 February 2011 16:28:58 UTC