Re: Looking for the right RDF store(s)

Hi Markus,

At the risk of being accused of self-publicity, I'm going to recommend Garlik's store, 4store.

It did well in the recent BSBM benchmark: 2nd in import, 2nd in query, and 1st in update performance.

It's released under the GPL (v3) and supports clustering out of the box. It ran our financial services data backend for a large number of users and some very demanding clients for several years before being released under the GPL, so it has a good track record for scalability and reliability.

As a bonus, the clustered and non-clustered code are identical: a single-machine install is just a cluster of one. This hopefully minimises the nasty shocks when scaling out to clusters.

Now that it's been released, it has an active user community working on improvements and providing help and support.

The downside is that it does not support ranged datatype searches or geographic searches. However, we are talking to a partner that is interested in sponsoring geographic query support.

It's a little off-topic, so feel free to contact me offlist if you want more specific information.


Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

On 2011-02-24, at 16:28, Markus Krötzsch wrote:

> Hi,
> Some installations of Semantic MediaWiki have become quite big, and our users are looking for faster storage backends (than MySQL!) for query answering. We will provide RDF store bindings via SPARQL Update, but we are unsure which RDF stores to recommend to our users. Input is appreciated (also let me know if I should rather take this to a more specific list).
> SPARQL Update is mandatory. The following are softer requirements:
> (1) Free software version available (crucial to many of our users, essential if we are to make it our default store),
> (2) Robust & reliable for medium to large datasets
> (3) Good handling of many concurrent queries (of controlled complexity)
> (4) Good handling of continuous updates (some update lag is acceptable, but the store should not impose update scheduling on the user)
> (5) Good support for datatype-related queries (numerical and lexicographic sorting, ideally also distance queries for geographic data, ideally some forms of string pattern matching)
> (6) Options for scaling out; but without being obliged to start with a multi-node server cluster.
> I am aware that the above requirements are not specific -- indeed the details vary widely across our applications (from 10 users and millions of pages to tens of thousands of users and pages). Single-user query performance and top speed for highly complex queries are not of much interest, but robustness and reliability are. We are considering some applications on the order of DBpedia En, but this is not typical. Cases with some 10 million triples should be covered, though. Of course, well-equipped servers (RAID, SSD-based storage, loads of RAM, etc.) can be assumed.
> What we are looking for are good candidates to recommend in general, knowing that users will still need to pick the optimal solution for their individual data sets. What we can offer to RDF store suppliers is significant visibility in various user communities (e.g., our biggest web user is Wikia, hosting about 30,000 wiki communities; SMW users in industry would also appreciate more sophisticated storage solutions).
> Thanks,
> Markus
> -- 
> Dr. Markus Krötzsch
> Oxford University Computing Laboratory
> Room 306, Parks Road, Oxford, OX1 3QD, UK
> +44 (0)1865 283529

Received on Friday, 25 February 2011 10:02:56 UTC