Re: SIMILE Store Report from Jeen Broekstra on 2004-07-29 (www-rdf-interest@w3.org from July 2004)

From: Jeen Broekstra <jeen@aduna.biz>
Date: Thu, 29 Jul 2004 15:13:24 +0200
To: Ryan Lee <ryanlee@w3.org>
Cc: www-rdf-dspace@w3.org, www-rdf-interest@w3.org
Message-ID: <4108F7F4.50706@aduna.biz>

Ryan Lee wrote:

> The SIMILE Project put together a report on the current state of
> triple store applications' performance at a medium scale as a
> preliminary step towards determining which store might fit our
> project needs best at a larger scale.
> 
> XHTML and PDF versions of the report can be found at
> 
> http://simile.mit.edu/reports/stores/
> 
> Feedback and comments are appreciated.

I'll echo Steve: an interesting and useful read. There should be more
public reports like this. Thanks for sharing.

I have a few remarks:

- It is mentioned that you use RDQL for querying Sesame. I'd like to
   point out that the RDQL engine in Sesame is not optimized for use
   with MySQL/PostgreSQL, whereas the SeRQL engine is. Most of the time
   SeRQL queries will perform significantly better than their RDQL
   counterparts in Sesame.

- It seems to me that the terms 'local', 'network', 'in memory' and
   'persistent' are used rather loosely. It is worth pointing out that
   these terms describe two orthogonal dimensions: local and network
   stores can both be either in-memory or persistent.

- One of your requirements is that the server runs on the network,
   presumably on a dedicated machine, yet you dismiss the use of
   in-memory stores for scalability reasons. I have no precise idea of
   your ultimate scalability requirements, but with sufficient iron
   in-memory stores can go a long way. For example, I know of a group
   that uses Sesame's in-memory store for a dataset consisting of
   15 million triples, and apparently that works comfortably.

- I noticed that you didn't test Sesame's RMI interface. Was this due
   to time constraints, or because of problems with it?
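
To illustrate the remark above about 'local'/'network' and
'in memory'/'persistent' being orthogonal, here is a minimal Python
sketch. The names Access, Storage and StoreConfig are hypothetical and
exist only for this illustration; they are not part of Sesame or of any
other store's API.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Access(Enum):
    """How the client reaches the store (hypothetical names)."""
    LOCAL = "local"
    NETWORK = "network"


class Storage(Enum):
    """Where the store keeps its triples (hypothetical names)."""
    IN_MEMORY = "in-memory"
    PERSISTENT = "persistent"


@dataclass(frozen=True)
class StoreConfig:
    """One point in the two-dimensional configuration space."""
    access: Access
    storage: Storage

    def describe(self) -> str:
        return f"{self.access.value}, {self.storage.value}"


# The two dimensions are independent, so every pairing is a valid setup.
ALL_CONFIGS = [StoreConfig(a, s) for a, s in product(Access, Storage)]

if __name__ == "__main__":
    for config in ALL_CONFIGS:
        print(config.describe())
```

The point of keeping the two enumerations separate is that a networked
in-memory store is just one of the four combinations, not a
contradiction in terms.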

Several remarks that were made by Steve and Andrew about suboptimal
coding/querying and network overhead probably also hold for Sesame,
but I suspect that these hold for all tools involved, so I'll just be 
optimistic and assume that that more or less levels out.

Last but not least, a bit of a plug: we are developing a native
persistent storage backend for Sesame[1]. The design goals are high
scalability yet performance comparable to the in-memory store. And
while we're at it, we're also working on a solution to global warming ;)

Jeen

[1] See http://www.openrdf.org/forum/mvnforum/viewthread?thread=179
-- 
Jeen Broekstra          Aduna BV
Knowledge Engineer      Julianaplein 14b, 3817 CS Amersfoort
http://aduna.biz        The Netherlands
tel. +31(0)33 46599877  fax. +31(0)33 46599877

Received on Thursday, 29 July 2004 09:12:27 UTC