Re: SIMILE Store Report from Ryan Lee on 2004-07-29 (www-rdf-interest@w3.org from July 2004)

From: Ryan Lee <ryanlee@w3.org>
Date: Thu, 29 Jul 2004 12:44:33 -0400
To: Jeen Broekstra <jeen@aduna.biz>
Cc: www-rdf-dspace@w3.org, www-rdf-interest@w3.org
Message-ID: <41092971.80805@w3.org>
> I'll echo Steve, an interesting read, and useful, there should be more
> public reports like this. Thanks for sharing.

Thanks for sharing your comments.

> I have a few remarks:
> 
> - It is mentioned that you use RDQL for querying Sesame. I'd like to
>   point out that the RDQL engine in Sesame is not optimized for use
>   with MySQL/PostgreSQL, whereas the SeRQL engine is. Most of the time
>   SeRQL queries will perform significantly better than their RDQL
>   counterpart in Sesame.

I'll check it out.

> - It seems to me that the terms 'local', 'network', 'in memory' and
>   'persistent' are used rather loosely. It is worth pointing out that
>   these terms actually are orthogonal dimensions: local and network
>   stores can both be either in-memory or persistent.

Thanks, I'll dig around and see what needs correcting or clarifying.
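
To make the orthogonality point concrete, here is a minimal sketch. The
names Access, Storage and store_configurations are purely illustrative
and do not come from Sesame, Jena, or the report. It treats the two
terms as independent axes and lists every combination:

```python
from enum import Enum
from itertools import product


class Access(Enum):
    """How clients reach the store (hypothetical axis name)."""
    LOCAL = "local"
    NETWORK = "network"


class Storage(Enum):
    """Where the triples live (hypothetical axis name)."""
    IN_MEMORY = "in-memory"
    PERSISTENT = "persistent"


def store_configurations():
    """Return every (access, storage) pair; the two axes are independent."""
    return list(product(Access, Storage))


if __name__ == "__main__":
    for access, storage in store_configurations():
        print(f"{access.value} + {storage.value}")
```

All four pairings are valid, so describing a store as "network" says
nothing about whether it is in-memory or persistent.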

> - One of your requirements is that the server runs on the network,
>   presumably on a dedicated machine, yet you dismiss the use of
>   in-memory stores for scalability reasons. I have no precise idea of
>   your ultimate scalability requirements, but with
>   sufficient iron in-memory stores can go a long way. For
>   example, I know of a group that uses Sesame's in-memory store for a
>   dataset consisting of 15 million triples, and apparently that works
>   comfortably.

We're thinking along the lines of 200 million triples, ultimately, 
though I'm not sure if we'll have enough pertinent data to reach that 
mark.  Nor am I sure if it will all reside in one store; our 
architecture plan calls for multiple stores scattered here and there 
across the network.

I don't think we at SIMILE can necessarily count on our consumers having 
the hardware to run a purely in-memory solution.  While I guess 
we can't truly count on them having copious disk space either, it seems 
to me that disk-based persistent storage is a more general solution.
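
As a rough way to reason about the hardware question, the back-of-envelope
sketch below converts a triple count into a memory estimate. The function
name and the per-triple byte cost are assumptions made up for
illustration; neither the report nor this thread gives a figure, and
real stores vary widely.

```python
def estimate_memory_gib(triples: int, bytes_per_triple: int) -> float:
    """Estimate in-memory footprint in GiB for a given triple count.

    bytes_per_triple is a caller-supplied guess, not a measured value.
    """
    return triples * bytes_per_triple / 2**30


if __name__ == "__main__":
    # ASSUMPTION: 100 bytes per triple is a hypothetical placeholder.
    assumed_bytes_per_triple = 100
    for count in (15_000_000, 200_000_000):
        gib = estimate_memory_gib(count, assumed_bytes_per_triple)
        print(f"{count:>11,} triples: about {gib:.1f} GiB")
```

Substituting a measured per-triple cost for a given store would show
whether the target dataset fits in the memory a consumer is likely to
have.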

> - I noticed that you didn't test Sesame's RMI interface. Was this due
>   to time constraints, or because of problems with it?

Time constraints, unfortunately.

> Several remarks that were made by Steve and Andrew about suboptimal
> coding/querying and network overhead probably also hold for Sesame,
> but I suspect that these hold for all tools involved, so I'll just be 
> optimistic and assume that that more or less levels out.
>
> Last but not least, a bit of a plug: we are developing a native
> persistent storage backend for Sesame[1]. The design goals are high
>   scalability yet performance comparable to the in-memory store. And
> while we're at it, we're also working on a solution to global warming ;)

I'd be very interested to hear about your patch for global warming.  Oh, 
and that native store thing might be of interest as well.  Is there a 
projected timeline (for either :)?

> Jeen
> 
> [1] See http://www.openrdf.org/forum/mvnforum/viewthread?thread=179


-- 
Ryan Lee                 ryanlee@w3.org
W3C Research Engineer    +1.617.253.5327
http://simile.mit.edu/
Received on Thursday, 29 July 2004 12:45:16 UTC