RE: Sync'ing triplestores

> > I've not had a proper search yet, but was wondering if anyone had any
> > pointers to approaches/algorithms for keeping separate triplestores in
> > sync. Ideally between different implementations - e.g. Jena + Redland.
> 
> Sorry, that wasn't very clear - by sync I mean having (potentially
> big) models replicated at remote locations.

I haven't found any good comprehensive prior art, but I have been
thinking about this a lot lately.  The general problem is merging models
(since if the models are disjoint, you don't have an issue).  And to
merge triples, you have to be able to tell whether two triples are
duplicates (or one is meant to replace the other), or are indeed
intended to be separate assertions.
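
To make the identity problem concrete, here is a minimal Python sketch.
It assumes triples are plain (s, p, o) tuples and a store is just a set;
the URIs and the merge_by_spo name are made up for illustration and are
not the API of Jena or Redland.

```python
# Hypothetical sketch: a triple is a (subject, predicate, object) tuple
# and a store is a plain Python set. Not any real triplestore API.

def merge_by_spo(local, remote):
    """Merge two stores using s,p,o as the only notion of identity."""
    return local | remote

store_a = {
    ("ex:doc1", "dc:title", "Draft"),
    ("ex:doc1", "dc:creator", "ex:alice"),
}
store_b = {
    ("ex:doc1", "dc:title", "Final"),       # replacement, or a second title?
    ("ex:doc1", "dc:creator", "ex:alice"),  # exact duplicate, collapses
}

merged = merge_by_spo(store_a, store_b)
```

The exact duplicate collapses, but both titles survive: nothing in the
triples themselves says whether "Final" was meant to replace "Draft" or
to stand beside it.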

If you merged only unique s,p,o combinations, you could not handle
deletes or updates.  But without using s,p,o as a composite key, you need
some other way to identify a triple -- a "context".  Each store could
presumably store a URI identifying the source context for each triple,
but the context identifier would have to be able to flow through all
stores (it couldn't be a store-specific scheme).  And the manner in which
you treat the context URI would have to be consistent across all stores.
For example, if you have one context URI for a single document
containing a hundred triples, what happens when you update a single
triple?  You need a way to indicate that this one triple should be
deleted from the original context and added to a different one.  Even in
the simple case (a single change results in the old context being
deleted entirely and replaced with a new context) you need a way to
communicate the deletion from one store to another.  So I am having a
hard time envisioning true model merging without some sort of
standardized delta encoding syntax.
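
As a rough sketch of what such a delta might carry, here is the simple
case in Python.  Everything in it is an assumption for illustration: the
Delta class, the apply_delta function, the (s, p, o, context) tuple
layout, and the context URIs are all hypothetical, not an existing
format or library.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: each statement is a quad (s, p, o, context_uri),
# and a store is a set of quads. The delta format below is invented.

@dataclass
class Delta:
    deleted_contexts: set = field(default_factory=set)  # whole contexts to drop
    deletes: set = field(default_factory=set)           # individual quads to drop
    adds: set = field(default_factory=set)              # quads to add

def apply_delta(store, delta):
    """Return a new store: drop contexts, drop single quads, then add."""
    kept = {q for q in store if q[3] not in delta.deleted_contexts}
    kept -= delta.deletes
    return kept | delta.adds

OLD, NEW = "ex:ctx/doc1-v1", "ex:ctx/doc1-v2"

replica = {
    ("ex:doc1", "dc:title", "Draft", OLD),
    ("ex:doc1", "dc:creator", "ex:alice", OLD),
}

# Simple case: one triple changed, so the old context is deleted
# entirely and replaced with a new one.
delta = Delta(
    deleted_contexts={OLD},
    adds={
        ("ex:doc1", "dc:title", "Final", NEW),
        ("ex:doc1", "dc:creator", "ex:alice", NEW),
    },
)

replica = apply_delta(replica, delta)
```

The point is only that the deletion has to travel explicitly: a plain
union of the two versions would leave the "Draft" title in the replica.
For this to work across implementations, every store would have to
agree on both the delta format and the context URIs.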

Received on Sunday, 6 February 2005 19:50:06 UTC