RE: Sync'ing triplestores

If you're maintaining the context of triples, it seems you have as much of
an aggregation and consistency problem as you do a merge problem. That is,
you would surely have to merge the triples from a changed context with the
triples of its older self, but then you'd also have to be concerned with
the consistency of the aggregation of all contexts. If a newly merged
context makes the aggregate inconsistent, you'd want to quarantine that
context, or manually resolve the inconsistency, or....
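
To make that concrete, here's a minimal sketch in Python with rdflib,
keeping each context as a named graph inside a ConjunctiveGraph.
is_consistent() is a hypothetical hook for whatever reasoner or validation
you'd run over the aggregate (rdflib doesn't ship one):

    # Sketch only: replace a context's triples with a fresh copy, and
    # quarantine (roll back) the context if the aggregate goes
    # inconsistent.  is_consistent() is a hypothetical callback.
    from rdflib import ConjunctiveGraph, Graph, URIRef

    def apply_context(store: ConjunctiveGraph, ctx_uri: str, fresh: Graph,
                      is_consistent) -> bool:
        ctx = store.get_context(URIRef(ctx_uri))
        old = list(ctx)                   # keep old triples for rollback
        ctx.remove((None, None, None))    # drop the context's older self
        for triple in fresh:
            ctx.add(triple)
        if not is_consistent(store):      # hypothetical consistency check
            ctx.remove((None, None, None))
            for triple in old:            # restore the previous version
                ctx.add(triple)
            return False                  # leave for manual resolution
        return True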

So you might have a scenario like this (a rough code sketch follows the
list):

- each source publishes URIs of available contexts (corresponding to CBDs
(Concise Bounded Descriptions), documents, or even individual triples - it
doesn't matter, as long as it is consistent for that source over time) and
some way of returning the triples for a context (the context could just be
a URL from which you can retrieve its triples)
- each source has some way of indicating changed contexts (pushed via RSS,
pulled via SPARQL query, etc.).
- a consumer of the source would get a new copy of any changed context and
merge it with its older version (doing a lean merge to avoid bnode
buildup); if contexts correspond to small enough chunks, it'd be easier to
just skip a delta encoding.
- if the database is inconsistent after a context has been updated, pull
that context from the mix (until the inconsistency can be resolved)
- presumably a consumer would have an editable local source as well (which
would also likely be the thing that was syndicated to other consumers rather
than the aggregation)
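
A rough sketch of the consumer side of that loop, again in Python with
rdflib. changed_contexts() is a hypothetical stand-in for however a source
advertises changes (RSS, SPARQL polling, ...), and apply_context() is the
quarantine helper sketched above:

    # Sketch only: pull changed contexts from one source into the local
    # aggregate, quarantining any context that breaks consistency.
    from rdflib import ConjunctiveGraph, Graph

    def sync_from(source, store: ConjunctiveGraph, quarantined: set,
                  is_consistent) -> None:
        for ctx_uri in source.changed_contexts():  # hypothetical feed API
            fresh = Graph()
            fresh.parse(ctx_uri)     # the context URI doubles as its URL
            if apply_context(store, ctx_uri, fresh, is_consistent):
                quarantined.discard(ctx_uri)
            else:
                quarantined.add(ctx_uri)  # pulled from the mix until fixed

Replacing a context's triples wholesale, as apply_context() does, is one
way to sidestep bnode buildup without implementing a proper lean operation.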

Geoff

> -----Original Message-----
> From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On
> Behalf Of Joshua Allen
> Sent: Sunday, February 06, 2005 2:50 PM
> To: Danny Ayers; semantic-web@w3.org
> Subject: RE: Sync'ing triplestores
> 
> 
> > > I've not had a proper search yet, but was wondering if anyone had
> > > any pointers to approaches/algorithms for keeping separate
> > > triplestores in sync. Ideally between different implementations -
> > > e.g. Jena + Redland.
> >
> > Sorry, that wasn't very clear - by sync I mean having (potentially
> > big) models replicated at remote locations.
> 
> I haven't found any good comprehensive prior art, but I have been
> thinking about this a lot lately.  The general problem is merging models
> (if the models are disjoint, you don't have an issue).  And to merge
> triples, you have to be able to tell whether two triples are duplicates,
> whether one is meant to replace the other, or whether they are indeed
> intended to be separate assertions.
> 
> If you merged only unique s,p,o combinations, you could not handle
> deletes or updates.  But without using s,p,o as a composite key, you need
> some other way to identify a triple -- a "context".  Each store could
> presumably store a URI identifying the source context for each triple,
> but the context identifier would have to be able to flow through all
> stores (it couldn't be a store-specific scheme).  And the manner in which
> you treat context URIs would have to be consistent across all stores.
> For example, if you have one context URI for a single document
> containing a hundred triples, what happens when you update a single
> triple?  You need a way to indicate that that single triple should be
> deleted from the original context and added to a different one.  Even in
> the simple case (a single change results in the old context being
> deleted entirely and replaced with a new context) you need a way to
> communicate deletion from one store to another.  So I am having a hard
> time envisioning true model merging without some sort of standardized
> delta encoding syntax.
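
For what it's worth, the add/delete sets such a delta syntax would have to
carry can at least be computed per context. A minimal sketch using rdflib's
compare module, which diffs graphs in a bnode-safe way (a library
convenience, not the standardized interchange syntax being asked for):

    # Sketch only: compute the triples to delete and to add in order to
    # bring an older copy of a context up to date.
    from rdflib import Graph
    from rdflib.compare import graph_diff

    def context_delta(old: Graph, new: Graph):
        in_both, only_old, only_new = graph_diff(old, new)
        return only_old, only_new   # (to_delete, to_add) to ship around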

Received on Tuesday, 8 February 2005 20:17:57 UTC