- From: Geoff Chappell <geoff@sover.net>
- Date: Tue, 8 Feb 2005 15:17:34 -0500
- To: "'Joshua Allen'" <joshuaa@microsoft.com>, "'Danny Ayers'" <danny.ayers@gmail.com>, <semantic-web@w3.org>
If you're maintaining the context of triples, it seems you have as much of an aggregation and consistency problem as you do a merge problem. I.e., you would surely have to merge the triples from a changed context with the triples of its older self, but then you'd have to be concerned with the consistency of the aggregation of all contexts. If the introduction of a newly merged context makes the aggregation inconsistent, you'd want to quarantine that context or manually resolve the inconsistency, or....

So you might have a scenario like:

- each source publishes URIs of available contexts (corresponding to CBDs, documents, or even single triples - it doesn't matter as long as it is consistent for that source over time) and some way of returning the triples for that context (a context could just be a URL from which you can retrieve the triples for that context)

- each source has some way of indicating changed contexts (pushed via RSS, pulled via SPARQL query, etc.)

- a consumer of the source would get a new copy of any changed context and merge it with its older version (doing a lean merge to avoid bnode buildup); if contexts correspond to small enough chunks, it'd be easier to just skip a delta encoding

- if the database is inconsistent after a context has been updated, pull that context from the mix (until the inconsistency can be resolved)

- presumably a consumer would have an editable local source as well (which would also likely be the thing that was syndicated to other consumers, rather than the aggregation)

Geoff

> -----Original Message-----
> From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org] On
> Behalf Of Joshua Allen
> Sent: Sunday, February 06, 2005 2:50 PM
> To: Danny Ayers; semantic-web@w3.org
> Subject: RE: Sync'ing triplestores
>
> > > I've not had a proper search yet, but was wondering if anyone had any
> > > pointers to approaches/algorithms for keeping separate triplestores in
> > > sync. Ideally between different implementations - e.g. Jena + Redland.
> >
> > Sorry, that wasn't very clear - by sync I mean having (potentially
> > big) models replicated at remote locations.
>
> I haven't found any good comprehensive prior art, but I have been
> thinking about this a lot lately. The general problem is merging models
> (since if the models are disjoint, you don't have an issue). And to
> merge triples, you have to be able to tell whether two triples are
> duplicates (or one is meant to replace the other), or are indeed
> intended to be separate assertions.
>
> If you merged only unique s,p,o combinations, you could not handle
> deletes or updates. But without using s,p,o as a composite key, you need
> some other way to identify a triple -- a "context". Each store could
> presumably store a URI identifying the source context for each triple,
> but the context identifier would have to be able to flow through all
> stores (it couldn't be a store-specific scheme). And the manner in which
> you treat the context URI would have to be consistent across all stores.
> For example, if you have one context URI for a single document
> containing a hundred triples, what happens when you update a single
> triple? You need a way to indicate that that single triple should be
> deleted from the original context and added to a different one. Even in
> the simple case (a single change results in the old context being
> deleted entirely and replaced with a new one) you need a way to
> communicate the deletion from one store to another. So I am having a
> hard time envisioning true model merging without some sort of
> standardized delta-encoding syntax.
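[The consumer loop Geoff outlines above - fetch changed contexts, replace each context wholesale, and quarantine any context that makes the aggregation inconsistent - can be sketched in a few lines. This is a minimal, store-agnostic Python sketch, not from the thread: triples are plain (s, p, o) tuples, and `fetch_context` (dereference a context URI to its current triples) and `is_consistent` (whatever reasoner or validator the consumer runs) are hypothetical callbacks supplied by the user.]

```python
# Hypothetical sketch of the context-based sync loop described in the thread.
# Assumptions: triples are (s, p, o) string tuples; fetch_context() returns the
# current triples for a context URI; is_consistent() stands in for whatever
# consistency check the consumer applies to the aggregation.
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]

class Consumer:
    def __init__(self,
                 fetch_context: Callable[[str], Set[Triple]],
                 is_consistent: Callable[[Set[Triple]], bool]):
        self.fetch_context = fetch_context
        self.is_consistent = is_consistent
        self.contexts: Dict[str, Set[Triple]] = {}  # context URI -> triples
        self.quarantined: Set[str] = set()          # contexts pulled from the mix

    def aggregate(self) -> Set[Triple]:
        """Union of all non-quarantined contexts."""
        out: Set[Triple] = set()
        for uri, triples in self.contexts.items():
            if uri not in self.quarantined:
                out |= triples
        return out

    def on_context_changed(self, uri: str) -> None:
        """Fetch the new copy of a changed context and replace the old one.

        Replacing the whole context (rather than applying a delta) handles
        deletes for free: triples absent from the new copy simply disappear.
        """
        self.contexts[uri] = self.fetch_context(uri)
        # If the updated context makes the aggregation inconsistent,
        # quarantine it until the inconsistency can be resolved.
        if not self.is_consistent(self.aggregate()):
            self.quarantined.add(uri)
        else:
            self.quarantined.discard(uri)
```

[Note that replacing a context wholesale sidesteps the delta-encoding problem Joshua raises, exactly as suggested for small-enough contexts; it does not perform the lean merge needed to avoid bnode buildup when merging with an older copy, which would require graph leaning on top of this.]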
Received on Tuesday, 8 February 2005 20:17:57 UTC