Re: Sync'ing triplestores from Giovanni Tummarello on 2005-02-07 (semantic-web@w3.org from February 2005)

From: Giovanni Tummarello <giovanni@wup.it>
Date: Mon, 07 Feb 2005 17:47:12 +0100
To: Joshua Allen <joshuaa@microsoft.com>, semantic-web@w3.org
Message-ID: <42079B90.4010305@wup.it>
If you're interested in this specific problem, here is how we do it 
in RDFGrowth, with no intention of saying it is the best or even the 
right way of doing it :-)

a) updates are monotonic additions only.
b) statements are "grouped" according to their blank node closures (MSG) 
and signed by using a reification on a single statement composing the 
closure (it's more complicated than this, but take this as an explanation)
c) the digital signature hash is an IFP (inverse functional property) 
to the MSG
d) updates are managed by distributing a new MSG that carries the 
statement "replace" and an indication of the hash of the old MSG.
Clients decide whether or not to accept the substitution according to the 
digital signature on the replace MSG: likely they will replace it if the 
signature is the same or has a higher hierarchical value.
At this point:
d1) in a pure, strictly monotonic P2P environment like the current 
RDFGrowth, keep the original message as well as the update one.
d2) in a centralized system, safely delete the old version.
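
As a rough illustration of points a) to d2), here is a minimal Python
sketch. It is not RDFGrowth code. Every name in it (split_into_msgs,
SignedMSG, Store, the rank table) is hypothetical, blank nodes are
assumed to be strings starting with "_:", and a plain SHA-256 over the
sorted triples stands in for the real digital signature. It also skips
blank node canonicalization, which a real hash would need.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def is_blank(term):
    # Assumption: blank nodes are written "_:label".
    return term.startswith("_:")


def split_into_msgs(triples):
    """Group triples so that triples sharing a blank node stay together."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, _, o in triples:
        blanks = [t for t in (s, o) if is_blank(t)]
        for b in blanks:
            find(b)
        if len(blanks) == 2:
            parent[find(blanks[0])] = find(blanks[1])

    groups, msgs = {}, []
    for t in triples:
        blanks = [x for x in (t[0], t[2]) if is_blank(x)]
        if blanks:
            groups.setdefault(find(blanks[0]), set()).add(t)
        else:
            msgs.append(frozenset([t]))  # a ground triple is its own group
    msgs.extend(frozenset(g) for g in groups.values())
    return msgs


@dataclass(frozen=True)
class SignedMSG:
    triples: frozenset
    signer: str
    replaces: Optional[str] = None  # digest of the MSG this one supersedes

    @property
    def digest(self):
        # Stand-in for the signature hash that identifies the MSG.
        text = "\n".join(" ".join(t) for t in sorted(self.triples))
        return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Store:
    def __init__(self, rank, keep_history):
        self.rank = rank                  # signer -> level; unknown signers are rejected
        self.keep_history = keep_history  # True: d1 (P2P), False: d2 (centralized)
        self.msgs = {}                    # digest -> SignedMSG
        self.superseded = set()

    def receive(self, msg):
        if msg.signer not in self.rank:
            return False
        old = self.msgs.get(msg.replaces) if msg.replaces else None
        if old is not None:
            same = msg.signer == old.signer
            higher = self.rank[msg.signer] > self.rank[old.signer]
            if not (same or higher):
                return False
            if self.keep_history:
                self.superseded.add(msg.replaces)  # d1: keep both messages
            else:
                del self.msgs[msg.replaces]        # d2: drop the old version
        self.msgs[msg.digest] = msg
        return True

    def current(self):
        return [m for d, m in self.msgs.items() if d not in self.superseded]
```

With keep_history=True the store only ever grows, so the update stays
monotonic and current() simply filters out superseded groups. With
keep_history=False the old group is physically removed.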

Sounds complicated? You bet... but all in all it is fairly solid and nicely 
monotonic, so you just have to keep the spammers out (provide a list of 
accepted signatures a priori, or some kind of authority about who can speak).
It is 90% implemented; look for the announcement sometime rather soon. (But 
if your boss is really interested maybe we can speed things up a bit  8-)  )

Giovanni


Joshua Allen wrote:

>>>I've not had a proper search yet, but was wondering if anyone had any
>>>pointers to approaches/algorithms for keeping separate triplestores in
>>>sync. Ideally between different implementations - e.g. Jena + Redland.
>>>
>>Sorry, that wasn't very clear - by sync I mean having (potentially
>>big) models replicated at remote locations.
>>
>
>I haven't found any good comprehensive prior art, but I have been
>thinking about this a lot lately.  The general problem is merging models
>(since if the models are disjoint, you don't have an issue).  And to
>merge triples, you have to be able to tell whether two triples are
>duplicates (or one is meant to replace the other), or are indeed
>intended to be separate assertions.
>
>If you merged only unique s,p,o combinations, you could not handle
>deletes or updates.  But without using s,p,o as composite key, you need
>some other way to identify a triple -- a "context".  Each store could
>presumably store a URI identifying the source context for each triple,
>but the context identifier would have to be able to flow through all
>stores (it couldn't be a store-specific scheme).  And the manner in which
>you treat the context URI would have to be consistent across all stores.
>For example, if you have one context URI for a single document
>containing a hundred triples, what happens when you update a single
>triple?  You need a way to identify that that single triple should be
>deleted from the original context and added to a different one.  Even in
>the simple case (a single change results in the old context being
>deleted entirely and replaced with new context) you need a way to
>communicate deletion from one store to another.  So I am having a hard
>time envisioning true model merging without some sort of delta encoding
>syntax that is standardized.
>
Received on Monday, 7 February 2005 16:47:38 UTC