
RE: Sync'ing triplestores

From: Joshua Allen <joshuaa@microsoft.com>
Date: Mon, 7 Feb 2005 11:56:00 -0800
Message-ID: <0E36FD96D96FCA4AA8E8F2D199320E5204352608@RED-MSG-43.redmond.corp.microsoft.com>
To: "Giovanni Tummarello" <giovanni@wup.it>, <semantic-web@w3.org>

So the caller always has to know the identity of the triple bundle in
order to request its replacement?

> -----Original Message-----
> From: Giovanni Tummarello [mailto:giovanni@wup.it]
> Sent: Monday, February 07, 2005 8:47 AM
> To: Joshua Allen; semantic-web@w3.org
> Subject: Re: Sync'ing triplestores
> If you're interested in this specific problem, here is how we do it
> in RDFGrowth, with no intention of saying it is the best or even the
> right way of doing it :-)
> a) updates are monotonic additions only.
> b) statements are "grouped" according to their blank node closures
> and signed by using a reification on a single statement composing the
> closure (it's more complicated than this, but take this as an
> c) the digital signature hash is an IFP to the MSG
> d) updates are managed by distributing a new MSG that carries the
> statement "replace" and the hash of the old MSG. Clients decide
> whether to accept the substitution according to the digital
> signature on the replace MSG: they will likely replace it if the
> signature is the same or has a higher hierarchical value.
> At this point:
> d1) in a pure, strictly monotonic P2P environment like the current
> RDFGrowth, keep the original message as well as the update;
> d2) in a centralized system, safely delete the old version.
> Sounds complicated? You bet. But all in all it is fairly solid and
> nicely monotonic, so you just need to keep the spammers out (provide
> a list of accepted signatures a priori, or some kind of authority on
> who can speak).
> It is 90% implemented; look for the announcement sometime rather
> soon. (But if your boss is really interested, maybe we can speed
> things up a bit.)
> Giovanni
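
The replacement scheme in points (c) and (d) above can be sketched roughly as follows. This is only an illustration under loud assumptions, not RDFGrowth's code: the names `MSG`, `Store`, `rank` and `keep_superseded` are hypothetical, a plain SHA-256 digest stands in for the real digital signature hash, and blank-node closure computation and signature verification are left out entirely.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class MSG:
    """One signed bundle of triples (hypothetical shape, not RDFGrowth's)."""
    triples: FrozenSet[Triple]
    signer: str
    replaces: Optional[str] = None  # digest of the MSG this one supersedes

    @property
    def digest(self) -> str:
        # Assumption: SHA-256 over signer, target and sorted triples stands
        # in for the signature hash that identifies the bundle.
        payload = repr((self.signer, self.replaces, sorted(self.triples)))
        return hashlib.sha256(payload.encode()).hexdigest()


class Store:
    def __init__(self, rank: Dict[str, int], keep_superseded: bool):
        self.rank = rank                      # accepted signers and their rank
        self.keep_superseded = keep_superseded
        self.msgs: Dict[str, MSG] = {}        # digest -> bundle
        self.superseded: Set[str] = set()     # digests hidden by a replacement

    def add(self, msg: MSG) -> bool:
        if msg.signer not in self.rank:
            return False                      # not on the accepted-signer list
        old = self.msgs.get(msg.replaces) if msg.replaces else None
        if old is not None and self.rank[msg.signer] < self.rank[old.signer]:
            return False                      # replacement has a lower rank
        self.msgs[msg.digest] = msg
        if old is not None:
            if self.keep_superseded:          # d1: monotonic, keep both
                self.superseded.add(msg.replaces)
            else:                             # d2: centralized, drop the old
                del self.msgs[msg.replaces]
        return True

    def active_triples(self) -> Set[Triple]:
        live = (m for d, m in self.msgs.items() if d not in self.superseded)
        return set().union(*(m.triples for m in live))


rank = {"alice": 1, "root": 2}
old = MSG(frozenset({("_:b1", "foaf:name", "Al")}), "alice")
new = MSG(frozenset({("_:b1", "foaf:name", "Alice")}), "alice",
          replaces=old.digest)

p2p = Store(rank, keep_superseded=True)
p2p.add(old)
p2p.add(new)

central = Store(rank, keep_superseded=False)
central.add(old)
central.add(new)
```

In this sketch the sender of a replacement has to supply `old.digest`, which is exactly the dependency the question at the top of this message asks about. A replacement whose target is unknown to the store is simply accepted as a new addition here; a real implementation would need a policy for that case.
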
> Joshua Allen wrote:
> >>>I've not had a proper search yet, but was wondering if anyone had any
> >>>pointers to approaches/algorithms for keeping separate triplestores in
> >>>sync. Ideally between different implementations - e.g. Jena + Redland.
> >
> >>Sorry, that wasn't very clear - by sync I mean having (potentially
> >>big) models replicated at remote locations.
> >
> >I haven't found any good comprehensive prior art, but I have been
> >thinking about this a lot lately.  The general problem is merging
> >(since if the models are disjoint, you don't have an issue).  And to
> >merge triples, you have to be able to tell whether two triples are
> >duplicates (or one is meant to replace the other), or are indeed
> >intended to be separate assertions.
> >
> >If you merged only unique s,p,o combinations, you could not handle
> >deletes or updates.  But without using s,p,o as a composite key, you
> >need some other way to identify a triple -- a "context".  Each store
> >could presumably store a URI identifying the source context for each
> >triple, but the context identifier would have to be able to flow
> >through all stores (it couldn't be a store-specific scheme).  And the
> >manner in which you treat the context URI would have to be consistent
> >across all stores.
> >For example, if you have one context URI for a single document
> >containing a hundred triples, what happens when you update a single
> >triple?  You need a way to identify that that single triple should be
> >deleted from the original context and added to a different one.  Even
> >in the simple case (a single change results in the old context being
> >deleted entirely and replaced with a new context) you need a way to
> >communicate deletion from one store to another.  So I am having a
> >hard time envisioning true model merging without some sort of delta
> >syntax that is standardized.
> >
> >
> >
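
The merging problem described in the quoted message can be shown in a few lines. The sketch below is only an illustration: `union_merge`, `Delta` and `apply_delta` are hypothetical names, the context URI `ex:ctx1` is made up, and the delta shape is one possible assumption rather than any standardized syntax.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]


def union_merge(a: Set[Triple], b: Set[Triple]) -> Set[Triple]:
    """Merge on unique (s, p, o): additions propagate, deletions do not."""
    return a | b


@dataclass(frozen=True)
class Delta:
    """A change to one context: triples to remove and triples to add."""
    context: str
    removed: FrozenSet[Triple]
    added: FrozenSet[Triple]


def apply_delta(store: Dict[str, Set[Triple]], delta: Delta) -> None:
    # The store maps a context URI to the triples asserted in that context.
    triples = store.setdefault(delta.context, set())
    triples -= delta.removed
    triples |= delta.added


old = ("ex:doc", "dc:title", "Draft")
new = ("ex:doc", "dc:title", "Final")

# Replica A has already updated the title; replica B still has the old one.
replica_a = {new}
replica_b = {old}
merged = union_merge(replica_a, replica_b)   # the stale triple survives

# With an explicit delta, the deletion travels along with the addition.
store_b = {"ex:ctx1": {old}}
apply_delta(store_b, Delta("ex:ctx1", frozenset({old}), frozenset({new})))
```

After the union merge both titles are present, because a set union has no way to say "this triple is gone". The delta version only works if both stores agree on what `ex:ctx1` means, which is the consistency requirement raised above.
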
Received on Monday, 7 February 2005 19:56:34 UTC
