Re: Sync'ing triplestores from Danny Ayers on 2005-02-08 (semantic-web@w3.org from February 2005)

From: Danny Ayers <danny.ayers@gmail.com>
Date: Tue, 8 Feb 2005 20:05:44 +0100
To: Joshua Allen <joshuaa@microsoft.com>
Cc: Giovanni Tummarello <giovanni@wup.it>, semantic-web@w3.org
Message-ID: <1f2ed5cd0502081105567a4de9@mail.gmail.com>
Thanks Giovanni, Joshua - I'm reading with interest ;-)

btw, one thing I had considered was cheating: using RDBMS-backed stores
(same toolkit) and keeping them synchronized underneath. It seems
MySQL 5+ will support the kind of multi-master replication I'm looking
for, but right now there only seems to be master-(read-only)slave. I
suspect a fairly naive algorithm on top of SQL might give acceptable
performance, but I doubt it would be much more effort to do something
on top of the RDF layer. Robert Turner suggested using SPARQL, which
might be a promising angle for a general solution. First on my list
though is poking around with RDFGrowth ;-)
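
To make "something on top of the RDF layer" a bit more concrete, here is a
minimal sketch of a naive one-way sync, assuming each store can be dumped as
a set of (s, p, o) string tuples. The names Triple, Delta, diff and
apply_delta are made up for illustration and are not part of Jena, Redland
or any other toolkit. It deliberately refuses blank nodes, because their
labels are local to each store and cannot be compared by plain set
difference.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical representation: a triple is (subject, predicate, object),
# each term already serialized to a string by the toolkit.
Triple = Tuple[str, str, str]


class Delta(NamedTuple):
    added: Set[Triple]
    removed: Set[Triple]


def is_ground(triple: Triple) -> bool:
    """True if no term is a blank node (written here as '_:label')."""
    return not any(term.startswith("_:") for term in triple)


def diff(old: Set[Triple], new: Set[Triple]) -> Delta:
    """Compute what must change to turn `old` into `new`."""
    if not all(is_ground(t) for t in old | new):
        raise ValueError("naive diff only handles ground triples")
    return Delta(added=new - old, removed=old - new)


def apply_delta(store: Set[Triple], delta: Delta) -> Set[Triple]:
    """Return a new store with the delta applied; the input is not mutated."""
    return (store - delta.removed) | delta.added


# Usage: the master changes one title, and the replica catches up.
master = {("ex:doc", "dc:title", "Draft"), ("ex:doc", "dc:creator", "ex:danny")}
replica = set(master)
master = (master - {("ex:doc", "dc:title", "Draft")}) | {("ex:doc", "dc:title", "Final")}
replica = apply_delta(replica, diff(replica, master))
```

This is only the master-to-replica direction. It says nothing about
conflicting edits made on both sides, which is where the multi-master
problem actually starts.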

On Tue, 8 Feb 2005 10:16:13 -0800, Joshua Allen <joshuaa@microsoft.com> wrote:
> 
> Yeah, I understand.  I don't think this design is really aimed at "merge"
> scenarios, then.  It's more aimed at permitting people to share a model.
> 
> Think of a scenario where two different parties are working on separate
> metadata servers, and want to merge one another's changes -- when
> changes touch disjoint parts of the model, they merge seamlessly (but in
> any case, changes could both take place below a bnode, so the identity
> hash doesn't help).
> 
> > -----Original Message-----
> > From: Giovanni Tummarello [mailto:giovanni@wup.it]
> > Sent: Tuesday, February 08, 2005 3:09 AM
> > To: Joshua Allen; semantic-web@w3.org
> > Subject: Re: Sync'ing triplestores
> >
> > If I understand your question correctly, yes it does; there is no
> > "blind imposal" of triples in this sense.
> > But anyway there is no way to replace/delete "bundles" of triples
> > containing blank nodes if you can't somehow "identify" them with an
> > IFP (in the case of a blank node) or a signature hash, as we do in the
> > case of blank node closures (there might be more than one linked, and
> > other cases).
> > One of the nice things about the model we propose is that we're using
> > standard RDF constructs (e.g. reifications) rather than relying on
> > third-party proposed additions to RDF semantics like named graphs or
> > quadruples.
> >
> >
> > >So the caller always has to know the identity of the triples bundle
> > >to request replacement of it?
> > >
> > >
> > >
> > >>-----Original Message-----
> > >>From: Giovanni Tummarello [mailto:giovanni@wup.it]
> > >>Sent: Monday, February 07, 2005 8:47 AM
> > >>To: Joshua Allen; semantic-web@w3.org
> > >>Subject: Re: Sync'ing triplestores
> > >>
> > >>If you're interested in this specific problem, here is how we do it
> > >>in RDFGrowth, with no intention of saying it is the best or even the
> > >>right way of doing it :-)
> > >>
> > >>a) updates are monotonic additions only.
> > >>b) statements are "grouped" according to their blank node closures
> > >>(MSG) and signed by using a reification on a single statement
> > >>composing the closure (it's more complicated than this, but take
> > >>this as an explanation)
> > >>c) the digital signature hash is an IFP to the MSG
> > >>d) updates are managed by distributing a new MSG that carries the
> > >>statement "replace" and the hash of the old MSG. Clients decide
> > >>whether to accept the substitution according to the digital
> > >>signature on the replace MSG: likely they will replace it if the
> > >>signature is the same or has a higher hierarchical value.
> > >>At this point:
> > >>d1) in a pure, strictly monotonic P2P environment like the current
> > >>RDFGrowth, keep the original message as well as the update one.
> > >>d2) in a centralized system, safely delete the old version
> > >>
> > >>Sound complicated? You bet... but all in all fairly solid, nicely
> > >>monotonic, so just keep the spammers out (provide a list of accepted
> > >>signatures a priori or some kind of authority about who can speak).
> > >>90% implemented, look for the announcement sometime rather soon. (But
> > >>if your boss is really interested maybe we can speed things up a bit
> > >>8-) )
> > >>
> > >>Giovanni
> > >>
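
A rough illustration of steps (a)-(d) above, not RDFGrowth's actual code:
everything here (MSG, Store, msg_hash, the rank table) is a hypothetical
stand-in. In particular, a plain SHA-1 over sorted triples replaces the real
signature scheme, the signer is assumed to be already verified, and blank
node labels are assumed to be canonical within an MSG.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def msg_hash(triples: FrozenSet[Triple]) -> str:
    """Toy identity for an MSG: SHA-1 over its triples in sorted order."""
    h = hashlib.sha1()
    for s, p, o in sorted(triples):
        h.update(f"{s} {p} {o} .\n".encode("utf-8"))
    return h.hexdigest()


@dataclass(frozen=True)
class MSG:
    triples: FrozenSet[Triple]
    signer: str                     # assumed already verified
    replaces: Optional[str] = None  # hash of the MSG this one supersedes


class Store:
    def __init__(self, rank: Dict[str, int], keep_history: bool):
        self.rank = rank                  # accepted signers and their rank
        self.keep_history = keep_history  # True: d1 (P2P), False: d2 (central)
        self.msgs: Dict[str, MSG] = {}
        self.superseded: Set[str] = set()

    def receive(self, msg: MSG) -> bool:
        """Accept or reject an incoming MSG; return True if it was stored."""
        if msg.signer not in self.rank:
            return False  # keep the spammers out
        old = self.msgs.get(msg.replaces) if msg.replaces else None
        if old is not None:
            same_signer = msg.signer == old.signer
            outranks = self.rank[msg.signer] > self.rank[old.signer]
            if not (same_signer or outranks):
                return False
            if self.keep_history:
                self.superseded.add(msg.replaces)  # d1: keep both versions
            else:
                del self.msgs[msg.replaces]        # d2: drop the old version
        # If the replaced MSG is unknown here, this is just a plain addition.
        self.msgs[msg_hash(msg.triples)] = msg
        return True

    def current(self) -> Dict[str, MSG]:
        """MSGs that have not been superseded."""
        return {h: m for h, m in self.msgs.items() if h not in self.superseded}
```

The caller still needs the hash of the old MSG to request a replacement,
which is exactly the point raised in the follow-up question.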
> > >>
> > >>Joshua Allen wrote:
> > >>
> > >>
> > >>
> > >>>>>I've not had a proper search yet, but was wondering if anyone had
> > >>>>>any pointers to approaches/algorithms for keeping separate
> > >>>>>triplestores in sync. Ideally between different implementations -
> > >>>>>e.g. Jena + Redland.
> > >>>>
> > >>>>Sorry, that wasn't very clear - by sync I mean having (potentially
> > >>>>big) models replicated at remote locations.
> > >>>
> > >>>I haven't found any good comprehensive prior art, but I have been
> > >>>thinking about this a lot lately.  The general problem is merging
> > >>>models (since if the models are disjoint, you don't have an issue).
> > >>>And to merge triples, you have to be able to tell whether two triples
> > >>>are duplicates (or one is meant to replace the other), or are indeed
> > >>>intended to be separate assertions.
> > >>>
> > >>>If you merged only unique s,p,o combinations, you could not handle
> > >>>deletes or updates.  But without using s,p,o as composite key, you
> > >>>need some other way to identify a triple -- a "context".  Each store
> > >>>could presumably store a URI identifying the source context for each
> > >>>triple, but the context identifier would have to be able to flow
> > >>>through all stores (it couldn't be a store-specific scheme).  And the
> > >>>manner in which you treat context URIs would have to be consistent
> > >>>across all stores.  For example, if you have one context URI for a
> > >>>single document containing a hundred triples, what happens when you
> > >>>update a single triple?  You need a way to identify that that single
> > >>>triple should be deleted from the original context and added to a
> > >>>different one.  Even in the simple case (a single change results in
> > >>>the old context being deleted entirely and replaced with new context)
> > >>>you need a way to communicate deletion from one store to another.  So
> > >>>I am having a hard time envisioning true model merging without some
> > >>>sort of delta encoding syntax that is standardized.
> > >>>
> 
> 
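
For what it's worth, the "delta encoding" idea at the end of the quoted
message could look something like the sketch below. The wire format (one
JSON object per line with "op" and "quad" keys) is invented here purely for
illustration and is not any standardized syntax. Each statement carries its
context URI, so a deletion can be communicated from one store to another.

```python
import json
from typing import Iterable, Set, Tuple

# Hypothetical quad: (context URI, subject, predicate, object) as strings.
Quad = Tuple[str, str, str, str]


def encode_delta(removed: Iterable[Quad], added: Iterable[Quad]) -> str:
    """Serialize deletions first, then additions, one JSON object per line."""
    ops = [{"op": "del", "quad": list(q)} for q in sorted(removed)]
    ops += [{"op": "add", "quad": list(q)} for q in sorted(added)]
    return "\n".join(json.dumps(op) for op in ops)


def apply_encoded_delta(store: Set[Quad], payload: str) -> None:
    """Replay an encoded delta against a store, mutating it in place."""
    for line in payload.splitlines():
        op = json.loads(line)
        quad = tuple(op["quad"])
        if op["op"] == "del":
            store.discard(quad)  # deleting an absent quad is a no-op
        elif op["op"] == "add":
            store.add(quad)
        else:
            raise ValueError(f"unknown op: {op['op']!r}")


# Usage: one triple of a document moves from the original context to a new one.
old_ctx, new_ctx = "http://example.org/doc/v1", "http://example.org/doc/v2"
remote = {
    (old_ctx, "ex:doc", "dc:title", "Draft"),
    (old_ctx, "ex:doc", "dc:creator", "ex:joshua"),
}
payload = encode_delta(
    removed=[(old_ctx, "ex:doc", "dc:title", "Draft")],
    added=[(new_ctx, "ex:doc", "dc:title", "Final")],
)
apply_encoded_delta(remote, payload)
```

Both ends still have to agree on what a context URI means, which is the
consistency requirement raised in the quoted message.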


-- 

http://dannyayers.com
Received on Tuesday, 8 February 2005 19:05:45 UTC