
RE: Sync'ing triplestores

From: Joshua Allen <joshuaa@microsoft.com>
Date: Tue, 8 Feb 2005 10:16:13 -0800
Message-ID: <0E36FD96D96FCA4AA8E8F2D199320E5204352E59@RED-MSG-43.redmond.corp.microsoft.com>
To: "Giovanni Tummarello" <giovanni@wup.it>, <semantic-web@w3.org>

Yeah, I understand.  I don't think this design is really aimed at "merge"
scenarios, then.  It's more aimed at permitting people to share a model.

Think of a scenario where two different parties are working on separate
metadata servers, and want to merge one another's changes -- when
changes touch disjoint parts of the model, they merge seamlessly (but in
any case, changes could both take place below a bnode, so the identity
hash doesn't help).

> -----Original Message-----
> From: Giovanni Tummarello [mailto:giovanni@wup.it]
> Sent: Tuesday, February 08, 2005 3:09 AM
> To: Joshua Allen; semantic-web@w3.org
> Subject: Re: Sync'ing triplestores
> 
> If I understand your question correctly, yes it does; there is no
> "blind imposition" of triples in this sense.
> But anyway, there is no way to replace/delete "bundles" of triples
> containing blank nodes if you can't somehow "identify" them with an IFP
> (in the case of a blank node) or a signature hash, as we do in the case
> of blank node closures (there might be more than one linked, and other
> cases).
> One of the nice things about the model we propose is that we're using
> standard RDF constructs (e.g. reifications) rather than relying on
> third-party proposed additions to RDF semantics like named graphs or
> quadruples.
> 
> 
> >So the caller always has to know the identity of the triples bundle to
> >request replacement of it?
> >
> >>-----Original Message-----
> >>From: Giovanni Tummarello [mailto:giovanni@wup.it]
> >>Sent: Monday, February 07, 2005 8:47 AM
> >>To: Joshua Allen; semantic-web@w3.org
> >>Subject: Re: Sync'ing triplestores
> >>
> >>If you're interested in this specific problem, here is how we do it
> >>in RDFGrowth, with no intention of saying it is the best or even the
> >>right way of doing it :-)
> >>
> >>a) updates are monotonic additions only.
> >>b) statements are "grouped" according to their blank node closures
> >>(MSG) and signed by using a reification on a single statement
> >>composing the closure (it's more complicated than this, but take this
> >>as an explanation)
> >>c) the digital signature hash is an IFP to the MSG
> >>d) updates are managed by distributing a new MSG that carries the
> >>statement "replace" and the hash of the old MSG.
> >>Clients decide whether to accept the substitution according to the
> >>digital signature on the replace MSG: likely they will replace it if
> >>the signature is the same or has a higher hierarchical value.
> >>At this point:
> >>d1) in a pure, strictly monotonic P2P environment like the current
> >>RDFGrowth, keep the original message as well as the update one.
> >>d2) in a centralized system, safely delete the old version
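
Steps a) to d2) above can be sketched roughly as follows. This is only an illustration under loud assumptions, not the RDFGrowth implementation: `Msg`, `apply_update`, and the `rank` table are hypothetical names, a plain SHA-256 over the sorted triples stands in for the real digital signature hash, "higher hierarchical value" is modelled as a larger integer rank, and blank node labels are assumed to be stable across stores.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Msg:
    """A bundle of triples (one blank node closure) plus who signed it."""
    triples: FrozenSet[Triple]
    signer: str
    replaces: Optional[str] = None  # digest of the old MSG, if this is an update

    @property
    def digest(self) -> str:
        # Stand-in for the signature hash: SHA-256 over the sorted triples.
        h = hashlib.sha256()
        for triple in sorted(self.triples):
            h.update("\x1f".join(triple).encode("utf-8") + b"\x1e")
        return h.hexdigest()


def apply_update(store: Dict[str, Msg], update: Msg,
                 rank: Dict[str, int], centralized: bool) -> bool:
    """Add `update` to `store`; return True if it was accepted."""
    if update.signer not in rank:
        return False  # signer not on the accepted list: keep spammers out
    old = store.get(update.replaces) if update.replaces else None
    if old is not None:
        same_signer = update.signer == old.signer
        outranks = rank[update.signer] > rank.get(old.signer, -1)
        if not (same_signer or outranks):
            return False  # not entitled to replace the old MSG
        if centralized:
            del store[update.replaces]  # d2) drop the old version
        # d1) otherwise keep both: the store only ever grows
    store[update.digest] = update
    return True


rank = {"alice": 1, "root": 2}
old_msg = Msg(frozenset({("_:b1", "foaf:name", "Ann")}), signer="alice")
new_msg = Msg(frozenset({("_:b1", "foaf:name", "Anne")}), signer="root",
              replaces=old_msg.digest)

central = {old_msg.digest: old_msg}
p2p = {old_msg.digest: old_msg}
accepted_central = apply_update(central, new_msg, rank, centralized=True)
accepted_p2p = apply_update(p2p, new_msg, rank, centralized=False)
```

In the centralized store the old bundle is gone after the update, while the P2P store holds both bundles and leaves it to readers to follow the "replace" link.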
> >>
> >>Sound complicated? You bet, but all in all fairly solid and nicely
> >>monotonic, so just keep the spammers out (provide a list of accepted
> >>signatures a priori or some kind of authority about who can speak).
> >>90% implemented; look for the announcement sometime rather soon. (But
> >>if your boss is really interested maybe we can speed things up a bit
> >>8-) )
> >>
> >>Giovanni
> >>
> >>Joshua Allen wrote:
> >>
> >>>>>I've not had a proper search yet, but was wondering if anyone had
> >>>>>any pointers to approaches/algorithms for keeping separate
> >>>>>triplestores in sync. Ideally between different implementations -
> >>>>>e.g. Jena + Redland.
> >>>>
> >>>>Sorry, that wasn't very clear - by sync I mean having (potentially
> >>>>big) models replicated at remote locations.
> >>>>
> >>>I haven't found any good comprehensive prior art, but I have been
> >>>thinking about this a lot lately.  The general problem is merging
> >>>models (since if the models are disjoint, you don't have an issue).
> >>>And to merge triples, you have to be able to tell whether two triples
> >>>are duplicates (or one is meant to replace the other), or are indeed
> >>>intended to be separate assertions.
> >>>
> >>>If you merged only unique s,p,o combinations, you could not handle
> >>>deletes or updates.  But without using s,p,o as a composite key, you
> >>>need some other way to identify a triple -- a "context".  Each store
> >>>could presumably store a URI identifying the source context for each
> >>>triple, but the context identifier would have to be able to flow
> >>>through all stores (it couldn't be a store-specific scheme).  And the
> >>>manner in which you treat context URIs would have to be consistent
> >>>across all stores.  For example, if you have one context URI for a
> >>>single document containing a hundred triples, what happens when you
> >>>update a single triple?  You need a way to identify that that single
> >>>triple should be deleted from the original context and added to a
> >>>different one.  Even in the simple case (a single change results in
> >>>the old context being deleted entirely and replaced with a new
> >>>context) you need a way to communicate deletion from one store to
> >>>another.  So I am having a hard time envisioning true model merging
> >>>without some sort of delta encoding syntax that is standardized.
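
The point about needing a delta encoding can be made concrete with a small sketch. Nothing here is a standardized syntax: `Quad`, `Delta`, `apply_delta`, and the `example.org` context URIs are made up for illustration, and a store is modelled as a plain Python set of (s, p, o, context) tuples.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# A triple plus the URI of the context (source document) it came from.
Quad = Tuple[str, str, str, str]


@dataclass
class Delta:
    """A change set that every store must interpret the same way."""
    removed: Set[Quad] = field(default_factory=set)
    added: Set[Quad] = field(default_factory=set)


def apply_delta(store: Set[Quad], delta: Delta) -> None:
    """Apply deletions first, then additions, in place."""
    store.difference_update(delta.removed)
    store.update(delta.added)


def naive_merge(a: Set[Quad], b: Set[Quad]) -> Set[Quad]:
    """Union of unique statements: can add, but can never delete."""
    return a | b


DOC_V1 = "http://example.org/doc/v1"  # hypothetical context URIs
DOC_V2 = "http://example.org/doc/v2"

old_title = ("ex:book", "dc:title", "Draft", DOC_V1)
new_title = ("ex:book", "dc:title", "Final", DOC_V2)
creator = ("ex:book", "dc:creator", "ex:ann", DOC_V1)

replica_a = {old_title, creator}
replica_b = set(replica_a)

# Updating one triple: delete it from the original context and add the
# new value under a different context.
delta = Delta(removed={old_title}, added={new_title})
apply_delta(replica_a, delta)
apply_delta(replica_b, delta)

# Merging by union instead leaves the stale title in place.
merged = naive_merge({old_title, creator}, {new_title, creator})
```

Both replicas end up identical because the deletion travels with the delta, whereas `merged` still holds both the old and the new title.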
Received on Tuesday, 8 February 2005 18:16:35 UTC
