
RE: Sync'ing triplestores

From: Bill de hÓra <bill.dehora@propylon.com>
Date: Sun, 03 Apr 2005 14:23:06 +0100
Message-ID: <424FEE3A.5040100@propylon.com>
To: semantic-web@w3.org
CC: Danny Ayers <danny.ayers@gmail.com>, joshuaa@microsoft.com

Joshua Allen:
"So the real problem of merging RDF stores is in being able to uniquely
identify chunks of RDF independent of their full content.  It seems the
options here are very limited, without going into some crazy "merge
definition language" in the RDFS or OWL.  Even if you have a simple
RDFS/OWL property which tells you a combination of child tuples which
uniquely identify the graph, you still have the problem that an update
replaces the entire graph; when you probably want it to merge only
properties that have been changed (if I update only the e-mail address,
and send a graph that has the old postal address, I do not want my
update to replace the current postal address).  So to accomplish this,
you need a delta encoding syntax with change tracking (send a statement;
"update the following triple on the node identified by this key, and
ignore everything else under that node").  Basically a DML for RDF.  To
expect all stores to support change tracking and a standardized DML is
pretty crazy.  We don't even do that in SQL land."

I've come to this very late - Danny mentioned syncing triple stores to 
me recently as an aside to something else.

The problem with syncing graphs seems to be that, to do it properly, you 
need to compute the respective graph complements, which can be a very 
expensive operation.
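For ground graphs (ones with no blank nodes) the two complements are plain set differences; it is blank nodes that push the cost up towards graph isomorphism testing. A minimal sketch of the ground case, assuming both stores can be materialised as Python sets of (subject, predicate, object) tuples (all names and data below are invented for illustration):

```python
# Sketch: the "graph complement" computation for ground graphs, where a
# store is just a set of (s, p, o) tuples. Blank nodes would break this.

def graph_delta(store_a, store_b):
    """Return (only_in_a, only_in_b): triples to delete and to insert."""
    a, b = set(store_a), set(store_b)
    return a - b, b - a

# Joshua's e-mail/postal example: only the e-mail triple has changed,
# so only that triple should show up in the delta.
old = {("ex:bob", "ex:email", "bob@old.example"),
       ("ex:bob", "ex:postal", "1 Main St")}
new = {("ex:bob", "ex:email", "bob@new.example"),
       ("ex:bob", "ex:postal", "1 Main St")}

removed, added = graph_delta(old, new)
```

The unchanged postal triple drops out of both complements, which is exactly the "merge only what changed" behaviour Joshua asks for above.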

So, I'll make an assumption: that striving for exact syncing of 
triplestores is one of those Internet-type fallacies (i.e. along the 
lines of 'the network is reliable', or 'long-lived transactions'), but 
at the level of data rather than networking. We already have a few such 
fallacies for RDF.

I would then lower my expectations to a best effort at sharing new, 
interesting data between agents. The simplest way to do this seems to be 
for stores to expose a triples feed. That is, a store would publish all 
new deletes, updates and inserts as a data stream, so that any other 
store's agent can subscribe to the feed.
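A sketch of what such a feed and a subscribing agent might look like, under the assumption that each feed entry carries an action ("insert" or "delete") plus the affected triple, and that an update is modelled as a delete followed by an insert (the entry shape here is invented, not any standard):

```python
# Hypothetical triples feed: an ordered list of change entries. An
# "update" is expressed as a delete of the old triple plus an insert
# of the new one, so a subscriber never has to guess what to replace.

feed = [
    {"action": "delete", "triple": ("ex:bob", "ex:email", "bob@old.example")},
    {"action": "insert", "triple": ("ex:bob", "ex:email", "bob@new.example")},
]

def apply_feed(store, feed, accept=lambda entry: True):
    """Apply feed entries, in order, to a local store (a set of triples).

    The accept() predicate lets the subscriber skip entries it does not
    care about -- the local, best-effort filtering described above.
    """
    for entry in feed:
        if not accept(entry):
            continue
        if entry["action"] == "insert":
            store.add(entry["triple"])
        elif entry["action"] == "delete":
            store.discard(entry["triple"])
    return store
```

Because the feed is ordered and each entry is self-describing, a subscriber that misses entries degrades gracefully: it ends up with a stale store, not a corrupt one.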

Writing an RDF/XML content model to describe whether the change is an 
update, delete or insert should be straightforward, modulo that it would 
be a statement about a statement. But because the usage and intent is so 
specific, it would not be a problem to license an application to 'lift' 
the target statement to something asserted. For example, I'm pretty sure 
I can alter an RDF event model I have for just that purpose (Danny, 
you've seen that event model before).  Once the change data is described 
(RDF/XML) and packaged (RSS1.0/Atom), you can think about the delivery 
protocol. HTTP and XMPP+PubSub come to mind.
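One way the statement-about-a-statement part could be sketched is with plain RDF reification: the change entry is itself a resource whose properties point at the subject, predicate and object of the target statement. The change-type term below (cs:changeType) is a hypothetical stand-in for whatever vocabulary the content model settles on:

```python
# Sketch: describing a change as a statement about a statement, using
# standard RDF reification. "cs:changeType" is a made-up placeholder
# term, not an existing vocabulary.

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify_change(change_id, action, s, p, o):
    """Return the triples describing one change entry as a reified
    statement; an application that trusts the feed may then 'lift'
    (s, p, o) into an asserted triple."""
    return [
        (change_id, RDF_NS + "type", RDF_NS + "Statement"),
        (change_id, RDF_NS + "subject", s),
        (change_id, RDF_NS + "predicate", p),
        (change_id, RDF_NS + "object", o),
        (change_id, "cs:changeType", action),  # hypothetical term
    ]
```

Serialising those triples as RDF/XML and wrapping them in RSS 1.0 or Atom items would then give the packaged feed entries described above.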

The upside of this, aside from looking like a tractable problem, is that 
subscribers can choose what to update and what not to, and that 
conflict resolution is kept local to the stores (again, I would class 
interoperable resolution protocols as non-workable at the Internet level 
right now, and maybe forever). It might lack the precision that those 
coming from an enterprise database background would expect or insist 
upon, but there is a history of failure in getting enterprise approaches 
to work on the Internet.

Received on Sunday, 3 April 2005 13:23:11 UTC
