- From: Bill de hÓra <bill.dehora@propylon.com>
- Date: Sun, 03 Apr 2005 14:23:06 +0100
- To: semantic-web@w3.org
- CC: Danny Ayers <danny.ayers@gmail.com>, joshuaa@microsoft.com
Joshua Allen:

"So the real problem of merging RDF stores is in being able to uniquely identify chunks of RDF independent of their full content. It seems the options here are very limited, without going into some crazy 'merge definition language' in the RDFS or OWL. Even if you have a simple RDFS/OWL property which tells you a combination of child tuples which uniquely identify the graph, you still have the problem that an update replaces the entire graph; when you probably want it to merge only properties that have been changed (if I update only the e-mail address, and send a graph that has the old postal address, I do not want my update to replace the current postal address). So to accomplish this, you need a delta encoding syntax with change tracking (send a statement; 'update the following triple on the node identified by this key, and ignore everything else under that node'). Basically a DML for RDF. To expect all stores to support change tracking and a standardized DML is pretty crazy. We don't even do that in SQL land."

I've come to this very late - Danny mentioned syncing triple stores to me recently as an aside to something else.

The problem with syncing graphs seems to be that, to do it properly, you need to compute the respective graph complements (the triples each store holds that the other lacks), which could be a very expensive operation. So, I'll make an assumption: that striving for exact syncing of triple stores is one of those Internet-type fallacies (i.e. along the lines of 'the network is reliable', or 'long-lived transactions'), but at the level of data rather than networking. We have a few such fallacies already for RDF. I would then lower my expectations to a best effort at sharing new, interesting data between agents.

The simplest way to do this seems to be for stores to expose a triples feed. That is, a store would publish all new deletes, updates and inserts as a data stream. That way, any other store's agent can subscribe to the feed.
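To make the feed idea concrete, here is a minimal in-memory sketch in Python. Everything in it is an assumption for illustration only: the names `Change`, `FeedingStore`, `subscribe` and `apply`, the plain predicate strings, and the example.org identifiers are hypothetical, and a direct callback stands in for a real feed. It says nothing about the wire format or the delivery protocol.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class Change:
    """One feed entry (hypothetical shape, not a proposed vocabulary)."""
    op: str                            # "insert", "delete" or "update"
    triple: Triple                     # triple inserted or deleted, or the new value
    replaces: Optional[Triple] = None  # for "update": the triple being replaced


class FeedingStore:
    """Toy triple store that publishes every change to its subscribers."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self._subscribers: List[Callable[[Change], None]] = []

    def subscribe(self, callback: Callable[[Change], None]) -> None:
        self._subscribers.append(callback)

    def _publish(self, change: Change) -> None:
        for callback in self._subscribers:
            callback(change)

    def insert(self, triple: Triple) -> None:
        if triple not in self.triples:
            self.triples.add(triple)
            self._publish(Change("insert", triple))

    def delete(self, triple: Triple) -> None:
        if triple in self.triples:
            self.triples.remove(triple)
            self._publish(Change("delete", triple))

    def update(self, old: Triple, new: Triple) -> None:
        self.triples.discard(old)
        self.triples.add(new)
        self._publish(Change("update", new, replaces=old))

    def apply(self, change: Change) -> None:
        """Best-effort application of a remote change; does not republish."""
        if change.op == "delete":
            self.triples.discard(change.triple)
        else:
            if change.replaces is not None:
                self.triples.discard(change.replaces)
            self.triples.add(change.triple)


source, mirror = FeedingStore(), FeedingStore()
source.subscribe(mirror.apply)  # the mirror's agent subscribes to the feed

bill = "http://example.org/people#bill"
source.insert((bill, "email", "old@example.org"))
source.insert((bill, "postalAddress", "1 Example Street"))
source.update((bill, "email", "old@example.org"),
              (bill, "email", "new@example.org"))
```

After the update, the mirror holds the new e-mail triple and still holds the postal address, because only the changed triple travelled over the feed. Neither store ever computes a graph complement.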
Writing an RDF/XML content model to describe whether the change is an update, delete or insert should be straightforward, modulo that it would be a statement about a statement. But because the usage and intent are so specific, it would not be a problem to license an application to 'lift' the target statement to something asserted. For example, I'm pretty sure I can alter an RDF event model I have for just that purpose (Danny, you've seen that event model before).

Once the change data is described (RDF/XML) and packaged (RSS 1.0/Atom), you can think about the delivery protocol. HTTP and XMPP+PubSub come to mind.

The upside of this, aside from looking like a tractable problem, is that subscribers can choose what to update and what to ignore, and that conflict resolution is kept local to the stores (again, I would class interoperable resolution protocols as unworkable at the Internet level right now, and maybe forever). It might lack the precision that those coming from an enterprise database background would expect or insist upon, but there is a history of failure in getting enterprise approaches to work on the Internet.

cheers
Bill
Received on Sunday, 3 April 2005 13:23:11 UTC