- From: Joshua Allen <joshuaa@microsoft.com>
- Date: Tue, 8 Feb 2005 12:18:39 -0800
- To: "Danny Ayers" <danny.ayers@gmail.com>
- Cc: "Giovanni Tummarello" <giovanni@wup.it>, <semantic-web@w3.org>
> btw, one thing I had considered was cheating and using RDBMS-backed
> (same toolkit) stores, and keeping them synchronized underneath. It
> seems MySQL 5+ will support the kind of multi-master replication I'm
> looking for, but right now there's only seems to be
> master-(readonly)slave. I suspect a fairly naive algorithm on top of
> SQL might give acceptable performance, but I doubt whether it would be

Yes, exactly. Note that people did multi-master replication for a long time before it was supported out of the box, and the code that ships with the DBMS is just repackaging the scripts people used to write. Implementing it is easy; the hard part is deciding what you want the rules for merging to be (define primary keys; multivalue vs. single value; and conflict resolution rules).

With an RDBMS you can make assumptions based on the schema -- if a row with an existing PK is added, you replace. If a row under a PK-->FK relationship is added (with a new key on the foreign side), you add new. And that's about as complicated as it gets (except for conflict resolution).

The difference with RDF is that there is no such thing as a primary key for a triple (other than to treat the contents of the whole thing as a key). It's the same problem as doing merge replication on an RDBMS with no primary keys -- you have to use the whole tuple as a key, and then merge replication works, but the results are very likely not what anyone would find desirable. You just get a mess.

So the real problem of merging RDF stores is in being able to uniquely identify chunks of RDF independent of their full content. It seems the options here are very limited, without going into some crazy "merge definition language" in RDFS or OWL.
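
To make the contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names, the `ex:` identifiers and the sample data are hypothetical, rows are modeled as a dict keyed by primary key, and triples as a set of tuples. It is not how any particular DBMS or RDF toolkit implements replication.

```python
def merge_rows(local, incoming):
    """Merge rows keyed by primary key: an incoming row with an
    existing key replaces the local row, a new key is added."""
    merged = dict(local)
    merged.update(incoming)
    return merged


def merge_triples(local, incoming):
    """Merge triples when the whole triple is the only available key:
    this is a plain set union, so nothing is ever replaced."""
    return set(local) | set(incoming)


# Hypothetical sample data: the same person with a changed e-mail address.
local_rows = {"joshua": {"email": "old@example.org"}}
incoming_rows = {"joshua": {"email": "new@example.org"}}
rows = merge_rows(local_rows, incoming_rows)

local_triples = {("ex:joshua", "ex:email", "old@example.org")}
incoming_triples = {("ex:joshua", "ex:email", "new@example.org")}
triples = merge_triples(local_triples, incoming_triples)

print(rows)     # one row, holding the new address
print(triples)  # two triples: old and new address side by side
```

With a primary key the update replaces the old value; with whole-triple keys the store ends up holding both addresses and has no way to tell which one is current.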

Even if you have a simple RDFS/OWL property that tells you which combination of child tuples uniquely identifies the graph, you still have the problem that an update replaces the entire graph, when you probably want it to merge only the properties that have changed (if I update only the e-mail address, and send a graph that has the old postal address, I do not want my update to replace the current postal address). So to accomplish this, you need a delta encoding syntax with change tracking (send a statement like "update the following triple on the node identified by this key, and ignore everything else under that node"). Basically a DML for RDF. To expect all stores to support change tracking and a standardized DML is pretty crazy. We don't even do that in SQL land.
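
The e-mail/postal address case can be sketched as follows. This is only an illustration under stated assumptions: the store is modeled as a dict from a node key to a dict of single-valued properties, and `replace_graph`, `apply_delta` and the sample values are hypothetical names, not part of any RDF toolkit or standard.

```python
def replace_graph(store, key, graph):
    """Whole-graph update: everything under the node is overwritten."""
    result = dict(store)
    result[key] = dict(graph)
    return result


def apply_delta(store, key, changes):
    """Delta update: touch only the listed properties on the node
    identified by key, and ignore everything else under that node."""
    result = dict(store)
    node = dict(result.get(key, {}))
    node.update(changes)
    result[key] = node
    return result


# The receiving store already holds the current postal address.
store = {"ex:joshua": {"email": "old@example.org", "postal": "2 New Road"}}

# The sender's copy is stale on the postal address; only the e-mail changed.
stale_graph = {"email": "new@example.org", "postal": "1 Old Street"}

replaced = replace_graph(store, "ex:joshua", stale_graph)
patched = apply_delta(store, "ex:joshua", {"email": "new@example.org"})

print(replaced["ex:joshua"]["postal"])  # the stale address wins
print(patched["ex:joshua"]["postal"])   # the current address survives
```

The delta version only works if the sender tracked which property it actually changed, which is exactly the change tracking requirement described above.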
Received on Tuesday, 8 February 2005 20:19:12 UTC