Re: implementations of networked RDF store replication, please? from Steve Harris on 2011-03-02 (semantic-web@w3.org from March 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 2 Mar 2011 12:18:00 +0000
To: Ivan Shmakov <oneingray@gmail.com>
Cc: semantic-web@w3.org
Message-Id: <F5F0BEDD-D44B-4149-997A-27AC90D8ECC9@garlik.com>

On 2011-03-01, at 09:42, Ivan Shmakov wrote:

>>>>>> Steve Harris <steve.harris@garlik.com> writes:
> 
>>> I wonder, are there any software packages to facilitate RDF
>>> store replication over the network?
> 
>> I would expect that it can be done with a correctly configured HTTP
>> reverse proxy. In SPARQL 1.1 you can write triples using HTTP POST,
>> so proxying across two SPARQL stores should work. You could even use
>> two different SPARQL systems.
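A minimal sketch of the write path such a proxy would need, in Python, assuming each backend exposes a SPARQL 1.1 Update endpoint that accepts `application/sparql-update` request bodies over HTTP POST. The endpoint URLs, `post_update` and `fan_out` are hypothetical names for illustration, not part of any particular proxy or RDF store.

```python
import urllib.request
from typing import Callable, Dict, List

# Hypothetical backend SPARQL 1.1 Update endpoints; replace with real URLs.
BACKENDS = [
    "http://store-a.example.org/sparql/update",
    "http://store-b.example.org/sparql/update",
]


def post_update(endpoint: str, update: str, timeout: float = 10.0) -> int:
    """POST one SPARQL 1.1 Update request (sent directly as the body) and return the HTTP status."""
    req = urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )
    # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def fan_out(update: str, endpoints: List[str],
            send: Callable[[str, str], int] = post_update) -> Dict[str, bool]:
    """Forward the same update to every backend and record which ones accepted it.

    `send` is injectable so the fan-out logic can be exercised without a network.
    """
    results: Dict[str, bool] = {}
    for ep in endpoints:
        try:
            status = send(ep, update)
            results[ep] = 200 <= status < 300
        except OSError:
            # Connection refused, timeout, DNS failure or HTTP error: this backend missed the write.
            results[ep] = False
    return results
```

Calling `fan_out(update, BACKENDS)` returns a dict mapping each endpoint to whether it accepted the write. If one backend is unreachable, the others still apply the update and the replicas diverge, which is the availability concern Ivan raises in reply.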
> 
> 	However, this seems to rely on all the nodes being available
> 	during updates, which is contrary to one of the reasons to do
> 	replication in the first place: to allow some nodes to become
> 	unavailable from time to time.
> 
>> If it's for the purposes of replication then you can configure 4store
>> to do this, and probably most other clustered RDF stores.
> 
>> Just create a cluster of two nodes, and enable replication. Your
>> write will end up on both nodes. Then you will be using whatever
>> internal protocol the cluster uses though, rather than HTTP. It
>> should be more efficient, but it's not portable between RDF stores.
> 
> 	ACK.  Thanks.
> 
> 	Furthermore, I see that both of the solutions impose the “single
> 	administrative domain” limit: they don't seem to provide a way
> 	for “stranger nodes” to walk in from time to time, subscribing
> 	to and unsubscribing from change notifications as they wish.

4store allows that, at least in theory. Its node discovery is based on mDNS, though I don't know how often, or how, the HTTP server responds to new replicas appearing. Depending on the volume of data, the catch-up time can be considerable, and replicas that are self-consistent but globally inconsistent aren't that useful for many purposes, so we wouldn't recommend running it that way.
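To illustrate why a replica that joins late is self-consistent but globally behind until it has caught up, here is a minimal, hypothetical sketch of log-based replication in Python. It is not how 4store works internally; `ChangeLog` and `Replica` are invented names, triples are plain string tuples, and a "stranger node" joins simply by creating a replica at position 0 and leaves by no longer calling `catch_up`.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Change = Tuple[str, Triple]  # ("add" | "remove", triple)


class ChangeLog:
    """Append-only log of changes kept by the primary (hypothetical)."""

    def __init__(self) -> None:
        self.entries: List[Change] = []

    def append(self, op: str, triple: Triple) -> None:
        if op not in ("add", "remove"):
            raise ValueError(f"unknown operation: {op}")
        self.entries.append((op, triple))


class Replica:
    """Applies log entries strictly in order, so every state it passes
    through is some earlier state of the primary (self-consistent)."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.position = 0  # number of log entries applied so far

    def catch_up(self, log: ChangeLog, max_entries: Optional[int] = None) -> int:
        """Apply up to max_entries pending entries (all if None); return how many were applied."""
        pending = log.entries[self.position:]
        if max_entries is not None:
            pending = pending[:max_entries]
        for op, triple in pending:
            if op == "add":
                self.triples.add(triple)
            else:
                self.triples.discard(triple)
        self.position += len(pending)
        return len(pending)

    def lag(self, log: ChangeLog) -> int:
        """How many changes this replica is behind the primary."""
        return len(log.entries) - self.position
```

For example, a replica created after three changes starts with a lag of 3; applying one entry leaves it holding a state the primary really was in, but not the current one, and a full `catch_up` brings it level. The time a new replica needs to reach zero lag grows with the amount of history it has to replay.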

Modifying the storage code to allow global inconsistency would be trivial, if you have a use case where that's advisable. I guess we'd accept a patch that made it a compile-time option.

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Wednesday, 2 March 2011 12:18:35 UTC