RE: Sync'ing triplestores from Geoff Chappell on 2005-04-05 (semantic-web@w3.org from April 2005)

From: Geoff Chappell <geoff@sover.net>
Date: Tue, 5 Apr 2005 09:18:40 -0400
To: "'Joshua Allen'" <joshuaa@microsoft.com>, "'Bill de hÓra'" <bill.dehora@propylon.com>, <semantic-web@w3.org>
Cc: "'Danny Ayers'" <danny.ayers@gmail.com>
Message-ID: <007201c539e1$fd081f80$6401a8c0@gsclaptop>

> -----Original Message-----
> From: Joshua Allen [mailto:joshuaa@microsoft.com]
> Sent: Monday, April 04, 2005 3:14 PM
> To: Geoff Chappell; Bill de hÓra; semantic-web@w3.org
> Cc: Danny Ayers
> Subject: RE: Sync'ing triplestores
> 
> Yeah, I was thinking just don't support bnodes -- replace them with some
> urn:bnode:random-guid syntax

Works fine within the walls of a closed system - e.g. bnodes are implemented
that way within RDF Gateway, but are converted from guids to true bnodes
(and vice versa) at the edges.
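
To make the "convert at the edges" idea concrete, here is a minimal Python sketch. It is not how RDF Gateway does it; the triple-as-tuple representation, the function names, and the `_:bN` export labels are all assumptions made for illustration. Only the urn:bnode: prefix comes from the suggestion quoted above.

```python
import uuid

# Assumed representation: a triple is a (subject, predicate, object) tuple of
# strings, and a blank node is any term that starts with "_:".
PREFIX = "urn:bnode:"


def import_triples(triples):
    """Edge -> store: give each incoming bnode label a fresh global name."""
    names = {}

    def internal(term):
        if term.startswith("_:"):
            if term not in names:
                names[term] = PREFIX + str(uuid.uuid4())
            return names[term]
        return term

    return [(internal(s), p, internal(o)) for s, p, o in triples]


def export_triples(triples):
    """Store -> edge: hide the global names behind document-local bnode labels."""
    labels = {}

    def external(term):
        if term.startswith(PREFIX):
            if term not in labels:
                labels[term] = "_:b%d" % len(labels)
            return labels[term]
        return term

    return [(external(s), p, external(o)) for s, p, o in triples]
```

A client that writes a bnode and reads it back sees a bnode again, possibly under a different local label, and never the guid.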

I wonder what interop problems you introduce if you force each system to
maintain global names for bnodes outside of their walls. For example a
client dumps some rdf into a replicated triplestore:

	[a rdfs:Class; rdfs:label "MyClass"].

then reads it back and gets:

	<guid:_123456789> a rdfs:Class; rdfs:label "MyClass".

Isn't that a problem? I suppose you could say that the system would be
selective about when it gives the bnode's global vs. local name - e.g. using
the local name for all client access except replication clients. But that's
essentially requiring that all systems that want to participate in
replication must modify their inner workings and handling of bnodes -- talk
about being DOA in terms of deployment.

I suspect it's better to just bite the bullet and deal with identification
of bnodes by description. There is always a description in a particular
graph that is sufficient to identify a bnode in that graph (if there are
multiple subgraphs that match the description then they're redundant info
anyway and can be merged). Maybe an approach like this would work:

For each bnode in a delta statement:

- get the anonymously connected subgraph that contains that bnode (i.e. the
node and all nodes it is connected to plus recursively any of those nodes
that are also bnodes)
- compute a hash of some canonical representation of that subgraph

Then use the hash as the id of the bnode.
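
The steps above could be sketched roughly as follows in Python. Everything here is an assumption for illustration: triples are string tuples, bnodes are terms starting with "_:", the "canonical representation" is a simple recursive description that ignores bnode labels, and SHA-1 is an arbitrary choice of hash. It is not a full graph canonicalization, and it can get slow on heavily connected bnode graphs.

```python
import hashlib


def is_bnode(term):
    # Assumption: blank nodes are written with a "_:" prefix.
    return term.startswith("_:")


def anon_subgraph(graph, node):
    """Triples touching the bnode, plus (recursively) those touching any
    bnode reached that way. Named nodes are included but not expanded."""
    todo, seen, sub = [node], {node}, set()
    while todo:
        current = todo.pop()
        for triple in graph:
            s, _, o = triple
            if current in (s, o):
                sub.add(triple)
                for other in (s, o):
                    if is_bnode(other) and other not in seen:
                        seen.add(other)
                        todo.append(other)
    return sub


def describe(graph, node, seen=frozenset()):
    """Label-independent text description of a node, from its own viewpoint."""
    if not is_bnode(node):
        return node
    if node in seen:
        return "_:cycle"  # already on the current path; stop recursing
    seen = seen | {node}
    parts = []
    for s, p, o in graph:
        if s == node:
            parts.append("out %s %s" % (p, describe(graph, o, seen)))
        if o == node:
            parts.append("in %s %s" % (p, describe(graph, s, seen)))
    return "[" + "; ".join(sorted(parts)) + "]"


def bnode_id(graph, node):
    """Hash of the description of the bnode's anonymously connected subgraph."""
    text = describe(anon_subgraph(graph, node), node)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()
```

Two bnodes with identical descriptions get the same id in this sketch, which fits the assumption that such subgraphs are redundant and can be merged.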

Alternatively, you could pass the whole subgraph as some sort of selector
(e.g. a sparql query). Could get a little expensive for massively connected
bnode graphs - but they suck for a lot of other reasons as well.
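
As a rough illustration of the selector idea, this Python sketch turns a bnode's subgraph into a SPARQL SELECT by replacing each bnode with a variable. The tuple representation, the function name, and the variable names are assumptions; terms are assumed to be already written in SPARQL syntax (prefixed names, <IRIs>, quoted literals), and PREFIX declarations are left out.

```python
def selector_query(subgraph, target):
    """Build a SPARQL query whose ?target binding identifies the bnode."""
    # Assumption: blank nodes are terms that start with "_:".
    variables = {target: "?target"}

    def term(t):
        if t.startswith("_:"):
            if t not in variables:
                variables[t] = "?b%d" % len(variables)
            return variables[t]
        return t

    # Sort the triples so the same subgraph always yields the same pattern order.
    patterns = [
        "  %s %s %s ." % (term(s), p, term(o)) for s, p, o in sorted(subgraph)
    ]
    return "SELECT ?target WHERE {\n" + "\n".join(patterns) + "\n}"
```

The receiving store would run the query and apply the delta to whatever node binds to ?target; more than one binding means the description is redundant or not specific enough.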

Another approach would be to just deal with updates on a CBD (concise
bounded description) basis - i.e. always fully update the full description
of a named object.
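
A minimal Python sketch of that kind of update, under some loud assumptions: triples are string tuples, bnodes start with "_:", the description is simplified to "all statements about the object plus, recursively, statements about any bnode objects" (reification is ignored), and bnode labels in the incoming description are assumed not to collide with labels already in the target store. The function names are made up for illustration.

```python
def cbd(graph, resource):
    """Simplified bounded description of a named object in a graph."""
    result, todo, seen = set(), [resource], {resource}
    while todo:
        subject = todo.pop()
        for s, p, o in graph:
            if s == subject:
                result.add((s, p, o))
                # Follow bnode objects, since they have no name of their own.
                if o.startswith("_:") and o not in seen:
                    seen.add(o)
                    todo.append(o)
    return result


def replace_description(target, source, resource):
    """Drop the target's old description of the object and add the source's."""
    return (target - cbd(target, resource)) | cbd(source, resource)
```

Because the whole description is swapped in one step, the bnodes inside it never need an identity that survives between the two stores.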

Geoff

Received on Tuesday, 5 April 2005 13:19:06 UTC