Re: Is there a systematic method for naming bnodes? from Sampo Syreeni on 2010-10-03 (semantic-web@w3.org from October 2010)

From: Sampo Syreeni <decoy@iki.fi>
Date: Mon, 4 Oct 2010 02:03:45 +0300 (EEST)
To: Pat Hayes <phayes@ihmc.us>
cc: Melvin Carvalho <melvincarvalho@gmail.com>, Semantic Web <semantic-web@w3.org>
Message-ID: <Pine.LNX.4.64.1010040130050.31215@lakka.kapsi.fi>
On 2010-10-03, Pat Hayes wrote:

> I'm not sure what pattern you mean, but certainly bnodes aren't names 
> or identifiers. In fact, they are in effect local variables, bound to 
> the graph in which they occur.

A relational kind of guy like me would say they are surrogate keys. 
Simply opaque things you can hang stuff from, without really knowing 
what they are supposed to correspond to in the real world.

> As the RDF semantics points out, they are exactly like existential 
> variables. Think of each bnode as being a word like 'someone' or 
> 'something', to see how different they are from true names.

Yes. "Stuff." You can infer with "stuff" and about "stuff", but unless 
you do some fancy inferencing around the network the "stuff" is embedded 
in, you won't be able to disambiguate it well enough to handle it as a 
single entity with real world meaning. Then you have to go with its 
formal semantics alone. In particular, you cannot assume that this stuff 
is in fact the same stuff that has a real, canonical name elsewhere.

>> I was wondering if there was a systematic method for naming bnodes
>> (e.g. take a hash of the content)
>
> Well, the very idea of a *blank* node is one that has no name, so this 
> idea seems to be rather against the spirit of the bnode, so to speak.

To make it even more pointed, there could be two different pieces of 
"stuff" which are separate, despite being completely indistinguishable 
by any logical means. This might be a neat discussion piece for the 
list, because in general relational folks forbid this sort of thing even 
when they allow surrogates. At least in theory it should be possible to 
discern any named object from any other based on its properties alone -- 
and a surrogate by definition cannot be one of the user-visible 
properties. It only connects to some more properties, without having a 
public identity of its own.

The trouble, in the most general case, is that this would call for 
full-fledged graph isomorphism testing as a continuously maintained 
invariant of the data model.
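
To make the "hash of the content" idea concrete, here is a minimal 
sketch, not a definitive algorithm. It assumes triples are plain Python 
tuples of strings and that bnode labels start with "_:"; the function 
names and the sample data are made up for illustration. Each bnode is 
named by a hash of its edges, and the hashing is repeated so that 
neighbouring bnodes feed into each other's names. This is a refinement 
heuristic, not a full isomorphism test, and it shows the problem 
directly: bnodes with identical surroundings end up with the same name.

```python
import hashlib


def is_bnode(term):
    # Assumption of this sketch: blank node labels start with "_:".
    return term.startswith("_:")


def _digest(parts):
    # Short, stable hash of a list of strings.
    joined = "\x00".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


def bnode_signatures(triples, rounds=None):
    """Name each bnode by a hash of its content, refined iteratively."""
    bnodes = sorted({t for s, p, o in triples for t in (s, o) if is_bnode(t)})
    sig = {b: "" for b in bnodes}
    if rounds is None:
        rounds = max(len(bnodes), 1)
    for _ in range(rounds):
        new = {}
        for b in bnodes:
            edges = []
            for s, p, o in triples:
                if s == b:
                    # Neighbouring bnodes contribute their current
                    # signature, never their local label.
                    other = "_:" + sig[o] if is_bnode(o) else o
                    edges.append("out|" + p + "|" + other)
                if o == b:
                    other = "_:" + sig[s] if is_bnode(s) else s
                    edges.append("in|" + p + "|" + other)
            new[b] = _digest(sorted(edges))
        sig = new
    return sig


# Hypothetical data: two bnodes with the same name, one with another.
triples = [
    ("_:a", "name", "Sampo"),
    ("_:b", "name", "Sampo"),
    ("_:c", "name", "Pat"),
]
sig = bnode_signatures(triples)
print(sig["_:a"] == sig["_:b"])  # the two "Sampo" bnodes collide
print(sig["_:a"] == sig["_:c"])  # the "Pat" bnode gets its own name
```

The collision between "_:a" and "_:b" is the whole point: a name derived 
from content cannot say whether they are one thing or two.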

> Of course, concrete syntaxes do use bnode identifiers, but these are 
> really just an artifact of the need to represent a graph in a linear 
> character sequence. These bnode identifiers are purely local to the 
> graph.

But the mechanism can still introduce logically distinct entities which 
might or might not be separate in the real world, globally. What should 
we do if there is no means of separating two such entities even via 
global graph isomorphism? Are they the same or different? If different, 
how do you know which one to update, based on real world knowledge, if 
only one of them was updated IRL? If the same, how do you conquer the 
computational problem of having to run full, global graph isomorphism 
testing upon every update to find out which bnodes to conflate?

To give you an example, how about Person-type bnodes, which have exactly 
the same names, relatives, cars, addresses and birthdates as me, yet 
whose cars happen to have different colors from mine? In the simplest 
counter-example of this sort, all of the subjects would be bnodes, not 
one of which has an unambiguously distinguishing attribute by itself 
within its type. In general only the network connectivity of a node 
distinguishes it from any other. In this case, are two bnodes with my 
name and other directly shared non-bnode attributes the same or 
different, if the transitive attributes differ? How about when the 
transitive closure from me agrees with a traversal starting from some 
other node in all non-blank nodes, globally?
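
The Person example can be sketched in a few lines. This is only an 
illustration under assumptions: the triples, the predicate names and 
the describe() helper are all hypothetical, and only outgoing edges are 
followed. The helper writes out what is reachable from a node within a 
given number of hops, with bnode labels erased, so two bnodes compare 
equal exactly when nothing within that distance tells them apart.

```python
def describe(triples, node, depth):
    """Describe what is reachable from `node` within `depth` hops.

    Named nodes describe themselves; bnodes are described only by
    their outgoing edges, so their local labels never show up.
    """
    if not node.startswith("_:"):
        return node
    if depth == 0:
        return "[]"
    parts = sorted(
        p + "=" + describe(triples, o, depth - 1)
        for s, p, o in triples
        if s == node
    )
    return "[" + ", ".join(parts) + "]"


# Two Person bnodes that agree on every direct attribute, but whose
# cars (themselves bnodes) have different colours.
triples = [
    ("_:me", "type", "Person"),
    ("_:me", "name", "Sampo"),
    ("_:me", "owns", "_:car1"),
    ("_:car1", "colour", "red"),
    ("_:other", "type", "Person"),
    ("_:other", "name", "Sampo"),
    ("_:other", "owns", "_:car2"),
    ("_:car2", "colour", "blue"),
]

shallow_same = describe(triples, "_:me", 1) == describe(triples, "_:other", 1)
deep_same = describe(triples, "_:me", 2) == describe(triples, "_:other", 2)
print(shallow_same)  # True: direct attributes agree
print(deep_same)     # False: the transitive attributes differ
```

If both cars were red, the two descriptions would agree at every depth, 
and nothing in the data could say whether there is one person or two.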

In the relational framework the answer is that, ideally, graph 
isomorphism should dictate full equivalence. I dunno whether this has 
ever been spelled out anywhere, but that's the way theoreticians seem to 
speak about the relational model (RM) with (opaque) surrogates. In the 
context of RDF, I've never seen this discussed or even addressed between 
the lines.
-- 
Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Sunday, 3 October 2010 23:04:57 UTC