Re: shared bnodes (Re: RDF dataset semantics again) from Antoine Zimmermann on 2012-08-24 (public-rdf-wg@w3.org from August 2012)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Fri, 24 Aug 2012 12:19:07 +0200
To: public-rdf-wg@w3.org
Message-ID: <5037551B.6090508@emse.fr>
On 23/08/2012 15:59, Sandro Hawke wrote:
> On 08/23/2012 09:08 AM, Antoine Zimmermann wrote:
>>

 > [...]
>> But it's never possible to know that two graphs share the same bnodes,
>> as it is impossible to identify them in an RDF graph.
>>
>
> That's not true. It's easy when the two graphs are both subgraphs of
> some other graph, right?

Oh, I see, true.


> Given this graph (g0) of two triples:
>
> _:x :p1 1,2.
>
> There is a subgraph g1:
>
> (blank node) :p1 1.
>
> and a subgraph g2:
>
> (blank node) :p1 2.
>
> and those two graphs (g1 and g2) share a bnode.
>
> This situation is easily handled in SPARQL and in various RDF APIs. It
> is required for various use cases, such as
> http://www.w3.org/2011/rdf-wg/wiki/Why_Graphs#Separation_of_Inference

What is easily handled? I just see an RDF graph with 2 triples. Of 
course, it has two singleton subsets, no doubt about that. Then what 
situation do you want to handle?
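
For concreteness, the g0/g1/g2 example quoted above can be sketched in 
Python. This is only an illustration: the BNode class and the 
set-of-tuples graph model are assumptions made here, not any particular 
RDF API.

```python
# Hypothetical model: a blank node is an object with identity only,
# and a graph is a frozenset of (subject, predicate, object) triples.
class BNode:
    """A blank node: no name, just object identity."""
    __slots__ = ()


def bnodes(graph):
    """Return the set of blank nodes occurring anywhere in a graph."""
    return {term for triple in graph for term in triple
            if isinstance(term, BNode)}


x = BNode()
g0 = frozenset({(x, ":p1", 1), (x, ":p1", 2)})
g1 = frozenset({(x, ":p1", 1)})  # one singleton subgraph of g0
g2 = frozenset({(x, ":p1", 2)})  # the other singleton subgraph of g0

# Inside one program, the sharing is visible as object identity.
shared = bnodes(g1) & bnodes(g2)
```

Here the sharing is only observable because both subgraphs were built 
from the same in-memory object; nothing in g1 or g2 taken alone records 
it.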


>>> So, those graphs-in-quotes
>>> would have to be parsed as some new kind of thing -- a
>>> document-fragments, instead of a document. A little problem, IMHO, not a
>>> big one.
>>
>> For me, "bnode sharing" is not exactly about sharing bnodes in RDF
>> Graphs. It is a case when you want to identify two bnodes in two
>> distinct RDF graphs, and consider that they can be unified when
>> combining the two RDF graphs. In fact, it does not much matter whether
>> the two bnodes are different, as long as you can indicate that they
>> can be unified in a union/merge operation.
>>
>> If I have { _:b prop _:c } and you have { :x foo _:b } and we decide
>> that the two (_:b)s are unifiable, then it does not matter that they
>> actually identify the same bnode. As long as you're cool with the fact
>> that you want to unify them.
>
> If you don't have a syntax or protocol (such as an RDF API) for
> constructing graphs with shared bnodes, then, yes, you need to indicate
> that some kind of unification is desired/appropriate.

Whether or not two graphs share a bnode makes no difference. 
Let us use some mathematical notation so that we can talk about a 
particular bnode. Let b be a bnode. Let p, o1, o2 be three IRIs. In this 
notation, the following:

{ (b, p, o1) }

is an RDF graph. And the following:

{ (b, p, o2) }

is another RDF graph. They share a bnode, right? The first says there 
exists some entity related to o1 via property p. The second says 
there exists some entity related to o2 via p. Since b does not denote 
anything, there is no reason for the entity whose existence is asserted 
by the first graph to have anything in common with the one asserted by 
the second graph. The knowledge that the two graphs share a bnode has 
absolutely no value. If you know the first graph, and you know the 
second graph, you do not thereby know their union. Stated differently:

{ {(b,p,o1)}, {(b,p,o2)} } does not entail {(b,p,o1), (b,p,o2)}. Cf. the 
RDF Semantics. This is not a bug, it's how existential variables work.
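
Reading each graph as an existentially quantified formula makes the 
non-entailment explicit. This is a sketch: the binary predicate symbol 
p(., .) and the individuals a and c in the counter-model are notation 
introduced here for illustration.

```latex
\begin{align*}
\{(b, p, o_1)\} \quad &\text{reads as} \quad \exists x\, p(x, o_1) \\
\{(b, p, o_2)\} \quad &\text{reads as} \quad \exists y\, p(y, o_2) \\
\{(b, p, o_1), (b, p, o_2)\} \quad &\text{reads as} \quad
  \exists z\, \bigl( p(z, o_1) \wedge p(z, o_2) \bigr)
\end{align*}
```

An interpretation in which p holds exactly for the pairs (a, o1) and 
(c, o2), with a and c distinct, satisfies the first two formulas but not 
the third, so the two graphs together do not entail their union.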

Now, there are cases when you want to unify existential variables. A 
possible solution for this is, indeed, to use set union instead of 
merge, such that equal bnodes are, obviously, unified.
But to be able to use union, you would have to know exactly which bnode 
is used, among all the possible bnodes that exist (that is, you would 
need a global bnode identification mechanism). No RDF syntax can do that 
at the moment (skolem IRIs are an intermediate solution, but not quite 
the same).
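
The difference between union and merge can be sketched in a few lines of 
Python. The BNode class and the set-of-tuples graph model are assumptions 
made for this illustration; the merge below renames every bnode of the 
second graph, which is equivalent, up to isomorphism, to renaming only 
the shared ones.

```python
class BNode:
    """Hypothetical blank node: identity only, no global name."""
    __slots__ = ()


def union(g, h):
    """Plain set union: equal bnodes stay equal, hence are unified."""
    return g | h


def merge(g, h):
    """Merge: standardize h's bnodes apart from g's, then take the union."""
    fresh = {}

    def rename(term):
        if not isinstance(term, BNode):
            return term
        if term not in fresh:
            fresh[term] = BNode()
        return fresh[term]

    return g | {tuple(rename(t) for t in triple) for triple in h}


def subjects(graph):
    return {s for s, _, _ in graph}


b = BNode()
g = frozenset({(b, "p", "o1")})
h = frozenset({(b, "p", "o2")})

u = union(g, h)  # one subject: a single entity related to both o1 and o2
m = merge(g, h)  # two subjects: two existentials, possibly distinct
```

The union is only meaningful here because both graphs live in the same 
process and literally hold the same object b, which is the global 
identification that no serialization provides.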

So, I'm saying that you can just forget about trying to identify equal 
bnodes across graphs, and simply rely on a local identifier in one graph, 
a local identifier in another graph, and a dedicated mechanism stating 
that these two locally identified bnodes are assumed to be the same.

One such mechanism is exactly what you say: we assume that bnodes with 
equal bnode ID are the same. But it's really just a unification process, 
rather than actually sharing the very same bnode.

In any case, it's a dangerous operation if improperly applied, as it 
adds information that is not present in the original graphs.

What would you conclude, seeing these two separate sentences?
{ There is a person. Let's call it X. X is named "Bob".}
{ There is a document. Let's call it X. X has title "The Bible".}
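
The two sentences above can be turned into a small Python sketch of 
label-based unification. Everything here is an assumption made for 
illustration: bnodes are modelled as strings starting with "_:", the 
combine() helper is hypothetical, and the foaf:/dc: terms are just 
plausible vocabulary choices.

```python
def is_bnode_label(term):
    return isinstance(term, str) and term.startswith("_:")


def combine(graphs, unify_equal_labels):
    """Combine graphs, either trusting equal labels or keeping them apart."""
    result = set()
    for index, graph in enumerate(graphs):
        for triple in graph:
            if unify_equal_labels:
                # Assume equal local labels mean the same bnode.
                result.add(triple)
            else:
                # Scope each label to its graph (standardize apart).
                result.add(tuple(f"{t}@g{index}" if is_bnode_label(t) else t
                                 for t in triple))
    return result


def types_of(graph, subject):
    return {o for s, p, o in graph if s == subject and p == "rdf:type"}


g_person = {("_:X", "rdf:type", "foaf:Person"),
            ("_:X", "foaf:name", '"Bob"')}
g_document = {("_:X", "rdf:type", "foaf:Document"),
              ("_:X", "dc:title", '"The Bible"')}

apart = combine([g_person, g_document], unify_equal_labels=False)
unified = combine([g_person, g_document], unify_equal_labels=True)

# After unification, X is claimed to be both a person and a document,
# which neither original graph said.
both = types_of(unified, "_:X")
```

With labels kept apart, the combined graph still talks about two things; 
with labels unified, it asserts something new about a single thing.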


> It seems to me the simple, obvious, and appropriate way to handle this
> for most use cases is to allow blank node labels to be shared between
> different parts of a multi-graph document.
>
> An alternative would be to somehow use our Skolemization URIs, but that
> seems much more awkward.

Yes, but I'm sure that in many cases it's sufficient. The example about 
inferences seems to me a case in favour of skolem IRIs.
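
A minimal sketch of what the skolem IRI route could look like, in 
Python. The skolemize() helper, the BNode class, and the example.org 
authority are hypothetical; the "/.well-known/genid/" path follows the 
convention suggested for skolem IRIs in RDF 1.1 Concepts. The key 
assumption is that one mapping is shared across all the graphs that are 
meant to talk about the same bnode.

```python
import uuid


class BNode:
    """Hypothetical blank node: identity only."""
    __slots__ = ()


def skolemize(graph, mapping, authority="http://example.org"):
    """Replace each bnode by a skolem IRI, reusing `mapping` across graphs."""
    def skolem(term):
        if not isinstance(term, BNode):
            return term
        if term not in mapping:
            mapping[term] = f"{authority}/.well-known/genid/{uuid.uuid4().hex}"
        return mapping[term]

    return frozenset(tuple(skolem(t) for t in triple) for triple in graph)


b = BNode()
g1 = frozenset({(b, ":p1", 1)})
g2 = frozenset({(b, ":p1", 2)})

mapping = {}  # shared, so the same bnode gets the same IRI in both graphs
s1 = skolemize(g1, mapping)
s2 = skolemize(g2, mapping)

# Both graphs can now be serialized separately; a plain union of the
# results refers to a single node because the IRI is a global name.
combined = s1 | s2
```

The price is the one noted above: the result is no longer quite the same 
graph, since an IRI denotes something while a bnode only asserts 
existence.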

>
>> Think of bnodes as black marbles that have the exact same atomic
>> structure. You cannot distinguish them, apart from their position. If
>> you separate out a marble from a graph, and fix another marble there
>> instead, nothing changes. The result is still a black marble at the
>> same position. Anyway, they only indicate the existence of a thing, so
>> one bnode is as good as any other.
>>
>> However, if I play with these marbles, building a very big graph, that
>> I later record as Turtle, then put back the marbles in their pouch.
>> And I'd like the day after to extend that big graph with more triples,
>> I may have to assume that some of the marbles I'm using can be
>> identified to the marbles I used the day before. And still it does not
>> matter that I reuse the exact same marbles.
>>
>
> I think of blank nodes as locations on a surface (thanks to Pat).
> Multiple graphs may be drawn on that surface, using that blank node.
> They can be distinguished from each other by being different colors, or
> written on different transparent layers, or you just remember which arcs
> are part of which graphs.
>
>>
>>> IMHO we should at some point sketch out this solution and its
>>> isomorphism to whatever we settle on. Maybe not actually assign a
>>> vocabulary to it, lest people use it and not be interoperable.
>>> Alternatively, it might be the way RDF/XML folks play in the named-graph
>>> space. (That's a Time Permitting deliverable in our charter.)
>>
>> Ivan drafted something a while ago where he defined a datatype for
>> graphs. However, what I think made the document problematic, is that
>> it was trying to address too many things at the same time. We could
>> take out of Ivan's draft the part on defining the graph datatype,
>> together with portions of the discussions in there, and propose it as
>> a WG Note, maybe.
>>
>
> Perhaps, but let's find our core solution first, then circle back and
> think about how to explain it for various audiences.

Ok.


>
> -- Sandro
>>
>> AZ
>>
>>
>>>
>>> -- Sandro
>>>
>>>>
>>>>
>>>>>
>>>>> -- Sandro
>>>>>
>>>>>>>
>>>>>>> - s
>>>>>>>
>>>>>>>>>
>>>>>>>>> Another (uglier!) representation would be
>>>>>>>>> <g> ex:hasGraph
>>>>>>>>> <data:text/turtle;charset=UTF-8,%3Cs%3E%20%3Cp%3E%20%3Co%3E> .
>>>>>>>>>
>>>>>>>>> Which would also allow you to make statements about the quoted
>>>>>>>>> graph
>>>>>>>>> <data:text/turtle;charset=UTF-8,%3Cs%3E%20%3Cp%3E%20%3Co%3E>
>>>>>>>>> dc:date
>>>>>>>>> "2012-08-22T14:29:23Z"^^xsd:dateTime .
>>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>>>
>>>>>>>>> - Steve
>>>>>>>>>

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Friday, 24 August 2012 10:19:40 UTC