Re: shared bnodes (Re: RDF dataset semantics again) from Steve Harris on 2012-08-28 (public-rdf-wg@w3.org from August 2012)

From: Steve Harris <steve.harris@garlik.com>
Date: Tue, 28 Aug 2012 16:49:05 +0100
To: Sandro Hawke <sandro@w3.org>
Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>, public-rdf-wg@w3.org
Message-Id: <84036E26-F91F-4D27-A831-D50C98A85C37@garlik.com>
On 2012-08-23, at 14:59, Sandro Hawke wrote:

> On 08/23/2012 09:08 AM, Antoine Zimmermann wrote:
>> Le 23/08/2012 13:11, Sandro Hawke a écrit :
>>> On 08/23/2012 05:57 AM, Antoine Zimmermann wrote:
>>>> 
>>>> 
>>>> Le 22/08/2012 18:37, Sandro Hawke a écrit :
>>>>> On 08/22/2012 12:30 PM, Antoine Zimmermann wrote:
>>>>>> 
>>>>>>> 
>>>>>>> I could live with it if there were a syntactic sugar, probably
>>>>>>> involving
>>>>>>> curly braces. :-)
>>>>>> 
>>>>>> Yes, the syntax is not really practical.
>>>>>> 
>>>>> 
>>>>> Indeed. But, yes, it's a nice way to think about the semantics. I
>>>>> understood it to be a way the WG was not okay with.
>>>> 
>>>> 
>>>> My impression was that the group found the idea reasonable, possibly
>>>> appealing, but that, given the total absence of implementations of this
>>>> solution and hence no experience with it, it was not a good idea to
>>>> standardise such a thing.
>>>> 
>>>> 
>>> 
>>> I don't recall hearing that, but it makes sense.
>> 
>> It's my own interpretation of the situation at that time, but I don't think we formalised this as a resolution.
>> 
>> 
>>>>> One bit that doesn't quite work is that some of the use cases require
>>>>> blank nodes to be shared between named graphs. That would be rather
>>>>> strange with this literal-strings model.
>>>> 
>>>> 
>>>> It is in principle possible to define the datatype such that the value
>>>> space is not exactly the set of RDF Graphs, but rather "RDF Graphs
>>>> where some bnodes can be labelled". The bnode labels are made disjoint
>>>> from URIs, so they can be distinguished from normal names, but
>>>> they would not be purely local to the graph.
>>>> 
>>> 
>>> I'm not sure you'd need to change the value space. Existing (2004)
>>> g-snaps can share bnodes, it's just the way the syntaxes are parsed
>>> doesn't currently allow one to indicate that.
>> 
>> But it's never possible to know that two graphs share the same bnodes, as it is impossible to identify them in an RDF Graph.
>> 
> 
> That's not true.   It's easy when the two graphs are both subgraphs of some other graph, right?
> 
> Given this graph (g0) of two triples:
> 
>   _:x :p1 1,2.
> 
> There is a subgraph g1:
> 
>   (blank node) :p1 1.
> 
> and a subgraph g2:
> 
>   (blank node) :p1 2.
> 
> and those two graphs (g1 and g2) share a bnode.
> 
> This situation is easily handled in SPARQL and in various RDF APIs. It is required for various use cases, such as
> http://www.w3.org/2011/rdf-wg/wiki/Why_Graphs#Separation_of_Inference
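To make the point concrete, here is a minimal Python sketch (illustration only: the `BNode` class and `bnodes` helper are hypothetical, not any RDF library's API). A blank node is modelled purely by object identity, so g1 and g2 can hold the very same node without any label being involved.

```python
class BNode:
    """A blank node: it has no name; its identity is the Python object itself."""
    __slots__ = ()

    def __repr__(self):
        return f"_:b{id(self):x}"


def bnodes(graph):
    """Return the blank nodes occurring as subject or object of a graph."""
    return {t for (s, _p, o) in graph for t in (s, o) if isinstance(t, BNode)}


# g0 from the message: _:x :p1 1, 2 .
x = BNode()
g0 = frozenset({(x, ":p1", 1), (x, ":p1", 2)})

# The two one-triple subgraphs of g0.
g1 = frozenset(t for t in g0 if t[2] == 1)
g2 = frozenset(t for t in g0 if t[2] == 2)

print(len(bnodes(g1) & bnodes(g2)))  # 1: g1 and g2 share the blank node
print((g1 | g2) == g0)               # True: their union is g0 again

# By contrast, if each triple were parsed from its own document, each
# "_:x" would become a fresh blank node and nothing would be shared.
h1 = frozenset({(BNode(), ":p1", 1)})
h2 = frozenset({(BNode(), ":p1", 2)})
print(len(bnodes(h1) & bnodes(h2)))  # 0
```

The contrast at the end is the situation Antoine describes: once each graph has been serialized and re-parsed on its own, nothing in the data says the two nodes were ever the same.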
> 
>>> So, those graphs-in-quotes
>>> would have to be parsed as some new kind of thing -- a
>>> document fragment instead of a document. A little problem, IMHO, not a
>>> big one.
>> 
>> For me, "bnode sharing" is not exactly about sharing bnodes in RDF Graphs. It is a case when you want to identify two bnodes in two distinct RDF graphs, and consider that they can be unified when combining the two RDF graphs. In fact, it does not much matter whether the two bnodes are different, as long as you can indicate that they can be unified in a union/merge operation.
>> 
>> If I have { _:b  prop  _:c } and you have { :x  foo  _:b } and we decide that the two (_:b)s are unifiable, then it does not matter whether they actually identify the same bnode, as long as you're cool with the fact that you want to unify them.
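A rough Python sketch of the distinction (hypothetical helpers; blank nodes are modelled as strings starting with "_:" rather than any library's term type): a plain set union treats equal labels as the same node, i.e. it unifies them, whereas the standard RDF merge first renames the second graph's blank nodes apart.

```python
from itertools import count


def labels(graph):
    """Blank node labels (strings starting with '_:') used in a graph."""
    return {t for triple in graph for t in triple if t.startswith("_:")}


def union(g, h):
    """Plain set union: equal labels end up denoting the same blank node."""
    return g | h


def merge(g, h):
    """RDF merge: rename h's blank nodes apart from g's, then take the union."""
    taken = labels(g) | labels(h)
    fresh = (f"_:n{i}" for i in count() if f"_:n{i}" not in taken)
    rename = {b: next(fresh) for b in sorted(labels(h))}
    return g | {tuple(rename.get(t, t) for t in triple) for triple in h}


mine = {("_:b", "prop", "_:c")}
yours = {(":x", "foo", "_:b")}

print(len(labels(union(mine, yours))))  # 2: the two _:b are unified
print(len(labels(merge(mine, yours))))  # 3: the two _:b stay distinct
```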
> 
> If you don't have a syntax or protocol (such as an RDF API) for constructing graphs with shared bnodes, then, yes, you need to indicate that some kind of unification is desired/appropriate.
> 
> It seems to me the simple, obvious, and appropriate way to handle this for most use cases is to allow blank node labels to be shared between different parts of a multi-graph document.

It's very easy in the case where you want to indicate that the bNodes are shared, but there is some cost to it: when you want to produce the multi-graph document, you need to ensure that the labels for distinct bNodes are kept distinct.

Consequently you can't do tricks like:

( for i in *.ttl; do echo "<$i> {" ; cat $i ; echo "}" ; done ) > foo.trig

I've never done anything exactly like that, and I have no feel for how common a use case it is, but it's worth noting that in RDF-2004 it would be "safe", and in RDF 1.1 it might result in shared bNodes, depending on how lucky you were.
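A sketch of one way to keep that shortcut safe when labels are shared across graphs: rename every blank node label with a per-file tag while concatenating. This is Python rather than shell; the `to_trig` function and the `_:f<n>_` naming scheme are hypothetical, the regex is deliberately naive (it ignores most of the Turtle label grammar and would also rewrite "_:" inside string literals), and it assumes the input files contain only triples, with no prefix directives.

```python
import re

# Naive pattern for labels such as _:b0 (an assumption of this sketch: it
# ignores most of the Turtle label grammar and would also match "_:"
# inside string literals).
BNODE_LABEL = re.compile(r"_:([A-Za-z0-9_]+)")


def to_trig(files):
    """Wrap each (graph_name, turtle_text) pair in a TriG graph block.

    Every blank node label in file n is rewritten to _:f<n>_<label>, so
    labels from different files can never collide, while labels repeated
    within one file still denote one node.
    """
    blocks = []
    for n, (name, text) in enumerate(files):
        relabeled = BNODE_LABEL.sub(lambda m: f"_:f{n}_{m.group(1)}", text)
        blocks.append(f"<{name}> {{\n{relabeled}\n}}")
    return "\n".join(blocks)


print(to_trig([("a.ttl", "_:x <p> 1 ."), ("b.ttl", "_:x <p> 2 .")]))
# <a.ttl> {
# _:f0_x <p> 1 .
# }
# <b.ttl> {
# _:f1_x <p> 2 .
# }
```

To mirror the shell loop, the input list could be built with `[(p.name, p.read_text()) for p in sorted(Path('.').glob('*.ttl'))]` using `pathlib.Path`.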

I'm also not sure what existing practice is when reading multi-graph documents with shared bNode labels in different graphs. Certainly 4store creates different bNodes for each <label,graph> combination, as this is what RDF semantics says/said.
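For readers weighing the two readings, a hypothetical Python sketch (not 4store's code) of how a reader might scope labels when loading quads: per graph, which gives the behaviour described above, or per document, which lets graphs share the node. The `assign_bnodes` name and the `_:n<i>` ids are assumptions of the sketch.

```python
from itertools import count


def assign_bnodes(quads, scope_per_graph=True):
    """Map blank node labels in (graph, s, p, o) quads to fresh node ids.

    With scope_per_graph=True a label is scoped to its graph, so the same
    label in two graphs yields two distinct nodes. With False, the label
    is scoped to the whole document, so the graphs share one node.
    """
    fresh = count()
    table = {}
    out = []
    for g, *terms in quads:
        new_terms = []
        for t in terms:
            if t.startswith("_:"):
                key = (g, t) if scope_per_graph else t
                if key not in table:
                    table[key] = f"_:n{next(fresh)}"
                t = table[key]
            new_terms.append(t)
        out.append((g, *new_terms))
    return out


quads = [("<g1>", "_:x", ":p1", "1"), ("<g2>", "_:x", ":p1", "2")]
print(assign_bnodes(quads))
# [('<g1>', '_:n0', ':p1', '1'), ('<g2>', '_:n1', ':p1', '2')]
print(assign_bnodes(quads, scope_per_graph=False))
# [('<g1>', '_:n0', ':p1', '1'), ('<g2>', '_:n0', ':p1', '2')]
```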

> An alternative would be to somehow use our Skolemization URIs, but that seems much more awkward.

Yes. On the upside it would be unambiguous what you intended, but it's a fair bit more awkward.
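For comparison, a sketch of the Skolemization route, again in Python and under stated assumptions: the `skolemize` helper and the `example.org` authority are placeholders, and the `/.well-known/genid/` path follows the convention in RDF 1.1 Concepts. Sharing becomes explicit: a label gets the same IRI in every graph skolemized with the same table.

```python
import uuid

# Placeholder authority (an assumption); Skolem IRIs are minted under the
# /.well-known/genid/ path of a domain the publisher controls.
GENID_BASE = "http://example.org/.well-known/genid/"


def skolemize(triples, table=None):
    """Replace blank node labels ('_:'-prefixed strings) with Skolem IRIs.

    Passing the same `table` when skolemizing several graphs makes a label
    shared between them map to the same IRI, so the sharing survives even
    if each graph is later serialized on its own.
    """
    table = {} if table is None else table

    def sk(t):
        if t.startswith("_:"):
            if t not in table:
                table[t] = f"<{GENID_BASE}{uuid.uuid4().hex}>"
            return table[t]
        return t

    return [tuple(sk(t) for t in triple) for triple in triples]


shared = {}
g1 = skolemize([("_:x", ":p1", "1")], shared)
g2 = skolemize([("_:x", ":p1", "2")], shared)
print(g1[0][0] == g2[0][0])  # True: both graphs name the same node
```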

- Steve

>> Think of bnodes as black marbles that have the exact same atomic structure. You cannot distinguish them, apart from their position. If you separate out a marble from a graph, and fix another marble there instead, nothing changes. The result is still a black marble at the same position. Anyway, they only indicate the existence of a thing, so one bnode is as good as any other.
>> 
>> However, suppose I play with these marbles, building a very big graph that I later record as Turtle, and then put the marbles back in their pouch. If, the day after, I'd like to extend that big graph with more triples, I may have to assume that some of the marbles I'm using can be identified with the marbles I used the day before. And still it does not matter whether I reuse the exact same marbles.
>> 
> 
> I think of blank nodes as locations on a surface (thanks to Pat). Multiple graphs may be drawn on that surface, using that blank node.   They can be distinguished from each other by being different colors, or written on different transparent layers, or you just remember which arcs are part of which graphs.
> 
>> 
>>> IMHO we should at some point sketch out this solution and its
>>> isomorphism to whatever we settle on. Maybe not actually assign a
>>> vocabulary to it, lest people use it and not be interoperable.
>>> Alternatively, it might be the way RDF/XML folks play in the named-graph
>>> space. (That's a Time Permitting deliverable in our charter.)
>> 
>> Ivan drafted something a while ago where he defined a datatype for graphs. However, what I think made the document problematic is that it was trying to address too many things at the same time. We could take the part of Ivan's draft defining the graph datatype, together with portions of the discussion in there, and propose it as a WG Note, maybe.
>> 
> 
> Perhaps, but let's find our core solution first, then circle back and think about how to explain it for various audiences.
> 
>    -- Sandro
>> 
>> AZ
>> 
>> 
>>> 
>>> -- Sandro
>>> 
>>>> 
>>>> 
>>>>> 
>>>>> -- Sandro
>>>>> 
>>>>>>> 
>>>>>>> - s
>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Another (uglier!) representation would be
>>>>>>>>> <g> ex:hasGraph
>>>>>>>>> <data:text/turtle;charset=UTF-8,%3Cs%3E%20%3Cp%3E%20%3Co%3E> .
>>>>>>>>> 
>>>>>>>>> Which would also allow you to make statements about the quoted graph
>>>>>>>>> <data:text/turtle;charset=UTF-8,%3Cs%3E%20%3Cp%3E%20%3Co%3E> dc:date
>>>>>>>>> "2012-08-22T14:29:23Z"^^xsd:dateTime .
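A minimal Python sketch of producing such a URI from Turtle text with the standard library's `urllib.parse.quote` (the `turtle_data_uri` name is hypothetical); with `safe=""`, every reserved character, including `<`, `>` and space, is percent-encoded.

```python
from urllib.parse import quote


def turtle_data_uri(turtle_text):
    """Percent-encode Turtle text (as UTF-8) into a data: URI."""
    return "data:text/turtle;charset=UTF-8," + quote(turtle_text, safe="")


uri = turtle_data_uri("<s> <p> <o>")
print(f"<g> ex:hasGraph <{uri}> .")
# <g> ex:hasGraph <data:text/turtle;charset=UTF-8,%3Cs%3E%20%3Cp%3E%20%3Co%3E> .
```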
>>>>>>>> 
>>>>>>>> +1
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - Steve
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
> 
> 

-- 
Steve Harris, CTO
Garlik, a part of Experian
+44 7854 417 874  http://www.garlik.com/
Registered in England and Wales 653331 VAT # 887 1335 93
Registered office: Landmark House, Experian Way, Nottingham, Notts, NG80 1ZZ
Received on Tuesday, 28 August 2012 15:49:41 UTC