Re: Scope of blank nodes in RDF from Antoine Zimmermann on 2012-09-06 (public-rdf-wg@w3.org from September 2012)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Thu, 06 Sep 2012 17:43:53 +0200
To: public-rdf-wg@w3.org
Message-ID: <5048C4B9.5030003@emse.fr>
I answered this email gradually, as I read it. In the end I realised that
my view aligns with yours, but I think it may be worthwhile to provide
my comments unedited, if you don't mind reading them.


On 06/09/2012 16:02, Richard Cyganiak wrote:
> Summary: In this message, I argue that:
>
> 1. Since RDF-WG is standardizing multigraphs and a notion of
> persistence for RDF data, we need to define the scope of blank nodes
> in the abstract syntax.
> 2. SPARQL Update should already have defined the scope of blank nodes
> for graph stores, and in fact is in conflict with some wording in RDF
> Concepts because it didn't.
> 3. The proposed resolution on sharing blank node labels across graphs
> in TriG closes the door to the simplest and most obvious way of fixing
> the scope of blank nodes.
> 4. I propose a different way of fixing the scope of blank nodes. This
> proposal is (I believe) compatible with SPARQL Update as it stands,
> should resolve the conflict between RDF Concepts and SPARQL Update,
> and allows sharing of bnode labels in TriG.
>
> This got a bit long; sorry for that.
>
>
>
> RDF Concepts, both in the 2004 and 1.1 versions, contains the
> following normative sentence:
>
> [[ Given two blank nodes, it is possible to determine whether or not
> they are the same. ]]

I don't understand this sentence.

I can interpret it in two ways:
  1. implementations MUST make bnodes distinguishable; or
  2. it is a statement of fact: bnodes are distinguishable.

In the first case, the wording seems poorly chosen. Moreover, if
I take the RDF graph serialised at
http://www.w3.org/People/Berners-Lee/card.rdf, where there are several
bnodes, and I take http://danbri.org/foaf.rdf, where there are also
bnodes, how do I know which bnodes of the first graph are identical to,
or different from, which bnodes of the other graph?

It seems that the serialisations do not provide the necessary means to 
distinguish bnodes.

Nothing in those files or in the specification allows me to know to
what extent those graphs' bnodes overlap.

In the second case, if this is a fact, then I'd like to know how. Is
it proven, or provable, that bnodes can be distinguished? Or is it just
an axiom? In any case, the second reading says little about how things
should be implemented.


> This is a constraint on the RDF data model, and hence on any other
> spec that uses RDF.
>
> Before SPARQL Update, it was easy to see that all the RDF-related W3C
> specs meet this constraint. No spec had any notion of persistence.
> RDF documents, RDF graphs and RDF datasets can all be seen as static
> snapshots. Any blank nodes mentioned are distinct from those
> mentioned in any other static snapshot.

Where did you see this?

If you have an RDF document that contains:

[]  :prop  1 .

and I have another document with:

[]  :prop  2 .

it is possible that your document is serialising the first triple of a 
graph like this (represented in Turtle for convenience):

[]  :prop  1, 2 .

and that I am serialising the second triple of it. So we are using the 
same bnode. But again, I don't know how you can know this without 
additional knowledge.

The key point is that you need extra knowledge to decide whether two
graphs share bnodes. Sandro's proposal is to put this extra knowledge
into the spec by saying that a bnode identifier refers to the same node
throughout a TriG file, and SPARQL apparently agrees with this assumption.
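To make the "extra knowledge" point concrete, here is a minimal Python sketch. It assumes a hypothetical toy line format (one triple per line, whitespace-separated terms, blank nodes written `_:label`), not real Turtle or TriG parsing, and applies the convention that a blank node label denotes the same node only within a single document. The names `BNode`, `parse_document` and `blank_nodes` are made up for illustration. Written as one document, the example graph above has one bnode; split across two documents and parsed separately, it yields two, and nothing in either document says they were ever one.

```python
import itertools

# Hypothetical toy format for illustration only (not real Turtle/TriG):
# one triple per line, whitespace-separated terms, bnodes written "_:label".

_serials = itertools.count()


class BNode:
    """A blank node. Equality is object identity; the label is not kept."""

    def __init__(self):
        self.serial = next(_serials)  # used only to make repr() readable

    def __repr__(self):
        return f"BNode#{self.serial}"


def parse_document(text):
    """Parse one document. A label denotes the same BNode only within it."""
    labels = {}  # label table scoped to this single document
    triples = set()
    for line in text.strip().splitlines():
        terms = []
        for term in line.split():
            if term.startswith("_:"):
                if term not in labels:
                    labels[term] = BNode()  # first use: mint a fresh node
                terms.append(labels[term])
            else:
                terms.append(term)
        triples.add(tuple(terms))
    return triples


def blank_nodes(triples):
    return {t for triple in triples for t in triple if isinstance(t, BNode)}


# The graph "[] :prop 1, 2 ." written out as a single document.
whole = parse_document("""
_:b :prop 1
_:b :prop 2
""")

# The same two triples, one per document, parsed independently.
yours = parse_document("_:b :prop 1")
mine = parse_document("_:b :prop 2")
merged = yours | mine

print(len(blank_nodes(whole)))   # 1
print(len(blank_nodes(merged)))  # 2
```

The sketch does not show that the two bnodes are "really" different; it shows that once labels are document-scoped, a consumer has no way to tell, which is exactly the knowledge the spec would have to supply.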


> In SPARQL Update, we now have persistent blank nodes. I believe that
> Graph Stores as defined in SPARQL Update do not meet the normative
> constraint above.
>
> Thought experiment: I have a graph store. It lives on a disk
> somewhere. I make a copy of that disk, ship the copy around the
> world, and start it up. Now we have two graph stores with two
> different sets of endpoints. Do they still contain the same blank
> nodes or not?
>
> The normative sentence above means that the SPARQL Update spec (or
> RDF Concepts, if we put the definition there) needs to somehow give
> an answer to this question.
>
> Does the answer matter? Yes, because we want to do things like
> federating multiple graph stores into one graph store, and I can ask
> SPARQL queries where it matters whether these blank nodes from
> different graph stores are considered the same or not. So to
> implement such a federation engine, we need an answer.

I think this is a very good argument. Yet it is not impossible to
overcome: let us say that any snapshot of a dataset container is assumed
to share no bnodes with any snapshot of a different dataset container.
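A minimal Python sketch of that assumption, applied to the federation case quoted above: each blank node in a snapshot is paired with an identifier of the container it came from, so bnodes from different containers never compare equal, even when their internal identifiers coincide (for instance because one store was started from a disk copy of the other). The `ScopedBNode` type, the `snapshot` and `federate` helpers and the store names are hypothetical, chosen for illustration only; this is one possible realisation, not a prescribed mechanism.

```python
from typing import NamedTuple


class ScopedBNode(NamedTuple):
    """A blank node qualified by the container (graph store) that owns it."""
    container: str  # hypothetical identifier of the store
    local_id: str   # identifier internal to that store


def snapshot(container, raw_triples):
    """Static snapshot of a store: every "_:" term is scoped to `container`."""
    def scope(term):
        if term.startswith("_:"):
            return ScopedBNode(container, term[2:])
        return term
    return {tuple(scope(t) for t in triple) for triple in raw_triples}


def federate(*snapshots):
    """Union of snapshots; bnodes from different containers never coincide."""
    result = set()
    for snap in snapshots:
        result |= snap
    return result


# store-B was started from a copy of store-A's disk, so internal ids match.
data = [("_:n1", ":name", '"Alice"'), ("_:n1", ":age", "42")]
a = snapshot("store-A", data)
b = snapshot("store-B", data)

fed = federate(a, b)
subjects = {s for (s, _, _) in fed}
print(len(subjects))  # 2: same local id, different containers
print(len({s for (s, _, _) in a}))  # 1: within one store, same node
```

Qualifying each bnode with its container is only one way to make the snapshots disjoint by definition; minting fresh nodes on import would work as well. Either way, the disjointness comes from the assumption, not from anything in the data.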


> It appears to me that SPARQL Update does not give an answer.

I think you're right, but this can be fixed.


> My preferred approach to this issue would have been to adopt the
> axiom that blank nodes are scoped to a g-box, and hence different
> g-boxes contain different blank nodes; and then work out the
> consequences from that axiom.

Personally, I can live very well with this view, but I also think that
the potential problems caused by the other view can all be overcome
without a huge investment. And given that the general tendency in
this working group leans toward shared bnodes, I accept this fate.


> SPARQL Update has already thrown a big wrench into the gears here by
> allowing blank nodes to be copied between graphs; but perhaps this
> problem could have still been explained away.
>
> But allowing blank nodes to be shared between graphs in TriG and
> N-Quads would definitely kill that approach. This is why I have
> opposed this sharing of blank nodes in yesterday's call.
>
>
>
> Now, another approach might be to adopt a different axiom:
>
> [[ PROPOSAL: Two different graph stores can never share a blank node.
> Even if both graph stores are based on the same data (e.g., one is a
> copy or subset or view of the other), their blank nodes are, by
> definition, disjoint. ]]

Ok. Then we agree.


> This should answer the question of blank node scope in the following
> way:
>
> 1. Within any concrete RDF document (TriG, Turtle, SPARQL results,
> etc.), blank nodes are scoped to that document, and the document
> syntax defines the rules that say whether two blank nodes are the
> same or not.
>
> 2. Within any persistent graph store, blank nodes are scoped to the
> graph store.
>
> 3. The abstract mathematical structures (RDF graphs, RDF datasets,
> SPARQL result sequences) are always either the result of parsing a
> concrete document, or are a static snapshot of a persistent graph
> store (or part thereof), and their scope is the document or
> persistent store.

All of this is fine with me.


>
>
>
> Thoughts?
>
> Best, Richard
>

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tel: +33(0)4 77 42 66 03
Fax: +33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Thursday, 6 September 2012 15:44:29 UTC