Scope of blank nodes in RDF

Summary: In this message, I argue that:

1. Since RDF-WG is standardizing multigraphs and a notion of persistence for RDF data, we need to define the scope of blank nodes in the abstract syntax.
2. SPARQL Update should already have defined the scope of blank nodes for graph stores, and in fact is in conflict with some wording in RDF Concepts because it didn't.
3. The proposed resolution on sharing blank node labels across graphs in TriG closes the door to the simplest and most obvious way of fixing the scope of blank nodes.
4. I propose a different way of fixing the scope of blank nodes. This proposal is (I believe) compatible with SPARQL Update as it stands, should resolve the conflict between RDF Concepts and SPARQL Update, and allows sharing of bnode labels in TriG.

This got a bit long; sorry for that.

RDF Concepts, both in the 2004 and 1.1 versions, contains the following normative sentence:

Given two blank nodes, it is possible to determine whether or not they are the same.

This is a constraint on the RDF data model, and hence on any other spec that uses RDF.

Before SPARQL Update, it was easy to see that all the RDF-related W3C specs meet this constraint. No spec had any notion of persistence. RDF documents, RDF graphs and RDF datasets can all be seen as static snapshots. Any blank nodes mentioned are distinct from any those mentioned in any other static snapshot.

In SPARQL Update, we now have persistent blank nodes. I believe that Graph Stores as defined in SPARQL Update do not meet the normative constraint above.

Thought experiment: I have a graph store. It lives on a disk somewhere. I make a copy of that disk, ship the copy around the world, and start it up. Now we have two graph stores with two different sets of endpoints. Do they still contain the same blank nodes or not?

The normative sentence above means that the SPARQL Update spec (or RDF Concepts, if we put the definition there) needs to somehow give an answer to this question.

Does the answer matter? Yes, because we want to do things like federating multiple graph stores into one graph store, and I can ask SPARQL queries where it matters whether these blank nodes from different graph stores are considered the same or not. So to implement such a federation engine, we need an answer.

It appears to me that SPARQL Update does not give an answer.

My preferred approach to this issue would have been to adopt the axiom that blank nodes are scoped to a g-box, and hence different g-boxes contain different blank nodes; and then work out the consequences from that axiom.

SPARQL Update has already thrown a big wrench into the gears here by allowing blank nodes to be copied between graphs; but perhaps this problem could have still been explained away.

But allowing blank nodes to be shared between graphs in TriG and N-Quads would definitely kill that approach. This is why I have opposed this sharing of blank nodes in yesterday's call.

Now, another approach might be to adopt a different axiom:

PROPOSAL: Two different graph stores can never share a blank node. Even if both graph stores are based on the same data (e.g., one is a copy or subset or view of the other), their blank nodes are, by definition, disjoint.

This should answer the question of blank node scope in the following way:

1. Within any concrete RDF document (TriG, Turtle, SPARQL results, etc.), blank nodes are scoped to that document, and the document syntax defines the rules that say whether two blank nodes are the same or not.

2. Within any persistent graph store, blank nodes are scoped to the graph store.

3. The abstract mathematical structures (RDF graphs, RDF datasets, SPARQL result sequences) are always either the result of parsing a concrete document, or are a static snapshot of a persistent graph store (or part thereof), and their scope is the document or persistent store.



Received on Thursday, 6 September 2012 14:03:05 UTC