Re: Scope of blank nodes in RDF from Sandro Hawke on 2012-09-06 (public-rdf-wg@w3.org from September 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Thu, 06 Sep 2012 12:30:42 -0400
To: public-rdf-wg@w3.org
Message-ID: <5048CFB2.6090501@w3.org>
On 09/06/2012 10:02 AM, Richard Cyganiak wrote:
> Summary: In this message, I argue that:
>
> 1. Since RDF-WG is standardizing multigraphs and a notion of persistence for RDF data, we need to define the scope of blank nodes in the abstract syntax.

Ohhhh.  "the scope of blank nodes in the abstract syntax."  Interesting.

I think we're crossing issues here, or something.  ISSUE-21 is about 
the scope of blank node *labels*.  It sounds like you're talking about 
the scope of blank nodes themselves, in acting as logic symbols.  If 
you are, that would be an RDF-wide issue, not a GRAPHS issue.

Let's see if I can be very clear about the difference here.

1.  ISSUE-21 (the scope of blank node labels in TriG).

In an RDF serialization, there are bindings from blank node labels to 
blank nodes.   (In RDF/XML, the blank node labels are called nodeIDs).   
These bindings are per-document in Turtle.  The spec says:

    A fresh RDF blank node is allocated for each unique blank node label
    in a document. Repeated use of the same blank node label identifies
    the same RDF blank node.

... so the scope of blank node labels in Turtle is the document.   I 
meant ISSUE-21 to be asking what is the scope of blank node labels in 
TriG.   The options are (0) leave it ambiguous, (1) document scope, (2) 
scope to the graph, (3) scope to the curly brackets.

(Options 2 and 3 differ only in the case where triples in a named graph 
are split into different curly-bracket expressions, which we decided to 
allow.)

I'm in favor of option (1) because it allows expressing arbitrary 
datasets without Skolemizing and de-Skolemizing.
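
To make options (1) to (3) concrete, here is a rough Python sketch of 
how a parser might bind labels to blank nodes under each scoping rule. 
It is only an illustration: the BlankNode class, the allocate function 
and the (graph, block, label) input format are made up for this example 
and are not taken from any spec or parser.

```python
class BlankNode:
    """Opaque node: identity is object identity, never the label."""


def allocate(occurrences, scope):
    """Bind each label occurrence to a blank node.

    occurrences: list of (graph_name, block_index, label) tuples, one per
    use of a label in a hypothetical TriG document.
    scope: "document" (option 1), "graph" (option 2) or "block" (option 3,
    one scope per curly-bracket expression).
    """
    table = {}
    nodes = []
    for graph, block, label in occurrences:
        if scope == "document":
            key = label
        elif scope == "graph":
            key = (graph, label)
        elif scope == "block":
            key = (block, label)
        else:
            raise ValueError(f"unknown scope: {scope}")
        # A fresh node is allocated the first time a key is seen.
        if key not in table:
            table[key] = BlankNode()
        nodes.append(table[key])
    return nodes


# _:x is used in graph g1 (first block), in graph g2, and again in a
# second curly-bracket block for g1.
occurrences = [("g1", 0, "x"), ("g2", 1, "x"), ("g1", 2, "x")]


def distinct(scope):
    """Number of different blank nodes the three uses of _:x produce."""
    return len(set(allocate(occurrences, scope)))
```

Under document scope the three uses of _:x give one node, under graph 
scope two, and under curly-bracket scope three. Only document scope 
lets two named graphs share a node without going through Skolem IRIs.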


2.  "the scope of blank nodes in the abstract syntax"

I'm not sure this concept makes sense.   But I understand the idea that 
in the abstract syntax IRIs act like logical constants.   We've had some 
discussion about whether a given IRI necessarily denotes the same thing 
everywhere or not.  That is, do IRIs have global scope, or some kind of 
smaller scope?     (I think we agreed the 2004 Semantics says they have 
global scope, but that's not necessarily what people do in the wild.)

So, in the same sense, blank nodes could have this kind of scope.  Maybe 
a given blank node could denote one thing in one situation or context 
and a different thing in a different situation or context.  I don't 
like this idea -- I think IRIs should have global scope (although I see 
some appeal to bending that rule), and I think blank nodes should 
definitely have global scope.  Since blank nodes tend to be very local, 
I don't see any pressure to reuse one blank node with a different 
meaning, to let it have another scope.

A few more comments inline below, although I can't say much until we 
sort out the above....

> 2. SPARQL Update should already have defined the scope of blank nodes for graph stores, and in fact is in conflict with some wording in RDF Concepts because it didn't.
> 3. The proposed resolution on sharing blank node labels across graphs in TriG closes the door to the simplest and most obvious way of fixing the scope of blank nodes.
> 4. I propose a different way of fixing the scope of blank nodes. This proposal is (I believe) compatible with SPARQL Update as it stands, should resolve the conflict between RDF Concepts and SPARQL Update, and allows sharing of bnode labels in TriG.
>
> This got a bit long; sorry for that.
>
>
>
> RDF Concepts, both in the 2004 and 1.1 versions, contains the following normative sentence:
>
> [[
> Given two blank nodes, it is possible to determine whether or not they are the same.
> ]]
>
> This is a constraint on the RDF data model, and hence on any other spec that uses RDF.
>
> Before SPARQL Update, it was easy to see that all the RDF-related W3C specs meet this constraint. No spec had any notion of persistence. RDF documents, RDF graphs and RDF datasets can all be seen as static snapshots. Any blank nodes mentioned are distinct from those mentioned in any other static snapshot.

Yes, before SPARQL Update there was no W3C standard way to interact with 
a blank node outside the document used to create it.  But people have 
created ways; lots of APIs do it, and in the telecon, Souri and Zhe 
reported that Oracle decided to provide a syntactic mechanism as well 
(using stable blank node labels).

I'm not sure whether Skolem IRIs will be another way to do this or not; 
it kind of depends how they end up being used.  If systems maintain 
long-term stable mappings between the generated IRIs and internal blank 
nodes, then that will be another way to interact with blank nodes.  
(This seems like a bad practice to me, so far, but I won't be too 
surprised if someone ends up finding it very useful.)
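
For illustration, a stable Skolem mapping of the kind I mean could look 
something like the sketch below. The Skolemizer class, its method names 
and the example.org base IRI are all invented here; the only thing 
borrowed from the drafts is the ".well-known/genid" path convention.

```python
import uuid


class Skolemizer:
    """Keeps a two-way mapping between blank nodes and generated IRIs."""

    def __init__(self, base="http://example.org/.well-known/genid/"):
        # The base IRI is a placeholder, not a real service.
        self.base = base
        self._iri_for = {}   # blank node -> Skolem IRI
        self._node_for = {}  # Skolem IRI -> blank node

    def skolemize(self, node):
        """Return the IRI for node, minting one on first use."""
        if node not in self._iri_for:
            iri = self.base + uuid.uuid4().hex
            self._iri_for[node] = iri
            self._node_for[iri] = node
        return self._iri_for[node]

    def deskolemize(self, iri):
        """Return the blank node behind iri, or None if it is unknown."""
        return self._node_for.get(iri)


# Any hashable object can stand in for an internal blank node here.
skolemizer = Skolemizer()
node = object()
iri = skolemizer.skolemize(node)
```

As long as the system keeps that table around, anyone holding the IRI 
can get back to the same internal blank node, which is exactly what 
makes it a way to "interact with" the node from outside.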

>
>
> In SPARQL Update, we now have persistent blank nodes. I believe that Graph Stores as defined in SPARQL Update do not meet the normative constraint above.
>
> Thought experiment: I have a graph store. It lives on a disk somewhere. I make a copy of that disk, ship the copy around the world, and start it up. Now we have two graph stores with two different sets of endpoints. Do they still contain the same blank nodes or not?

Tricky question.    Similarly, what if you ship the original disk? Or 
what if you just turn off the system and turn it back on?

I think we need to focus on observable system behaviors.

In these cases, I don't think there's any way to ask a system if they 
are the same blank node, so it doesn't matter.  (If it's maintaining a 
stable Skolem mapping, then it would matter -- but then it's barely a 
blank node any more....)

> The normative sentence above means that the SPARQL Update spec (or RDF Concepts, if we put the definition there) needs to somehow give an answer to this question.
>
> Does the answer matter? Yes, because we want to do things like federating multiple graph stores into one graph store, and I can ask SPARQL queries where it matters whether these blank nodes from different graph stores are considered the same or not. So to implement such a federation engine, we need an answer.

I don't think the existing SPARQL syntaxes/protocols provide any way to 
get at this distinction, and I think that's probably good.

To put it differently, SPARQL doesn't provide any way to move a blank 
node from one endpoint to a different one.    They are opaque and 
trapped within processes.
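
A small sketch of what that means for a client talking to two 
endpoints: all it ever sees are labels, and those are scoped to each 
result document, so the most it can do when merging is keep them apart. 
The merge_results function, the endpoint URLs and the "d0_" prefixes 
below are invented for the example, and treating the nodes as distinct 
is an assumption of the sketch, not something SPARQL tells you.

```python
def merge_results(results_by_endpoint):
    """Merge blank node labels from several result documents.

    results_by_endpoint maps an endpoint name to the list of blank node
    labels seen in that endpoint's result document. Each label is
    prefixed with an index for its source document, so equal labels
    from different documents never collide after merging.
    """
    merged = []
    for index, (endpoint, labels) in enumerate(sorted(results_by_endpoint.items())):
        for label in labels:
            merged.append((endpoint, f"d{index}_{label}"))
    return merged


# Both endpoints happen to use the label "b0" in their results.
results = {
    "http://example.org/sparql-a": ["b0", "b1"],
    "http://example.org/sparql-b": ["b0"],
}
merged = merge_results(results)
new_labels = [label for _, label in merged]
```

The two "b0" labels come out as different labels after merging; there 
is nothing in the results that would let the client decide otherwise.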

> It appears to me that SPARQL Update does not give an answer.
>
>
>
> My preferred approach to this issue would have been to adopt the axiom that blank nodes are scoped to a g-box, and hence different g-boxes contain different blank nodes; and then work out the consequences from that axiom.

How could blank nodes be "scoped" to g-boxes?   You mean if the same 
blank node occurs in two g-boxes (like the same variable name occurring 
in two scopes in a program) it denotes something different?   That seems 
like a very bad idea.  Or do you just mean blank nodes are forbidden 
from occurring in multiple g-boxes?   But that would break lots of 
deployed systems (e.g., 4-store, with its union-default graph).
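
To show why, here is a toy model of a store whose default graph is the 
union of its named graphs. It does not model any real system; the 
BlankNode class, the union_default function and the example.org IRIs 
are made up for the illustration.

```python
class BlankNode:
    """Opaque node: identity is object identity."""


b = BlankNode()

# Two named graphs; only g1 mentions the blank node b.
named_graphs = {
    "http://example.org/g1": {(b, "http://example.org/p", "v1")},
    "http://example.org/g2": {("http://example.org/s", "http://example.org/q", "v2")},
}


def union_default(graphs):
    """Build a default graph as the union of all named graphs."""
    default = set()
    for triples in graphs.values():
        default |= triples
    return default


default_graph = union_default(named_graphs)

# Names of the named graphs whose triples mention b as subject.
graphs_with_b = [
    name
    for name, triples in named_graphs.items()
    if any(s is b for s, _, _ in triples)
]
b_in_default = any(s is b for s, _, _ in default_graph)
```

Here b occurs in g1 and also in the default graph, so even a store 
with a single named graph using b already has one blank node in two 
graphs.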

> SPARQL Update has already thrown a big wrench into the gears here by allowing blank nodes to be copied between graphs; but perhaps this problem could have still been explained away.
>
> But allowing blank nodes to be shared between graphs in TriG and N-Quads would definitely kill that approach. This is why I opposed this sharing of blank nodes in yesterday's call.
>
>
>
> Now, another approach might be to adopt a different axiom:
>
> [[
> PROPOSAL: Two different graph stores can never share a blank node. Even if both graph stores are based on the same data (e.g., one is a copy or subset or view of the other), their blank nodes are, by definition, disjoint.
> ]]

I like that idea, but I don't think there is even a crisp notion of 
"different graph stores", so that might not work.

> This should answer the question of blank node scope in the following way:
>
> 1. Within any concrete RDF document (TriG, Turtle, SPARQL results, etc.), blank nodes are scoped to that document, and the document syntax defines the rules that say whether two blank nodes are the same or not.

Sounds good, assuming you mean "blank node *labels* are scoped to that 
document".   If you want to conflate blank nodes and blank node labels, 
I want to see some proposed text changes for the Turtle document.

> 2. Within any persistent graph store, blank nodes are scoped to the graph store.

Again, I don't have any idea what you mean by "scoped" here.

> 3. The abstract mathematical structures (RDF graphs, RDF datasets, SPARQL result sequences) are always either the result of parsing a concrete document, or are a static snapshot of a persistent graph store (or part thereof), and their scope is the document or persistent store.

That sounds okay.

     - s

>
>
> Thoughts?
>
> Best,
> Richard
>
Received on Thursday, 6 September 2012 16:30:51 UTC