Re: [Editorial] "blank nodes do not denote specific resources" from Richard Cyganiak on 2012-08-09 (public-rdf-comments@w3.org from August 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 9 Aug 2012 12:36:51 +0100
To: David Booth <david@dbooth.org>
Cc: public-rdf-comments <public-rdf-comments@w3.org>
Message-Id: <0CD2F820-20BD-4CC1-B34E-3A198167DC34@cyganiak.de>

Hi David,

You made several editorial comments regarding the handling of blank nodes in RDF 1.1 Concepts. Thanks for taking the time. The comments are helpful in refining the text.

I will reply below to multiple messages, in no particular order. I apologise for repeating some points that others have already made in the thread.


On 19 Jul 2012, at 19:57, David Booth wrote:
> On Thu, 2012-07-19 at 11:32 -0500, Pat Hayes wrote:
>> [ . . . ] In RDF, a "name" is either a URI reference (soon to be an
>> IRI) or a literal. 
> 
> But that definition of "name" does not appear in the current draft of
> the Concepts document:

Talking about “names” tends to cause confusion and ambiguity, and RDF Concepts avoids the term when possible. RDF (Concepts + Semantics) uses a particular technical sense of “naming” and calls it “denotation”. IRIs and literals “denote” resources. This is sketched in the RDF 1.1 Concepts introduction:

[[
Any IRI and literal denotes some thing in the universe of discourse. These things are called resources. Anything can be a resource, including physical things, documents, abstract concepts, numbers and strings; the term is synonymous with “entity”. The resource denoted by an IRI is called its referent, and the resource denoted by a literal is called its value.
]]

So, yeah, RDF Concepts already sketches this notion of “naming”, and RDF Semantics provides the matching definitions and formal details.


On 18 Jul 2012, at 19:02, David Booth wrote:
> http://www.w3.org/TR/rdf11-concepts/#resources-and-statements 
> says: "blank nodes do not denote specific resources".  I don't think
> that is quite correct, since a blank node *does* denote a specific
> resource.  

No, a blank node doesn't denote.

It is as the text explains: IRIs and literals denote resources; blank nodes don't denote resources, but merely indicate that some resource exists and (when the blank node is used in triples) that the resource stands in certain relationships to other resources.

> It just doesn't give that resource a name that is meaningful
> outside the graph.  I suggest rewording this as "blank nodes do not have
> stable names that can be referenced outside of the graph".

Well, this sentence is true, but misses the point.

The blank node ID isn't like a “local identifier for a resource”. It's not like an “IRI with local scope”. That kind of thinking gets people into trouble. It's purely an identifier for a local variable, used locally to hold multiple triples together.

For example, consider these two graphs:

  :Alice :knows _:Bob, _:Charlie.

  :Alice :knows _:Zach.

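Both mean the same thing: “Alice knows someone.” Nothing in the first graph says that _:Bob and _:Charlie are different resources, so it asserts no more than the second.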
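
A minimal Python sketch of why the two graphs say the same thing, assuming triples are modelled as tuples of plain strings and that any string starting with "_:" stands for a blank node. The name simply_entails and the brute-force search are illustrative choices, not anything defined in RDF Concepts. The check follows the idea from RDF Semantics that one graph follows from another when its blank nodes can be mapped onto terms of the other graph so that every triple is found there.

```python
from itertools import product


def is_bnode(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def simply_entails(g, h):
    """Return True if every triple of h is in g after mapping h's blank nodes to terms of g."""
    bnodes = sorted({t for triple in h for t in triple if is_bnode(t)})
    terms = sorted({t for triple in g for t in triple})
    # Try every assignment of g's terms to h's blank nodes (fine for tiny graphs only).
    for choice in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, choice))
        instance = {tuple(mapping.get(t, t) for t in triple) for triple in h}
        if instance <= g:
            return True
    return False


g1 = {(":Alice", ":knows", "_:Bob"), (":Alice", ":knows", "_:Charlie")}
g2 = {(":Alice", ":knows", "_:Zach")}
g3 = {(":Alice", ":knows", ":Bob")}  # an IRI this time, not a blank node

print(simply_entails(g1, g2))  # True
print(simply_entails(g2, g1))  # True
print(simply_entails(g1, g3))  # False
```

The first two graphs entail each other, so neither says more than the other. The third graph uses an IRI, which does denote, and so it is not implied by "Alice knows someone".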

Regarding the sentence you quoted above, what do you think about this proposed replacement for the paragraph it occurs in? Note that this is in the *informative* *introduction*, so it's merely supposed to give a reader a reasonable intuition; it's not supposed to be a precise definition.

[[
Unlike IRIs and literals, a blank node does not denote a specific resource, but can be thought of as an anonymous variable. By using the blank node in triples, one can constrain the variable, and thereby say that some thing with certain relationships to other resources exists, without explicitly naming it.
]]

Is this an improvement?


On 18 July 2012, at 14:03, David Booth wrote:
> I'm not seeing anywhere a concise statement that actually says what a
> bnode *is*.  I think it would be helpful to have one somewhere, probably
> here:
> 
> http://www.w3.org/TR/rdf11-concepts/#section-blank-nodes
 
Well, technically speaking, a blank node is a node in an RDF graph that is not an IRI and not a literal. That is all. This is a complete definition. RDF Concepts defines the RDF data model; it doesn't define syntax or semantics. Within the data model, there's nothing more to say really. Defining syntax and semantics is the job of other specs.
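
As a rough illustration of how little the data model says here, a sketch in Python. The class names and fields are assumptions made for this example, not definitions from any spec, and the example.org IRIs are placeholders. The point is only that a blank node carries no label at all; one blank node differs from another by identity alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: IRI


class BlankNode:
    """A node that is neither an IRI nor a literal.

    It has no fields: equality and hashing fall back to object identity.
    """

    __slots__ = ()


# A graph is modelled here as a set of (subject, predicate, object) tuples.
alice = IRI("http://example.org/Alice")
knows = IRI("http://example.org/knows")
someone = BlankNode()
graph = {(alice, knows, someone)}

print(IRI("http://example.org/Alice") == alice)  # True: IRIs compare by value
print(BlankNode() == someone)                    # False: a fresh blank node is a different node
```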

How about adding one more sentence and link to the beginning of subsection 3.4:

[[
Blank nodes are anonymous nodes in an RDF graph (see Section 1.2, Resources and Statements).
]]

The key being the link to Section 1.2, which contains the paragraph discussed above.

(The objection that blank nodes can have blank node IDs, and hence are not anonymous, is already dealt with in the section, and see below for a possible improvement.)


On 19 Jul 2012, at 19:57, David Booth wrote:
> There is some nice explanation in the RDF Semantics document of the fact
> that a "node identifier" like _:xxx is not a "name" and does not exist
> in the RDF graph:
> http://www.w3.org/TR/rdf-mt/#graphsyntax
> [[
> [The N-Triples syntax] uses a node identifier (nodeID) convention to
> indicate blank nodes in the triples of a graph.  While node identifiers
> such as '_:xxx' serve to identify blank nodes in the surface syntax,
> these expressions are not considered to be the label of the graph node
> they identify; they are not names, and do not occur in the actual graph.
> In particular, the RDF graphs described by two N-Triples documents which
> differ only by re-naming their node identifiers will be understood to be
> equivalent.
> ]]
> 
> Maybe a good way to help clarify this use of the word "name" and help
> reduce that common misunderstanding of RDF is to pull some of that
> explanatory material to the Introduction of the Concepts document,
> perhaps rewording to refer to serialization in general, rather than just
> N-Triples.

Well, but that's already there in RDF 1.1 Concepts, Section 3.4:

[[
Note: Blank node identifiers are local identifiers that are used in some concrete RDF syntaxes or RDF store implementations. They are always locally scoped to the file or RDF store, and are not persistent or portable identifiers for blank nodes. Blank node identifiers are not part of the RDF abstract syntax, but are entirely dependent on the concrete syntax or implementation. The syntactic restrictions on blank node identifiers, if any, therefore also depend on the concrete RDF syntax or implementation.
]]

The focus of that paragraph is a bit different; it stresses the fact that blank node identifiers cannot be expected to survive file/store boundaries.

Here's a proposed replacement for this paragraph that merges some of the RDF Semantics material above:

[[
NOTE: Some concrete RDF syntaxes and RDF store implementations use blank node identifiers (such as _:xxx in N-Triples, Turtle and SPARQL) to indicate blank nodes in the triples of a graph. While blank node identifiers serve to identify blank nodes in these surface syntaxes, these identifiers are not considered to be the label of the graph node they identify; they are not names, and do not occur in the actual graph. For example, the RDF graphs described by two N-Triples documents which differ only by re-naming their blank node identifiers will be understood to be equivalent.

The characters allowed in blank node identifiers, and the scope of these identifiers, are defined by the concrete syntaxes, and the rules may differ between different concrete syntaxes and storage implementations. Therefore, blank node identifiers are not portable or persistent identifiers for the nodes in a graph.
]]

Does this address your point above?
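
To make the renaming point concrete, here is a small Python sketch. It assumes a simplified, whitespace-separated triple format rather than real N-Triples, and the helper names parse and equivalent are made up for this example, as are the example.org IRIs. Two documents count as equivalent when a one-to-one renaming of blank node identifiers turns one set of triples into the other.

```python
from itertools import permutations


def parse(doc):
    """Parse a simplified triple-per-line format (no spaces inside terms)."""
    triples = set()
    for line in doc.strip().splitlines():
        s, p, o = line.strip().rstrip(".").split()
        triples.add((s, p, o))
    return triples


def bnode_ids(graph):
    return sorted({t for triple in graph for t in triple if t.startswith("_:")})


def equivalent(g, h):
    """True if a one-to-one renaming of blank node identifiers maps g onto h."""
    g_ids, h_ids = bnode_ids(g), bnode_ids(h)
    if len(g_ids) != len(h_ids) or len(g) != len(h):
        return False
    # Brute force over all renamings; acceptable for small graphs only.
    for perm in permutations(h_ids):
        rename = dict(zip(g_ids, perm))
        if {tuple(rename.get(t, t) for t in triple) for triple in g} == h:
            return True
    return False


doc1 = """
_:a <http://example.org/knows> _:b .
_:b <http://example.org/name> "Bob" .
"""
doc2 = """
_:x <http://example.org/knows> _:y .
_:y <http://example.org/name> "Bob" .
"""
doc3 = """
_:x <http://example.org/knows> _:y .
_:x <http://example.org/name> "Bob" .
"""

print(equivalent(parse(doc1), parse(doc2)))  # True: only the identifiers differ
print(equivalent(parse(doc1), parse(doc3)))  # False: the structure differs
```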

All the best,
Richard
Received on Thursday, 9 August 2012 11:37:21 UTC