Re: [Concepts] Editorial changes to Blank Nodes (ISSUE-107)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 12 Nov 2012 20:28:01 -0800
Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <20C75D6E-3382-4F94-8903-7BA15648331F@ihmc.us>
To: Richard Cyganiak <richard@cyganiak.de>

On Nov 12, 2012, at 8:00 AM, Richard Cyganiak wrote:

> On 12 Nov 2012, at 09:09, Antoine Zimmermann wrote:
>>> The *blank nodes* in an RDF graph are drawn from some arbitrary infinite
>>> set that fulfils the following conditions:
>>> • It is disjoint from the set of IRIs and the set of all literals.
>>> • Equality within the set is well-defined (*blank node equality*).
>> What does the second item mean? Isn't equality well defined, in any set?

Yes, it is. If identity isn't clear, then the very idea of a set is not clear.

> The problem is that infinite sets cannot actually be implemented, and therefore implementations need to approximate the definition. The sentence draws attention to the requirement that in such approximate implementations, it must still be possible to test blank nodes for equality.

This makes no sense at all to me. (Are you saying that identity of items in infinite sets must be in some sense approximate? But I can take something from a finite set and include it in another, infinite, set and it's still the same thing, so its identity does not change.)

I don't see that implementation has anything to do with things at this point, as we are describing the abstract mathematical model.

>> It is the same as saying "Given two blank nodes, it is possible to determine whether or not they are the same."
> Yes. It's a restatement of that phrase.

Well, I guess it is, in that that phrase also doesn't make sense, though for different reasons. 

>> The latter says that in an implementation, either the set of bnodes is explicitly known, or the implementation knows an isomorphism from a well-known set to the set of bnodes. E.g., assign a bnode id to all bnodes, then one decides if two occurrences of bnodes involve the same bnode by simply comparing the identifiers.
> Sure, this is yet another restatement of the same phrase. Are you proposing a particular edit?
>>> Allocating a *fresh blank node* is the action of drawing a new node from the set.
>>> ]]
>>> http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#section-blank-nodes
>> It is not quite clear in what way it is "new". It has to be new wrt a given RDF graph (that is, a bnode that is not already used in a given RDF graph).
> That's not correct. It has to be globally new. Remember, blank nodes can be shared between graphs.

But this does not make sense as stated. To make things more concrete, consider the set N of natural numbers, and consider the statement: "Allocating a fresh number is the action of drawing a new natural number from the set N."  What could this possibly mean? How does one "draw" a number "from" a set of numbers? (Does that just mean "Choose a number"?) What would "new" mean in this context? What kind of "global" would be meaningful?  Blank nodes are just like numbers in this way. 

It's a mistake to think of this in 'process' terms; that gets things confused. The whole idea of blank nodes was to provide a simple *mathematical* model that would underlie any particular processing or implementation strategy.

>>> [[
>>> Since RDF systems generally refer to blank nodes only via such local identifiers, it is necessary to “standardize apart” the blank node identifiers when incorporating data that originates from an external source. This may be done by systematically replacing the blank node identifiers in incoming data with freshly allocated blank node identifiers.
>>> ]]
>> In fact, if the bnode IDs had global scope, this would still be necessary. The "standardisation apart" is part of the merge operation and is independent of the way bnodes are identified. The "standardisation apart" has to be made at the abstract syntax level, that is, the bnodes themselves, not the IDs, have to be changed.
> This isn't about the merge operation; this is about the case of, say, loading a graph into a new slot in a graph store. If blank node identifiers in the incoming data are systematically replaced, then, I believe, this operation is safe; otherwise it is not.

The *operation* is safe, yes. But what we are debating is how to express the underlying mathematical model so as to make this work. And the assumption we need is that the blank nodes themselves are not shared between the newly loaded graph and the graphs already present in the store. 
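
To make that assumption concrete, here is a minimal sketch in Python of loading incoming data so that it shares no blank nodes with anything already in the store. Everything in it is an assumption made for illustration: the `BlankNode` class, the `standardize_apart` and `blank_nodes` helpers, the convention that local blank node identifiers are strings starting with `_:`, and the example.org IRIs. It is not taken from the draft or from any RDF library.

```python
class BlankNode:
    """Hypothetical stand-in for a blank node: it has identity and nothing else."""
    __slots__ = ()


def is_bnode_label(term):
    # Assumption: local blank node identifiers look like "_:a" in incoming data.
    return isinstance(term, str) and term.startswith("_:")


def standardize_apart(triples):
    """Replace every local blank node identifier with a distinct BlankNode object.

    The mapping is local to this one call, so the same identifier maps to the
    same node within one load, but never to a node used by any other load.
    """
    mapping = {}

    def rename(term):
        if not is_bnode_label(term):
            return term  # IRIs and literals pass through unchanged
        if term not in mapping:
            mapping[term] = BlankNode()
        return mapping[term]

    return [(rename(s), p, rename(o)) for (s, p, o) in triples]


def blank_nodes(graph):
    """Collect the blank nodes that occur in a list of triples."""
    return {t for (s, _, o) in graph for t in (s, o) if isinstance(t, BlankNode)}


incoming = [
    ("_:a", "http://example.org/knows", "_:b"),
    ("_:a", "http://example.org/name", '"Alice"'),
]

g1 = standardize_apart(incoming)  # first slot in the store
g2 = standardize_apart(incoming)  # same data loaded again into a second slot
```

Within `g1`, both occurrences of `_:a` become the same node, while `g1` and `g2` have no blank nodes in common even though they were loaded from identical data. The sketch describes a processing step only; the point under debate is how the abstract model should state the disjointness that such a step guarantees.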

> Graph merge is a separate issue, and we don't talk about it in this section.
>> [As a side note, I think things would have been simpler, IMHO, if all bnodes had a globally unique identifier. It would also have made the discussions on the scope of bnodes easier, since we would have avoided discussing the scope of *identifiers*, and confusing the two types of scopes.]
> If they have globally unique identifiers, then surely they should be IRIs, no? And then how is that any different from not having blank nodes at all?

I tend to agree with you here, and against Antoine. But let's not even try to go there :-)

> It certainly is a mess.

It is actually quite simple, but it is not easy to explain, which I guess does make it into a mess. Mea culpa. I thought that having this idea of a 'global' set of blank nodes would be a way to avoid the other well-known mess of dealing with scoping rules for bound variables, which would have been the obvious way to handle (what are now called) blank nodes in RDF. Scoping and the bound/free distinction were notoriously hard to describe in relational logic, but programmers are so used to this way of thinking that it might have worked better than the simpler bnode idea. Wisdom after the event, I guess.


> Best,
> Richard

