A modest proposal concerning blank nodes. from Pat Hayes on 2011-03-02 (public-rdf-wg@w3.org from March 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 2 Mar 2011 16:47:17 -0600
To: RDF-WG WG <public-rdf-wg@w3.org>
Message-Id: <BC17EBDF-251E-4DEF-BB6C-1DA402EF9372@ihmc.us>
Ahem. 

Thinking about this (below) and reading recent threads, I think I agree. Blank nodes are more trouble than they are worth. Let's get rid of them. Simply eliminating blank nodes from the RDF conceptual model would have many benefits, not the least being an enormous simplification of both the conceptual model and the semantics. (And coming from me, this is quite a concession, I hope y'all duly note.) This would satisfy the linked data folk, I am sure, and make SPARQL (and RDB2RDF) theorists a lot happier. RIF has already given up on RDF blank nodes and re-defined its own version of them, so it will hardly mind. I don't think OWL will even notice if they are there or not. We logicians would weep a silent tear for the loss of a quantifier, but console ourselves with the observation that Skolemization is named after a logician, after all. 

But RDF really does need some way to easily enable someone to talk about "something" without having to invent a whole URI to 'identify' the thing. Many things - lists created just to be the arguments of an n-ary relation, for example - really do not deserve to be 'identified'. The tag URI scheme [1] goes a long way towards this, but it still seems to me to be overkill. Most of the complexity seems (?) to arise from the need to ensure that these URIs are globally unique, so there cannot be any accidental use clashes. Now, this is basically the same problem as the issue that William Waites noted [2], of keeping bnode IDs from getting confused with one another; but right now this is the responsibility of the system developer, whereas using a URI scheme like this tag scheme makes it ultimately the responsibility of the user coining the URIs. So I wonder if there is some way to 'bury' this so that it's the system developer's task to keep this straight. 

So here's an idea. See if this flies. We say that the conceptual model of RDF has no blank nodes, period. (A whole lot of the specs suddenly get simpler and easier to follow, and large parts of the SWeb world exhale a communal sigh of relief.) We also officially sanction a 'blank' URI scheme for use where we want an 'anonymous' name, maybe the tag scheme. In other words, we require blank nodes to be 'skolemized' in the conceptual model, and we provide a recommended way to generate 'skolem constants'. (Recommended rather than mandatory to allow other ways to use URIs systematically.) But we also recommend that any RDF text notation - any serialization of RDF intended for human use - shall provide some way to have 'local' identifiers which look just like blank node identifiers, but are replaced by these anonymous URIs in some systematic way before being transmitted or used. So blank nodes become a kind of surface syntactic sugar rather than part of the actual RDF graph. (And then, by the way, it is up to the writers of that surface notation to determine the scope of their blank node identifiers.) This keeps all the advantages of blank nodes for human use (chiefly, that their IDs can be short and can be re-used as often as one likes, and don't need to be globally unique) while keeping the underlying RDF free from all the blank-node issues that keep giving people headaches. 
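
To make the "replaced in some systematic way" step concrete, here is a minimal sketch in Python of what a parser might do on input. It is only an illustration under stated assumptions, not a proposed algorithm: the authority "example.org", the date "2011", the "bnode/" path, the function name skolemize and the per-document scope id are all invented for the example. Triples are assumed to be already tokenized into strings, with local identifiers written as _:label.

```python
import uuid

# Hypothetical tag URI prefix in the RFC 4151 form tag:authority,date:specific.
# The authority, date and "bnode/" path are assumptions made for this sketch.
TAG_PREFIX = "tag:example.org,2011:bnode/"


def skolemize(triples, scope_id=None):
    """Replace local _:label terms with tag URIs for one document.

    Within a single call the same label always maps to the same URI.
    Each call gets its own scope (a random one unless scope_id is given),
    so identical labels in different documents do not clash.
    """
    scope = scope_id if scope_id is not None else uuid.uuid4().hex

    def convert(term):
        # Only terms written with the local-identifier prefix are rewritten;
        # URIs (<...>) and literals ("...") pass through unchanged.
        if term.startswith("_:"):
            return "<" + TAG_PREFIX + scope + "/" + term[2:] + ">"
        return term

    return [tuple(convert(term) for term in triple) for triple in triples]


document = [
    ("_:x", "<http://example.org/p>", "_:y"),
    ("_:x", "<http://example.org/q>", '"some value"'),
]

for triple in skolemize(document, scope_id="doc1"):
    print(" ".join(triple) + " .")
```

The short labels stay in the surface text, where the writer of the notation decides their scope, and the global uniqueness problem is handled once, by whoever writes the parser, through the choice of scope id.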

We can also require that all RDF processors be able to input existing RDF notations which have syntactic forms for blank node identifiers, either by storing the RDF in this form or by skolemizing it on input. This sets up a backward-compatible situation which is strongly biassed to eliminate blank nodes as rapidly as possible from actual deployed RDF. We can even call these tag-labelled nodes "blank nodes" if we like, with only a tiny change to the current RDF concepts specifications. 

OK, I will send this now and wait for the hurricane to start. 

Pat

[1]   http://www.ietf.org/rfc/rfc4151.txt
[2]   http://lists.w3.org/Archives/Public/semantic-web/2011Mar/0053.html

On Mar 2, 2011, at 1:35 PM, Richard Cyganiak wrote [on semantic-web@w3.org]:

> Reto,
> 
> On 2 Mar 2011, at 18:50, Reto Bachmann-Gmür wrote:
>>> Is there any practical difference between bnodes and normal nodes, 
>>> except the scope (and necessity) of their name? 
>> 
>> Yes, a graph with bnodes can potentially be simplified: the same meaning may be expressed with a leaner graph, i.e. with fewer nodes and triples. If all your nodes are URIs you cannot do simplifications with RDF entailment. 
> 
> Reality check please!
> 
> When was the last time you saw such a non-lean RDF graph in the wild, outside of examples and test cases? Can you name a production system that routinely performs the simplification you talk about, with user benefit?
> 
> The question was about practice. You describe a thought experiment. I think it's a good example of a complication in RDF that was added for sound theoretical reasons, but has failed to deliver any value whatsoever in practice.
> 
> Best,
> Richard
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 2 March 2011 22:47:54 UTC