Re: A modest proposal concerning blank nodes. from Pat Hayes on 2011-03-03 (public-rdf-wg@w3.org from March 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 2 Mar 2011 18:17:51 -0600
To: Richard Cyganiak <richard@cyganiak.de>
Cc: RDF-WG WG <public-rdf-wg@w3.org>
Message-Id: <E13B452E-1322-48FD-BAF6-672394B62DFB@ihmc.us>
On Mar 2, 2011, at 5:54 PM, Richard Cyganiak wrote:

> Hi Pat,
> 
> Wow, this is unexpected!
> 
> On 2 Mar 2011, at 22:47, Pat Hayes wrote:
>> So here's an idea. See if this flies.
> [snip]
>> This keeps all the advantages of blank nodes for human use (chiefly, that their IDs can be short and can be re-used as often as one likes, and don't need to be globally unique) while keeping the underlying RDF free from all the blank-node issues that keep giving people headaches. 
> 
> In my eyes, the problem isn't blank nodes per se. In RDF Concepts, blank nodes are ~5 paragraphs. They are easy enough to understand. A graph can have anonymous nodes that don't have a global identifier. No big deal.
> 
> The problem is that RDF Semantics then defines these nodes as things that can be multiplied and merged in ways that are just way over most people's heads. And all that complexity just doesn't seem to do anything useful in practice!
> 
> Ignoring the question of charter scope for a minute: I'd like to know how RDF Semantics would have to be changed to define blank nodes as “normal” graph nodes that “just” don't happen to have a global/persistent identifier. I believe this is also what Marcelo and colleagues proposed in [1] for the RDF Next Steps workshop.

It would be easy - trivial - to treat bnodes semantically as though they were names, like URIs. But then they really would be global names, just like URIs. That is, the interpretation mapping would apply to every occurrence of a given bnode in exactly the same way wherever it occurs. And as it is blank, you have no way to know where it occurs, right? So this name-that-isn't-a-name is like a semantic loose cannon. Which I don't *think* is what anyone would actually want.

It really is something that needs to be fixed in Concepts, as the basic problem is syntactic, not semantic. The whole issue is scope: exactly how to specify the scope of a bnode (or a bnode ID, if you prefer to think that way; it comes to the same thing). To tell you the truth, the reason bnodes are in RDF is because we were trying to avoid getting into this whole question of scope, with its issues of bound and free names and so on. It is a syntactic nightmare, even if a familiar one. We thought that we had a clever way to side-step it, but we didn't quite manage to avoid it, which is why we got stuck with all that garbage about standardizing apart and merge versus union and so on. Sigh.

But I think we can do this now, if we are careful. Using Sandro's terminology: the scope of a bnode is at most a single g-box or a g-text. ("At most" because we might want to have boxes with several 'graphs' in them, I guess.) (I think this is what Nathan just suggested, also.) That is, a given bnode cannot occur in two different g-boxes or g-texts. We will have to impose this as a requirement in the conceptual model of RDF (it's not there at present). But if we do that, then we have got rid of the loose-cannon danger, since we can't get a single bnode which is accidentally also in some other graph somewhere without anyone knowing about it. (And we then also got rid of the graph merge =/= graph union weirdness, BTW.) And then, yes, we can treat blank nodes semantically as though they were already skolemized in the semantics. I wonder if people will think this is less of a change than deprecating them in favor of a tag-style URI, though.
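
To make the scoping point concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions, not anything taken from the specs: triples are plain tuples of strings, a term starting with "_:" stands for a bnode ID, and `scope_bnodes` is a hypothetical helper that confines each bnode ID to one graph by prefixing it with a graph-specific label. With shared IDs, a plain set union silently conflates two unrelated nodes; once every bnode belongs to exactly one graph, union and merge come to the same thing.

```python
# A triple is a (subject, predicate, object) tuple of strings.
# Assumption for this sketch: terms starting with "_:" are bnode IDs.

def is_bnode(term):
    return term.startswith("_:")

def scope_bnodes(triples, scope_id):
    """Hypothetical helper: rename bnode IDs so they are unique to one graph."""
    def rename(term):
        return f"_:{scope_id}.{term[2:]}" if is_bnode(term) else term
    return {(rename(s), p, rename(o)) for (s, p, o) in triples}

# Two independent graphs that happen to reuse the same short bnode ID.
doc_a = {("_:x", "ex:name", '"Alice"')}
doc_b = {("_:x", "ex:name", '"Bob"')}

# Unscoped union: the two unrelated "_:x" nodes collapse into a single node.
naive = doc_a | doc_b
naive_subjects = {s for (s, _, _) in naive}

# Scoped union: each bnode occurs in exactly one graph, so set union is safe.
scoped = scope_bnodes(doc_a, "a") | scope_bnodes(doc_b, "b")
scoped_subjects = {s for (s, _, _) in scoped}

print(len(naive_subjects), len(scoped_subjects))
```

The prefixing scheme here is arbitrary; any renaming that guarantees a bnode ID never appears in two different graphs would serve the same purpose.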


> 
> Or alternatively, since everyone (including the SPARQL and OWL WGs) seems to like RDF Concepts, but ignores or hates parts of RDF Semantics: Can we refactor the specs so that the good parts would be separated from the unnecessarily complicated stuff?

Well, those good parts do need just a little extra tweaking. But I really don't think anyone will object to the minor tweaks. I'm going to guess that most readers think they already say this, in fact.

Pat

> 
> Best,
> Richard
> 
> 
> [1] http://www.w3.org/2009/12/rdf-ws/papers/ws23
> 
> 
> 
>> 
>> We can also require that all RDF processors be able to input existing RDF notations which have syntactic forms for blank node identifiers, either by storing the RDF in this form or by skolemizing it on input. This sets up a backward-compatible situation which is strongly biased to eliminate blank nodes as rapidly as possible from actual deployed RDF. We can even call these tag-labelled nodes "blank nodes" if we like, with only a tiny change to the current RDF Concepts specification.
>> 
>> OK, I will send this now and wait for the hurricane to start. 
>> 
>> Pat
>> 
>> [1]   http://www.ietf.org/rfc/rfc4151.txt
>> [2]   http://lists.w3.org/Archives/Public/semantic-web/2011Mar/0053.html
>> 
>> On Mar 2, 2011, at 1:35 PM, Richard Cyganiak wrote [on semantic-web@w3.org]:
>> 
>>> Reto,
>>> 
>>> On 2 Mar 2011, at 18:50, Reto Bachmann-Gmür wrote:
>>>>> Is there any practical difference between bnodes and normal nodes, 
>>>>> except the scope (and necessity) of their name? 
>>>> 
>>>> Yes, a graph with bnodes can potentially be simplified: the same meaning may be expressed with a leaner graph, i.e. with fewer nodes and triples. If all your nodes are URIs you cannot do simplifications with RDF entailment.
>>> 
>>> Reality check please!
>>> 
>>> When was the last time you saw such a non-lean RDF graph in the wild, outside of examples and test cases? Can you name a production system that routinely performs the simplification you talk about, with user benefit?
>>> 
>>> The question was about practice. You describe a thought experiment. I think it's a good example of a complication in RDF that was added for sound theoretical reasons, but has failed to deliver any value whatsoever in practice.
>>> 
>>> Best,
>>> Richard
>>> 
>>> 
>> 
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973   
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>> 
> 
> 

Received on Thursday, 3 March 2011 00:18:29 UTC