Re: A modest proposal concerning blank nodes. from Pat Hayes on 2011-03-03 (public-rdf-wg@w3.org from March 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 2 Mar 2011 20:29:36 -0600
To: David Wood <david.wood@talis.com>
Cc: RDF-WG WG <public-rdf-wg@w3.org>
Message-Id: <DF4F8D94-551E-4F3D-8517-8CBD13412668@ihmc.us>
On Mar 2, 2011, at 7:54 PM, David Wood wrote:

> Holy crap!!
> 
> ...and I argued for Pat's inclusion, too.  Hmm.

I take it that there was some argument, then. Hmmmm :-)

> 
> I'm going to sleep on this, but do note the charter says:
> 
> [[
> 3.  Out of Scope
> Some features are explicitly out of scope for the Working Group
> 
> - Changing the fundamentals RDF(S) semantics (e.g., usage of model theoretical semantics, interpretation of blank nodes).

Um... strictly speaking, my proposal was to get rid of blank nodes from the graph syntax, which doesn't exactly change their interpretation; and to allow them in a 'surface' syntax, where they would be exactly as they are now. Which might just squeeze past the charter wording :-)

But really we ought to see what the OWLers and RIFers say about this idea. They might shoot it down on hard technical grounds. 

I just wanted to put it on the table to see what would happen. Fact is, from the narrow perspective of the model-theoretic semantics, it would be very easy to do.

Pat


> Note that minor improvements may be required by some of the work in the scope of the Working Group, which is still in the scope of the work.
> ]]
> 
> Full BC may fall into charter compliance (if we want to argue for that), so I am not of a mind to reject this out of hand, especially since:
> 
> [[
> The Working Group will publish a series of documents on the basis of the 2004 version of the RDF recommendation. I.e., the following documents may be updated:
> ...
> RDF Semantics
> ]]
> 
> The finding of the RDF Next Steps Workshop in relation to blank nodes is at:
>  http://www.w3.org/2001/sw/wiki/index.php?title=RDF_Core_Work_Items&oldid=1980#Blank_Nodes
> 
> At that workshop, the participants voted strongly that revising blank node semantics is something that the RDF WG should/may do, but it didn't make the formal cut into the charter following community review.  It is worth noting that *nobody* voted at the workshop that the WG must not do it.  See:
>  http://www.w3.org/2010/06/rdf-work-items/table
> 
> Regards,
> Dave
> 
> 
> 
> 
> On Mar 2, 2011, at 17:47, Pat Hayes wrote:
> 
>> Ahem.
>> 
>> Thinking about this (below) and reading recent threads, I think I agree. Blank nodes are more trouble than they are worth. Let's get rid of them. Simply eliminating blank nodes from the RDF conceptual model would have many benefits, not the least being an enormous simplification of both the conceptual model and the semantics. (And coming from me, this is quite a concession, I hope y'all duly note.) This would satisfy the linked data folk, I am sure, and make SPARQL (and RDB2RDF) theorists a lot happier. RIF has already given up on RDF blank nodes and re-defined its own version of them, so it will hardly mind. I don't think OWL will even notice if they are there or not. We logicians would weep a silent tear for the loss of a quantifier, but console ourselves with the observation that Skolemization is named after a logician, after all.
>> 
>> But RDF really does need some way to easily enable someone to talk about "something" without having to invent a whole URI to 'identify' the thing. Many things - lists created just to be the arguments of an n-ary relation, for example - really do not deserve to be 'identified'. The tag URI scheme [1] goes a long way towards this, but it still seems to me to be overkill. Most of the complexity seems (?) to arise from the need to ensure that these URIs are globally unique, so there cannot be any accidental use clashes. Now, this is basically the same problem as the issue that William Waites noted [2], of keeping bnode IDs from getting confused with one another; but right now this is the responsibility of the system developer, whereas using a URI scheme like this tag scheme makes it ultimately the responsibility of the user coining the URIs. So I wonder if there is some way to 'bury' this so that it's the system developer's task to keep this straight.
>> 
>> So here's an idea. See if this flies. We say that the conceptual model of RDF has no blank nodes, period. (A whole lot of the specs suddenly get simpler and easier to follow, and large parts of the SWeb world exhale a communal sigh of relief.) We also officially sanction a 'blank' URI scheme for use where we want an 'anonymous' name, maybe the tag scheme.  In other words, we require blank nodes to be 'skolemized' in the conceptual model, and we provide a recommended way to generate 'skolem constants'. (Recommended rather than mandatory to allow other ways to use URIs systematically.) But we also recommend that any RDF text notation - any serialization of RDF intended for human use - shall provide some way to have 'local' identifiers which look just like blank node identifiers, but are replaced by these anonymous URIs in some systematic way before being transmitted or used. So blank nodes become a kind of surface syntactic sugar rather than part of the actual RDF graph. (And then, by the way, it is up to the writers of that surface notation to determine the scope of their blank node identifiers.) This keeps all the advantages of blank nodes for human use (chiefly, that their IDs can be short and can be re-used as often as one likes, and don't need to be globally unique) while keeping the underlying RDF free from all the blank-node issues that keep giving people headaches.
>> 
>> We can also require that all RDF processors be able to input existing RDF notations which have syntactic forms for blank node identifiers, either by storing the RDF in this form or by skolemizing it on input. This sets up a backward-compatible situation which is strongly biassed to eliminate blank nodes as rapidly as possible from actual deployed RDF. We can even call these tag-labelled nodes "blank nodes" if we like, with only a tiny change to the current RDF concepts specifications.
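
For illustration only, here is a minimal sketch of what "skolemizing on input" might look like for an N-Triples-like surface notation. Nothing in it is part of any spec: the authority `example.org`, the date, the `bnode/` path layout, and the function name `skolemize` are all assumptions made up for this example. The only thing taken from [1] is the general tag URI shape, tag:<authority>,<date>:<specific>. The regular expression is deliberately naive and would also rewrite a `_:` that happens to occur inside a literal or an IRI, so a real reader would need a proper tokenizer.

```python
import re
import uuid

# Hypothetical tagging authority and date (assumptions for this sketch).
AUTHORITY = "example.org"
DATE = "2011-03-02"

# Naive match for blank node labels such as _:b1 (does not skip literals).
BNODE = re.compile(r"_:([A-Za-z0-9]+)")


def skolemize(text, scope_id=None):
    """Replace every _:label in `text` with a minted tag URI.

    Within one call (one scope) the same label always maps to the same
    URI. Each call without an explicit scope_id gets a fresh random one,
    so short labels can be re-used freely across documents.
    """
    scope_id = scope_id or uuid.uuid4().hex

    def mint(match):
        label = match.group(1)
        return f"<tag:{AUTHORITY},{DATE}:bnode/{scope_id}/{label}>"

    return BNODE.sub(mint, text)


doc = (
    '_:l1 <http://example.org/first> "a" .\n'
    "_:l1 <http://example.org/rest> _:l2 ."
)
print(skolemize(doc))
```

The point of the per-call scope is that keeping identifiers from clashing becomes the job of the code that reads the surface notation, not of the person who wrote `_:l1`.
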
>> 
>> OK, I will send this now and wait for the hurricane to start.
>> 
>> Pat
>> 
>> [1]   http://www.ietf.org/rfc/rfc4151.txt
>> [2]   http://lists.w3.org/Archives/Public/semantic-web/2011Mar/0053.html
>> 
>> On Mar 2, 2011, at 1:35 PM, Richard Cyganiak wrote [on semantic-web@w3.org]:
>> 
>>> Reto,
>>> 
>>> On 2 Mar 2011, at 18:50, Reto Bachmann-Gmür wrote:
>>>>> Is there any practical difference between bnodes and normal nodes,
>>>>> except the scope (and necessity) of their name?
>>>> 
>>>> Yes, a graph with bnodes can potentially be simplified: the same meaning may be expressed with a leaner graph, i.e. with fewer nodes and triples. If all your nodes are URIs you cannot do simplifications with RDF entailment.
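
To make "lean" concrete, here is a small brute-force sketch. It assumes the RDF Semantics definition (a graph is lean if no instance of it is a proper subgraph of it) and represents triples as plain tuples of strings, with blank nodes written `_:x`. The function name `is_lean` and the `ex:` names are hypothetical, and the search is exponential in the number of blank nodes, so this is only meant for toy graphs.

```python
from itertools import product


def is_bnode(term):
    return term.startswith("_:")


def is_lean(graph):
    """Return True if no mapping of blank nodes to terms of the graph
    sends the graph onto a proper subgraph of itself."""
    graph = set(graph)
    bnodes = sorted({t for tr in graph for t in tr if is_bnode(t)})
    # Candidate images: every term in subject or object position.
    terms = sorted({t for tr in graph for t in (tr[0], tr[2])})
    for image in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, image))
        mapped = {tuple(mapping.get(t, t) for t in tr) for tr in graph}
        if mapped < graph:  # proper subgraph: the graph is redundant
            return False
    return True


# "a has some p-value" adds nothing once we know "a p b".
non_lean = {("ex:a", "ex:p", "_:x"), ("ex:a", "ex:p", "ex:b")}
uri_only = {("ex:a", "ex:p", "ex:c"), ("ex:a", "ex:p", "ex:b")}
print(is_lean(non_lean), is_lean(uri_only))
```

With no blank nodes the only mapping is the identity, so a URI-only graph always comes out lean, which is the sense in which it "cannot be simplified".
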
>>> 
>>> Reality check please!
>>> 
>>> When was the last time you saw such a non-lean RDF graph in the wild, outside of examples and test cases? Can you name a production system that routinely performs the simplification you talk about, with user benefit?
>>> 
>>> The question was about practice. You describe a thought experiment. I think it's a good example of a complication in RDF that was added for sound theoretical reasons, but has failed to deliver any value whatsoever in practice.
>>> 
>>> Best,
>>> Richard
>>> 
>>> 
>> 
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>> 
> 
> 

Received on Thursday, 3 March 2011 02:30:16 UTC