Re: Skolemization and RDF Semantics from Pat Hayes on 2011-04-17 (public-rdf-wg@w3.org from April 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Sun, 17 Apr 2011 09:15:25 -0500
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Steve Harris <steve.harris@garlik.com>, Dan Brickley <danbri@danbri.org>, David Wood <dpw@talis.com>, "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Message-Id: <DF21A70A-A03A-4A3C-81A6-DAB652685D6B@ihmc.us>
On Apr 17, 2011, at 6:27 AM, Richard Cyganiak wrote:

> On 16 Apr 2011, at 23:27, Pat Hayes wrote:
>> So, I think that all this careful wording SHOULD be understood to apply only under a special circumstance, where some data is modified by inserting URIs in place of bnodes, *and the new version is claimed to be essentially the same content as the old, bnode, version*. That is, when the new skolemised RDF is not re-published by a new publisher who takes responsibility for it, but is seen rather as a re-rendering or a normalization of the old data, inheriting the original publisher's authority and provenance. 
>> 
>> A way to put the point is to ask, who 'owns' the Skolem URIs that are used in the new (version of the) data? The original publisher can legitimately disclaim all responsibility for them if they have been introduced downstream and outside her control. Perhaps all we need to say is that anyone who replaces a bnode with a URI themselves is the owner of that URI and is responsible for accounting for its meaning; but one way to discharge this responsibility is to use a legitimate skolem URI which can be recognized as such.
> 
> +1
> 
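For concreteness, the replacement of bnodes by URIs discussed above might be sketched as follows. This is only an illustration under stated assumptions: triples are modelled as plain Python tuples of strings, a bnode is any term starting with "_:", and the base URI http://example.org/.well-known/genid/ and the helper names are hypothetical placeholders, not something the thread has agreed on.

```python
import uuid


def is_bnode(term):
    """Assumption: a blank node is written as a string starting with '_:'."""
    return isinstance(term, str) and term.startswith("_:")


def skolemize(graph, base="http://example.org/.well-known/genid/"):
    """Return (new_graph, mapping): the graph with each bnode replaced by a
    fresh URI under `base`, and the bnode-to-URI mapping that was used."""
    mapping = {}

    def swap(term):
        if not is_bnode(term):
            return term
        if term not in mapping:
            # One fresh URI per distinct bnode, reused for every occurrence.
            mapping[term] = "<" + base + uuid.uuid4().hex + ">"
        return mapping[term]

    new_graph = {(swap(s), p, swap(o)) for (s, p, o) in graph}
    return new_graph, mapping


graph = {("_:a", ":bbb", ":ccc"), ("_:a", ":knows", "_:b")}
skolemized, mapping = skolemize(graph)
```

Whoever runs such a function mints the URIs under `base`, so in the terms used above that party is their owner; a recognizable base is what would let a consumer see that they are Skolem URIs.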
>> *Why* is it bad to use bnodes?
> 
> First of all, it is *sometimes* but not *always* bad to use blank nodes. The documents I linked to gave specific advice, informed by implementation experience, for when to use, and when to avoid, blank nodes.

True, but it does say that the fewer bnodes the better, as a general rule about all data.

> 
> That being said:
> 
> Given a triple _:a :bbb :ccc, it is not possible to author another triple _:a :xxx :yyy in another graph, the intention being that _:a is the same thing in both graphs. Given that the blank node label is arbitrary and cannot be assumed to be persistent, it is not possible to refer to the graph node from outside of the system where the graph originated.

I think you mean that it is not possible to refer to the entity denoted by the blank node from outside, etc. To do that you have to give it a name, indeed. You can do this, if it is absolutely necessary, by adding
_:a owl:sameAs <URI> .
to the first graph and then using the URI outside. So it is possible when it needs to be done. But...

> Such outside reference to certain nodes is a requirement in a distributed system.

...why? Surely it all depends on the node in question. Some things need to be publicly referable to, and these obviously should be given a URI. Others don't. The inner lists in an RDF collection used to encode some OWL syntax should never need to be referred to elsewhere, for example.
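
A minimal sketch of the owl:sameAs route mentioned above, assuming triples are modelled as plain Python tuples of strings. The URI http://example.org/thing and the function name are hypothetical, chosen only for illustration.

```python
OWL_SAME_AS = "owl:sameAs"


def name_bnode(graph, bnode, uri):
    """Return the graph plus one triple asserting that `bnode` and `uri`
    denote the same thing. The original triples are left unchanged."""
    return graph | {(bnode, OWL_SAME_AS, uri)}


# The graph that owns the blank node gives the denoted entity a public name.
first_graph = {("_:a", ":bbb", ":ccc")}
first_graph = name_bnode(first_graph, "_:a", "<http://example.org/thing>")

# Another graph can now talk about that entity through the URI alone,
# without ever mentioning the locally scoped label _:a.
second_graph = {("<http://example.org/thing>", ":xxx", ":yyy")}
```

The bnode stays in the first graph; only the things that actually need outside reference get a URI.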
 
> 
>> *Why* is data using them worse than data which does not?
> 
> Because it is difficult to augment data that uses blank nodes with further data. Because it requires stepping outside of the RDF data model in order to remotely modify or otherwise work with an RDF graph that uses blank nodes.

For the first point, see above. I don't follow the second point. **Of course** it is possible to modify RDF containing blank nodes, just as one can with ground RDF. An RDF graph is just a large data object, you can do whatever you want to it. Can you be more precise about what exactly the problems are here?

>> Worse in what sense, exactly?
> 
> Worse in the sense that it imposes large, and often prohibitive, additional costs on users of the data, which usually is not in the best interest of the publishers of the data.

You have not yet convinced me why or how this is so. 

>> Which processes are made more difficult when blank nodes are present?
> 
> Referring to nodes in the graph from other data; storing persistent references to a graph node for later recall;

You can't refer to nodes in RDF at all. I think what you mean is, URIs allow one to refer to the same entity in different graphs, whereas bnodeIDs are local to the graph and so have no meaning outside the graph. True; but again, I don't see why this is a practical problem. What plausible processes would ever need to access a locally scoped ID? Can you give an example? 

> integrating RDF graphs from different sources

What bnode problem is encountered here? 

> ; hyperlinking between RDF graphs

Again, why do bnodes cause a problem with such linking? 

> ; updating and modifying RDF graphs;

And again, I do not see any reason why the presence of bnodes makes updating and modifying more difficult. 

> merging RDF graphs;

Well, yes, there is a cost here, but it is surely not high enough to warrant such a draconian rule. How often do such merges happen? And in such a case, what the spec should do, at most, is point out the cost, not recommend courses of action based on the presumed need to avoid it.
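
The cost in question is that of keeping bnodes from different graphs apart when merging. A rough sketch, assuming triples are plain Python tuples of strings and bnodes are terms starting with "_:"; the renaming scheme and function names are hypothetical, and any scheme that keeps the labels distinct would do.

```python
def is_bnode(term):
    """Assumption: a blank node is written as a string starting with '_:'."""
    return isinstance(term, str) and term.startswith("_:")


def merge(*graphs):
    """Union the graphs after renaming bnodes per graph, so that a label
    shared by two graphs does not collapse into a single node."""
    merged = set()
    for index, graph in enumerate(graphs):
        def rename(term, index=index):
            # Prefix each bnode label with the position of its source graph.
            return f"_:g{index}_{term[2:]}" if is_bnode(term) else term

        merged |= {(rename(s), p, rename(o)) for (s, p, o) in graph}
    return merged


g1 = {("_:a", ":bbb", ":ccc")}
g2 = {("_:a", ":xxx", ":yyy")}

merged = merge(g1, g2)  # the two _:a nodes remain distinct
naive = g1 | g2         # plain union wrongly identifies them
```

The extra work over a plain set union is one pass over each graph, which is the whole of the merging cost being weighed here.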

> writing specifications.

Now, there I will agree with you. If RDF had never had bnodes in the first place, the lives of many WGs, including ours, would have been a lot easier. But (unfortunately) this will not be fixed by telling the world that bnodes are a Bad Thing.

> 
>> And so forth. If answers to such questions are available, then let us discuss them and publish them if we all agree, but even then only in an informative note, not as part of the spec. 
> 
> The purpose of a specification is to promote interoperability between implementations. Implementation advice and usage notes are an important part of that. What are you trying to achieve by objecting to the inclusion of such material into the specification?

I just want to make sure that this material is based in fact and not just a kind of folk rumor. Specifications have to last for years and be usable in a wider range of circumstances than their writers (us) can imagine. They have to pass a very high barrier of accuracy and precision, therefore. 

Pat

> 
> Best,
> Richard

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Sunday, 17 April 2011 14:16:02 UTC