Re: Skolemization and RDF Semantics from Richard Cyganiak on 2011-04-17 (public-rdf-wg@w3.org from April 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Sun, 17 Apr 2011 12:27:29 +0100
To: Pat Hayes <phayes@ihmc.us>
Cc: Steve Harris <steve.harris@garlik.com>, Dan Brickley <danbri@danbri.org>, David Wood <dpw@talis.com>, "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Message-Id: <D5FFB5DB-CA1D-4A58-8496-B306F45425B0@cyganiak.de>

On 16 Apr 2011, at 23:27, Pat Hayes wrote:
> So, I think that all this careful wording SHOULD be understood to apply only under a special circumstance, where some data is modified by inserting URIs in place of bnodes, *and the new version is claimed to be essentially the same content as the old, bnode, version*. That is, when the new skolemised RDF is not re-published by a new publisher who takes responsibility for it, but is seen rather as a re-rendering or a normalization of the old data, inheriting the original publisher's authority and provenance. 
> 
> A way to put the point is to ask, who 'owns' the Skolem URIs that are used in the new (version of the) data? The original publisher can legitimately disclaim all responsibility for them if they have been introduced downstream and outside her control. Perhaps all we need to say is that anyone who replaces a bnode with a URI themselves is the owner of that URI and is responsible for accounting for its meaning; but one way to discharge this responsibility is to use a legitimate skolem URI which can be recognized as such.

+1
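To make the ownership point concrete, here is a minimal sketch of skolemization in Python. It is not tied to any RDF library: triples are plain string tuples, blank nodes are assumed to be written `_:label` as in N-Triples, and `SKOLEM_BASE`, `skolemize` and `is_skolem_uri` are hypothetical names. Whoever runs the replacement mints the URIs under a base they control, so they are the party answering for their meaning, and the shared prefix is what lets the URIs be recognized as Skolem URIs.

```python
import uuid

# Hypothetical base URI, controlled by whoever performs the skolemization.
# That party "owns" the minted URIs and is responsible for their meaning.
SKOLEM_BASE = "http://example.org/.well-known/genid/"

Triple = tuple[str, str, str]


def is_bnode(term: str) -> bool:
    # Assumption: blank nodes are written "_:label", as in N-Triples.
    return term.startswith("_:")


def is_skolem_uri(term: str) -> bool:
    # A Skolem URI is recognizable as such by its agreed-upon prefix.
    return term.startswith(SKOLEM_BASE)


def skolemize(graph: list[Triple]) -> tuple[list[Triple], dict[str, str]]:
    """Replace every blank node with a fresh URI under SKOLEM_BASE.

    The same label within one graph maps to the same URI, so the graph's
    structure is preserved; each call mints new, never-reused URIs.
    """
    mapping: dict[str, str] = {}

    def replace(term: str) -> str:
        if not is_bnode(term):
            return term
        if term not in mapping:
            mapping[term] = SKOLEM_BASE + uuid.uuid4().hex
        return mapping[term]

    return [tuple(replace(t) for t in triple) for triple in graph], mapping


# Usage: both triples end up sharing one Skolem URI as their subject.
g = [("_:a", ":bbb", ":ccc"), ("_:a", ":xxx", ":yyy")]
skolemized, mapping = skolemize(g)
```

Random UUIDs are used here only so that minted URIs never collide; any scheme that the owner of the base URI can keep unique would do.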

> *Why* is it bad to use bnodes?

First of all, it is *sometimes* but not *always* bad to use blank nodes. The documents I linked to gave specific advice, informed by implementation experience, for when to use, and when to avoid, blank nodes.

That being said:

Given a triple _:a :bbb :ccc, it is not possible to author another triple _:a :xxx :yyy in another graph with the intention that _:a denotes the same node in both graphs. Because a blank node label is arbitrary and cannot be assumed to be persistent, there is no way to refer to that node from outside the system where the graph originated. In a distributed system, such outside references to certain nodes are a requirement.
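A minimal Python sketch of this point, again assuming triples are plain string tuples with blank nodes written `_:label` (this is not a real RDF library; `standardize_apart`, `merge` and `subjects` are hypothetical helpers). Since blank node labels are scoped to their graph, merging two graphs must rename their blank nodes apart, so the two `_:a` nodes stay distinct; with a URI in place of `_:a`, both graphs describe the same node.

```python
import itertools

Triple = tuple[str, str, str]

# Source of fresh blank node labels, shared across all merges.
_counter = itertools.count()


def standardize_apart(graph: list[Triple]) -> list[Triple]:
    """Rename every blank node to a label never used before."""
    fresh: dict[str, str] = {}

    def rename(term: str) -> str:
        if not term.startswith("_:"):
            return term
        if term not in fresh:
            fresh[term] = f"_:b{next(_counter)}"
        return fresh[term]

    return [tuple(rename(t) for t in triple) for triple in graph]


def merge(*graphs: list[Triple]) -> set[Triple]:
    """Merge graphs; blank nodes from different graphs are never identified."""
    merged: set[Triple] = set()
    for g in graphs:
        merged.update(standardize_apart(g))
    return merged


def subjects(graph: set[Triple]) -> set[str]:
    return {s for s, _, _ in graph}


# Two graphs that both say something about "_:a" ...
g1 = [("_:a", ":bbb", ":ccc")]
g2 = [("_:a", ":xxx", ":yyy")]
print(len(subjects(merge(g1, g2))))  # → 2: the merge yields two unrelated nodes

# ... versus two graphs that use a (hypothetical) URI for the same node.
h1 = [("http://example.org/thing/a", ":bbb", ":ccc")]
h2 = [("http://example.org/thing/a", ":xxx", ":yyy")]
print(len(subjects(merge(h1, h2))))  # → 1: both triples describe one node
```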

> *Why* is data using them worse than data which does not?

Because it is difficult to augment data that uses blank nodes with further data, and because remotely modifying, or otherwise working with, an RDF graph that uses blank nodes requires stepping outside of the RDF data model.

> Worse in what sense, exactly?

Worse in the sense that it imposes large, and often prohibitive, additional costs on users of the data, which is usually not in the best interest of the publishers of the data.

> Which processes are made more difficult when blank nodes are present?

Referring to nodes in the graph from other data; storing persistent references to a graph node for later recall; integrating RDF graphs from different sources; hyperlinking between RDF graphs; updating and modifying RDF graphs; merging RDF graphs; writing specifications.

> And so forth. If answers to such questions are available, then let us discuss them and publish them if we all agree, but even then only in an informative note, not as part of the spec. 

The purpose of a specification is to promote interoperability between implementations. Implementation advice and usage notes are an important part of that. What are you trying to achieve by objecting to the inclusion of such material into the specification?

Best,
Richard

Received on Sunday, 17 April 2011 11:28:00 UTC