Re: Skolemization and RDF Semantics from Steve Harris on 2011-04-17 (public-rdf-wg@w3.org from April 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Sun, 17 Apr 2011 13:18:01 +0100
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Pat Hayes <phayes@ihmc.us>, Dan Brickley <danbri@danbri.org>, David Wood <dpw@talis.com>, "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Message-Id: <B6795BE3-A3E4-450B-8285-F11A7B04D927@garlik.com>

On 2011-04-17, at 12:27, Richard Cyganiak wrote:

> On 16 Apr 2011, at 23:27, Pat Hayes wrote:
>> So, I think that all this careful wording SHOULD be understood to apply only under a special circumstance, where some data is modified by inserting URIs in place of bnodes, *and the new version is claimed to be essentially the same content as the old, bnode, version*. That is, when the new skolemised RDF is not re-published by a new publisher who takes responsibility for it, but is seen rather as a re-rendering or a normalization of the old data, inheriting the original publisher's authority and provenance. 
>> 
>> A way to put the point is to ask, who 'owns' the Skolem URIs that are used in the new (version of the) data? The original publisher can legitimately disclaim all responsibility for them if they have been introduced downstream and outside her control. Perhaps all we need to say is that anyone who replaces a bnode with a URI themselves is the owner of that URI and is responsible for accounting for its meaning; but one way to discharge this responsibility is to use a legitimate skolem URI which can be recognized as such.

+1

>> *Why* is it bad to use bnodes?
> 
> First of all, it is *sometimes* but not *always* bad to use blank nodes. The documents I linked to gave specific advice, informed by implementation experience, for when to use, and when to avoid, blank nodes.

However, if skolemising to URIs becomes a commonly available option, some of those reasons carry less weight.
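
To make the idea concrete, here is a rough Python sketch of what skolemising a graph could look like. It is only an illustration under stated assumptions: triples are plain tuples of strings, blank nodes are any term starting with "_:", and the http://example.org/genid/ prefix is made up for this example rather than taken from any spec.

```python
import uuid

# Hypothetical prefix for minted Skolem URIs; an assumption, not from any spec.
SKOLEM_BASE = "http://example.org/genid/"


def skolemise(triples, base=SKOLEM_BASE):
    """Replace every blank node label (a term starting with '_:') with a fresh URI.

    Each distinct label maps to exactly one URI within a single call,
    so the shape of the graph is preserved.
    """
    mapping = {}

    def convert(term):
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = "<" + base + uuid.uuid4().hex + ">"
            return mapping[term]
        return term

    skolemised = [tuple(convert(term) for term in triple) for triple in triples]
    return skolemised, mapping


graph = [("_:a", ":bbb", ":ccc"), ("_:a", ":ddd", "_:b")]
skolemised, mapping = skolemise(graph)
```

Whoever runs something like this is minting the URIs, so they would need to control the prefix they use.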

> That being said:
> 
> Given a triple _:a :bbb :ccc, it is not possible to author another triple _:a :xxx :yyy in another graph, the intention being that _:a is the same thing in both graphs. Given that the blank node label is arbitrary and cannot be assumed to be persistent, it is not possible to refer to the graph node from outside of the system where the graph originated. Such outside reference to certain nodes is a requirement in a distributed system.

SPARQL Update makes that less true, e.g.:

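# The empty prefix is bound to an assumed namespace, for illustration only.
PREFIX : <http://example.org/>
INSERT {
   GRAPH <G> { ?x :xxx :yyy }
}
WHERE {
   ?x :bbb :ccc
}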

It's not generally possible to do this across multiple operations, though, because blank node labels do not carry over from one request to the next.
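
A toy model of the difference, as a Python sketch. ToyStore and its methods are hypothetical names invented for this illustration, not any real store's API, and it ignores named graphs entirely. The only behaviour it models is that a blank node label is scoped to the operation that mentions it, whereas a variable bound in a WHERE clause picks up the node that is already stored.

```python
import itertools


class ToyStore:
    """Hypothetical in-memory triple store, for illustration only."""

    def __init__(self):
        self.triples = set()
        self._ids = itertools.count()

    def insert(self, triples):
        """Insert triples; blank node labels are scoped to this single call."""
        local = {}

        def resolve(term):
            if term.startswith("_:"):
                if term not in local:
                    local[term] = "_:n%d" % next(self._ids)
                return local[term]
            return term

        for triple in triples:
            self.triples.add(tuple(resolve(term) for term in triple))

    def insert_where(self, template, pattern):
        """For each subject matching (?x, p, o), add (?x, p2, o2)."""
        _, p, o = pattern
        _, p2, o2 = template
        # Collect matches first so the set is not modified while iterating.
        matches = [s for (s, p1, o1) in self.triples if p1 == p and o1 == o]
        for s in matches:
            self.triples.add((s, p2, o2))


# Two separate operations: the second "_:a" becomes a different node.
two_ops = ToyStore()
two_ops.insert([("_:a", ":bbb", ":ccc")])
two_ops.insert([("_:a", ":xxx", ":yyy")])
subjects_two_ops = {s for (s, p, o) in two_ops.triples}

# One pattern-driven operation: the new triple attaches to the existing node.
one_op = ToyStore()
one_op.insert([("_:a", ":bbb", ":ccc")])
one_op.insert_where(("?x", ":xxx", ":yyy"), ("?x", ":bbb", ":ccc"))
subjects_one_op = {s for (s, p, o) in one_op.triples}
```

In the first store the two triples end up with different subjects; in the second they share one.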

- Steve

>> *Why* is data using them worse than data which does not?
> 
> Because it is difficult to augment data that uses blank nodes with further data. Because it requires stepping outside of the RDF data model in order to remotely modify or otherwise work with an RDF graph that uses blank nodes.
> 
>> Worse in what sense, exactly?
> 
> Worse in the sense that it imposes large, and often prohibitive, additional costs on users of the data, which usually is not in the best interest of the publishers of the data.
> 
>> Which processes are made more difficult when blank nodes are present?
> 
> Referring to nodes in the graph from other data; storing persistent references to a graph node for later recall; integrating RDF graphs from different sources; hyperlinking between RDF graphs; updating and modifying RDF graphs; merging RDF graphs; writing specifications.
> 
>> And so forth. If answers to such questions are available, then let us discuss them and publish them if we all agree, but even then only in an informative note, not as part of the spec. 
> 
> The purpose of a specification is to promote interoperability between implementations. Implementation advice and usage notes are an important part of that. What are you trying to achieve by objecting to the inclusion of such material into the specification?
> 
> Best,
> Richard

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Sunday, 17 April 2011 12:18:30 UTC