Re: owl:sameAs - Harmful to provenance? from Rafael Richards on 2013-03-27 (public-semweb-lifesci@w3.org from March 2013)

From: Rafael Richards <rafaelrichards@jhu.edu>
Date: Wed, 27 Mar 2013 17:31:11 +0000
To: Oliver Ruebenacker <curoli@gmail.com>
CC: David Booth <david@dbooth.org>, "<public-semweb-lifesci@w3.org>" <public-semweb-lifesci@w3.org>
Message-ID: <29C2E86EB9143E4397BDFA62B6F6E748120A10F9@BAYEXCH-CL-4.win.ad.jhu.edu>

This has been a very prolific thread, but did we discuss provenance?

A slideshare on owl:sameAs - Harmful to Provenance is here:

http://www.slideshare.net/jpmccusker/owlsameas-considered-harmful-to-provenance

Presentation Abstract:
GOTO was once a standard operation in most computer programming languages. Edsger Dijkstra argued in 1968 that GOTO is a low level operation that is not appropriate for higher-level programming languages, and advocated structured programming in its place. Arguably, owl:sameAs in its current usage may be poised to go through a similar discussion and transformation period. In biomedical research, the provenance of information gathered is nearly as important as, and sometimes even more important than, the information itself. owl:sameAs allows someone to state that two separate descriptions really refer to the same entity. Currently that means that operational systems merge the descriptions and at the same time, merge the provenance information, thus losing the ability to retrieve where each individual description came from. This merging of provenance can be problematic or even catastrophic in biomedical applications that demand access to provenance information. Based on our knowledge of integration issues of data in biomedicine, we give examples as use cases of this issue in biospecimen management and experimental metadata representations. We suggest that systems using any construct like owl:sameAs must provide an option to preserve the provenance of the entities and ground assertions related to those entities in question.
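
To make the abstract's concern concrete, here is a minimal sketch of the difference between merging descriptions with and without their sources. Everything in it is an assumption made for illustration: the ex: URIs, the two lab names, the quad layout (subject, predicate, object, source), and the helper names are hypothetical and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical data: each statement carries the source that asserted it.
QUADS = [
    ("ex:specimenA", "ex:tissue", "liver", "ex:labOne"),
    ("ex:specimenB", "ex:collected", "2012-05-01", "ex:labTwo"),
]

# Hypothetical owl:sameAs link: ex:specimenB is declared the same as ex:specimenA.
SAME_AS = {"ex:specimenB": "ex:specimenA"}


def canonical(uri):
    """Return the URI chosen to stand for all of its sameAs aliases."""
    return SAME_AS.get(uri, uri)


def merge_dropping_sources(quads):
    """Merge descriptions into plain triples; the source column is lost."""
    return {(canonical(s), p, o) for s, p, o, _src in quads}


def merge_keeping_sources(quads):
    """Merge descriptions but remember which sources asserted each triple."""
    by_triple = defaultdict(set)
    for s, p, o, src in quads:
        by_triple[(canonical(s), p, o)].add(src)
    return dict(by_triple)


merged = merge_dropping_sources(QUADS)
kept = merge_keeping_sources(QUADS)
```

After the first merge there is no way to tell that the collection date came from ex:labTwo; the second merge still answers that question.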

Rafael

Rafael M. Richards, M.D., M.S.
Assistant Professor, Anesthesiology & Critical Care Medicine
Faculty, Division of Health Science Informatics
Johns Hopkins School of Medicine
Baltimore, MD 2224-2760
rafaelrichards [at] jhu edu

On Mar 27, 2013, at 11:02 AM, Oliver Ruebenacker <curoli@gmail.com>
wrote:

Hello David,

So if I understand your view correctly, then it could be expressed
in a language close to yours as:

"Some people believe that if a URI occurs twice within a graph or
statement, it refers to the same thing. But this is a myth! RDF never
guarantees that two occurrences of the same URI mean the same thing."

Take care
Oliver

On Wed, Mar 27, 2013 at 9:37 AM, David Booth <david@dbooth.org> wrote:
Hi Oliver,

On 03/25/2013 04:02 PM, Oliver Ruebenacker wrote:

Hello David,

We agree that there are different interpretations. But you haven't
shown that the boundaries between interpretations are graphs
boundaries (others, including me, think that each interpretation is
global).

I don't know what you mean by "boundaries between interpretations".
An interpretation may be applied to any graph or statement to determine its
truth value (or to a URI to determine the resource to which it is bound in
that interpretation).

The notion of a graph boundary is purely a matter of convenience and
utility. A graph can consist of *any* set of RDF triples. If you wanted,
you could apply an interpretation to a graph consisting of three randomly
selected triples from each RDF document on the web, but it probably wouldn't
be very useful to do so, because you probably would not care about the truth
value of that graph. We generally only apply an interpretation to a graph
whose truth value we care about.

An interpretation corresponds to the *use* of a graph. Suppose I have a
graph that "ambiguously" uses the same URI to denote both a toucan and its
web page, without asserting that toucans cannot be web pages:

@prefix : <http://example/> .
:tweety a :Toucan .
:tweety a :WebPage .

When a conforming RDF application takes that RDF graph as input, assumes it
is true, and produces some output such as "Tweety is a toucan", in effect
the application has chosen a particular interpretation to apply to that
graph. In effect, the choice of interpretation causes the app to produce
that particular output. For example, the app might categorize animals into
species, choosing an interpretation that maps :tweety to a kind of bird.
But a different conforming RDF application that only cares about web page
authorship might take that *same* RDF graph as input and choose a different
interpretation that maps :tweety to a web page, instead outputting "Tweety
is a web page". In effect, the app has chosen an interpretation that is
appropriate for its purpose.
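
The two applications described above can be sketched as a toy model. This is a deliberate simplification of the RDF model theory, not the spec's definition: it handles only rdf:type triples, and the names Interpretation, is_true, "bird-1" and "page-1" are hypothetical, introduced only for this illustration.

```python
from typing import NamedTuple


class Interpretation(NamedTuple):
    denotes: dict    # URI -> the resource it is bound to
    extension: dict  # class URI -> set of resources in that class


def is_true(graph, interp):
    """True if every triple holds under interp (only rdf:type is modelled)."""
    return all(
        p == "rdf:type"
        and interp.denotes.get(s) in interp.extension.get(o, frozenset())
        for s, p, o in graph
    )


GRAPH = [
    (":tweety", "rdf:type", ":Toucan"),
    (":tweety", "rdf:type", ":WebPage"),
]

# The species app binds :tweety to a bird; it does not care what :WebPage means.
bird_app = Interpretation(
    denotes={":tweety": "bird-1"},
    extension={":Toucan": {"bird-1"}, ":WebPage": {"bird-1"}},
)

# The authorship app binds the same URI to a web page instead.
page_app = Interpretation(
    denotes={":tweety": "page-1"},
    extension={":Toucan": {"page-1"}, ":WebPage": {"page-1"}},
)

same_graph_true_for_both = is_true(GRAPH, bird_app) and is_true(GRAPH, page_app)
```

In this sketch the same graph is true under both interpretations, even though they bind :tweety to different resources.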

If the graph had also asserted :Toucan owl:disjointWith :WebPage, then the
graph could not be true under OWL semantics, and the graph (as is) would be
unusable to both apps.
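
As a rough sketch of how an application might notice this, the check below looks for subjects typed with two classes that the graph declares disjoint. It is not an OWL reasoner: it only inspects explicit rdf:type and owl:disjointWith triples, and the function name disjointness_clashes is hypothetical.

```python
from collections import defaultdict


def disjointness_clashes(graph):
    """Return subjects explicitly typed with two classes declared disjoint."""
    types = defaultdict(set)   # subject -> classes it is typed with
    disjoint_pairs = set()     # unordered pairs of disjoint classes
    for s, p, o in graph:
        if p == "rdf:type":
            types[s].add(o)
        elif p == "owl:disjointWith":
            disjoint_pairs.add(frozenset((s, o)))
    return sorted(
        s for s, classes in types.items()
        if any(pair <= classes for pair in disjoint_pairs)
    )


AMBIGUOUS = [
    (":tweety", "rdf:type", ":Toucan"),
    (":tweety", "rdf:type", ":WebPage"),
]
WITH_DISJOINTNESS = AMBIGUOUS + [(":Toucan", "owl:disjointWith", ":WebPage")]

clashes_without_axiom = disjointness_clashes(AMBIGUOUS)
clashes_with_axiom = disjointness_clashes(WITH_DISJOINTNESS)
```

Without the disjointness statement the check finds nothing to complain about; with it, :tweety is reported.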

That makes me wonder whether you consider it in conformance with the
specs to choose different boundaries?

For example, would you consider it conforming to apply a different
interpretation to each statement? Or how about a different
interpretation for each node of a statement? Do you see anything in
the specs against doing so?

Sure it is in conformance with the spec. An interpretation can be applied
to any graph or any RDF statement. And certainly you could determine the
truth value of N different statements according to N different
interpretations. But would it be useful to do so? Probably not.
Furthermore, if two statements are true under two different interpretations,
that would not tell you whether a graph consisting of those two statements
would be true under a single interpretation.

OTOH, it *is* useful to apply different interpretations to different graphs,
and one reason is that you may be using those graphs for different
applications, each app in effect applying its own interpretation. But the
fact that those graphs may be true under different interpretations does
*not* tell you whether the merge of those graphs will be true under a single
interpretation.
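
A small brute-force sketch of that last point follows. It assumes a two-element domain and only two kinds of triples (rdf:type on :tweety and owl:disjointWith between classes), so it illustrates the idea rather than implementing the RDF or OWL semantics; DOMAIN, CLASSES, holds and satisfiable are hypothetical names.

```python
from itertools import combinations, product

DOMAIN = ("bird", "page")          # hypothetical two-resource universe
CLASSES = (":Toucan", ":WebPage")


def subsets(items):
    """All subsets of items, as frozensets."""
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def holds(triple, tweety, ext):
    """Truth of one triple, given what :tweety denotes and the class extensions."""
    s, p, o = triple
    if p == "rdf:type":            # the only typed subject here is :tweety
        return tweety in ext[o]
    if p == "owl:disjointWith":
        return not (ext[s] & ext[o])
    raise ValueError(f"unsupported predicate: {p}")


def satisfiable(graph):
    """True if some interpretation over DOMAIN makes every triple hold."""
    for tweety in DOMAIN:
        for exts in product(subsets(DOMAIN), repeat=len(CLASSES)):
            ext = dict(zip(CLASSES, exts))
            if all(holds(t, tweety, ext) for t in graph):
                return True
    return False


G1 = {(":tweety", "rdf:type", ":Toucan")}
G2 = {(":tweety", "rdf:type", ":WebPage"),
      (":Toucan", "owl:disjointWith", ":WebPage")}
MERGED = G1 | G2                   # no blank nodes, so the merge is the union
```

Here G1 and G2 each have an interpretation that makes them true, but no single interpretation does so for MERGED.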

The RDF Semantics spec only tells you how to compute the truth value of one
<interpretation, graph> pair at a time, but you can certainly apply it to as
many <interpretation, graph> pairs as you want -- in full conformance with
the intent of the spec. This is the same as if I define a function f of two
arguments such that f(x,y) = x+y: that function definition only tells you
how to compute f(x,y) for one pair of numbers at a time, but you can
certainly apply it to as many pairs as you want, without in any way
violating the intent of f's definition.

David

--
IT Project Lead at PanGenX (http://www.pangenx.com)
The purpose is always improvement

Received on Wednesday, 27 March 2013 17:32:04 UTC