Re: owl:sameAs - Harmful to provenance? from Jim McCusker on 2013-03-27 (public-semweb-lifesci@w3.org from March 2013)

From: Jim McCusker <james.mccusker@yale.edu>
Date: Wed, 27 Mar 2013 13:38:31 -0400
To: Rafael Richards <rafaelrichards@jhu.edu>
Cc: Oliver Ruebenacker <curoli@gmail.com>, David Booth <david@dbooth.org>, "<public-semweb-lifesci@w3.org>" <public-semweb-lifesci@w3.org>
Message-ID: <CAAtgn=RAso1r3H6yYOXoDOFhB-WHBxRUGgau9ROxvH0kY5iCOA@mail.gmail.com>
The short answer: not anymore, if you use prov:alternateOf and
prov:specializationOf instead.
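
A minimal sketch of the difference, in plain Python with triples as tuples. The `lab:`, `bank:` and `ex:` names and the two helper functions are hypothetical, made up for illustration; only `prov:alternateOf` and `prov:wasAttributedTo` are real PROV-O terms. It assumes a system that handles owl:sameAs by rewriting one identifier to the other, which is one common approach and not the only one.

```python
def describe(triples, node):
    """Collect {predicate: set of objects} for every triple whose subject is `node`."""
    out = {}
    for s, p, o in triples:
        if s == node:
            out.setdefault(p, set()).add(o)
    return out


def merge_same_as(triples, keep, drop):
    """Naive owl:sameAs handling (an assumption): rewrite `drop` to `keep` everywhere."""
    def rename(term):
        return keep if term == drop else term
    return {(rename(s), p, rename(o)) for s, p, o in triples}


# Two descriptions of what may be the same biospecimen, each with its own source.
triples = {
    ("lab:specimen42", "ex:diagnosis", "melanoma"),
    ("lab:specimen42", "prov:wasAttributedTo", "ex:LabA"),
    ("bank:sample7", "ex:diagnosis", "nevus"),
    ("bank:sample7", "prov:wasAttributedTo", "ex:BiobankB"),
}

# owl:sameAs-style merge: one node now carries both diagnoses and both sources,
# so nothing says which source made which claim.
merged = merge_same_as(triples, "lab:specimen42", "bank:sample7")
merged_view = describe(merged, "lab:specimen42")

# prov:alternateOf: the nodes stay distinct and are only linked.
linked = triples | {("bank:sample7", "prov:alternateOf", "lab:specimen42")}
linked_view = describe(linked, "bank:sample7")
```

After the merge, `merged_view` holds two diagnoses and two attributions on a single node. In `linked_view`, the biobank description still carries only its own diagnosis and its own source.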

Jim


On Wed, Mar 27, 2013 at 1:31 PM, Rafael Richards <rafaelrichards@jhu.edu> wrote:

>  This has been a very prolific thread, but did we discuss provenance?
>
>  A slideshare on  owl:sameAs - Harmful to Provenance is here:
>
>
> http://www.slideshare.net/jpmccusker/owlsameas-considered-harmful-to-provenance
>
>   Presentation Abstract:
> GOTO was once a standard operation in most computer programming languages.
> Edsger Dijkstra argued in 1968 that GOTO is a low level operation that is
> not appropriate for higher-level programming languages, and advocated
> structured programming in its place. Arguably, owl:sameAs in its current
> usage may be poised to go through a similar discussion and transformation
> period. In biomedical research, the provenance of information gathered is
> nearly as important as, and sometimes even more important than, the
> information itself. owl:sameAs allows someone to state that two separate
> descriptions really refer to the same entity. Currently that means that
> operational systems merge the descriptions and at the same time, merge the
> provenance information, thus losing the ability to retrieve where each
> individual description came from. This merging of provenance can be
> problematic or even catastrophic in biomedical applications that demand
> access to provenance information. Based on our knowledge of integration
> issues of data in biomedicine, we give examples as use cases of this issue
> in biospecimen management and experimental metadata representations. We
> suggest that systems using any construct like owl:sameAs must provide an
> option to preserve the provenance of the entities and ground assertions
> related to those entities in question.
>
>
>  Rafael
>
>  Rafael M. Richards, M.D., M.S.
>  Assistant Professor, Anesthesiology & Critical Care Medicine
>  Faculty, Division of Health Science Informatics
>  Johns Hopkins School of Medicine
>  Baltimore, MD 2224-2760
>  rafaelrichards [at] jhu edu
>
>
>
>  On Mar 27, 2013, at 11:02 AM, Oliver Ruebenacker <curoli@gmail.com>
>  wrote:
>
>     Hello David,
>
>  So if I understand your view correctly, then it could be expressed
> in a language close to yours as:
>
>  "Some people believe that if a URI occurs twice within a graph or
> statement, it refers to the same thing. But this is a myth! RDF never
> guarantees that two occurrences of the same URI mean the same thing."
>
>     Take care
>     Oliver
>
> On Wed, Mar 27, 2013 at 9:37 AM, David Booth <david@dbooth.org> wrote:
>
> Hi Oliver,
>
> On 03/25/2013 04:02 PM, Oliver Ruebenacker wrote:
>
>
>      Hello David,
>
>   We agree that there are different interpretations. But you haven't
> shown that the boundaries between interpretations are graph
> boundaries (others, including me, think that each interpretation is
> global).
>
>
>
> I don't know what you mean by "boundaries between interpretations".
> An interpretation may be applied to any graph or statement to determine its
> truth value (or to a URI to determine the resource to which it is bound in
> that interpretation).
>
> The notion of a graph boundary is purely a matter of convenience and
> utility.  A graph can consist of *any* set of RDF triples.  If you wanted,
> you could apply an interpretation to a graph consisting of three randomly
> selected triples from each RDF document on the web, but it probably
> wouldn't be very useful to do so, because you probably would not care
> about the truth value of that graph.  We generally only apply an
> interpretation to a graph whose truth value we care about.
>
> An interpretation corresponds to the *use* of a graph.  Suppose I have a
> graph that "ambiguously" uses the same URI to denote both a toucan and its
> web page, without asserting that toucans cannot be web pages:
>
>   @prefix : <http://example/> .
>   :tweety a :Toucan .
>   :tweety a :WebPage .
>
> When a conforming RDF application takes that RDF graph as input, assumes it
> is true, and produces some output such as "Tweety is a toucan", in effect
> the application has chosen a particular interpretation to apply to that
> graph.  In effect, the choice of interpretation causes the app to produce
> that particular output.  For example, the app might categorize animals into
> species, choosing an interpretation that maps :tweety to a kind of bird.
> But a different conforming RDF application that only cares about web page
> authorship might take that *same* RDF graph as input and choose a different
> interpretation that maps :tweety to a web page, instead outputting "Tweety
> is a web page".  In effect, the app has chosen an interpretation that is
> appropriate for its purpose.
>
> If the graph had also asserted :Toucan owl:disjointWith :WebPage, then the
> graph cannot be true under OWL semantics, and the graph (as is) would be
> unusable to both apps.
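
A toy sketch of the point about owl:disjointWith, not the full RDF or OWL model theory. It assumes a hypothetical two-element universe and treats an interpretation as a choice of what `:tweety` denotes plus the extensions of the two classes; `interpretations` and `is_true` are illustrative helpers, not part of any library.

```python
from itertools import product

UNIVERSE = ("bird", "page")  # hypothetical two-element domain


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    for mask in product((False, True), repeat=len(items)):
        yield frozenset(x for x, keep in zip(items, mask) if keep)


def interpretations():
    """Enumerate every toy interpretation over UNIVERSE (2 * 4 * 4 = 32 of them)."""
    for tweety, toucans, pages in product(UNIVERSE, subsets(UNIVERSE), subsets(UNIVERSE)):
        yield {
            "denotes": {":tweety": tweety},
            "ext": {":Toucan": toucans, ":WebPage": pages},
        }


def is_true(graph, interp):
    """A graph is true when every one of its triples holds in the interpretation."""
    for s, p, o in graph:
        if p == "a":
            if interp["denotes"][s] not in interp["ext"][o]:
                return False
        elif p == "owl:disjointWith":
            if interp["ext"][s] & interp["ext"][o]:
                return False
        else:
            raise ValueError(f"predicate not modelled in this sketch: {p}")
    return True


graph = [(":tweety", "a", ":Toucan"), (":tweety", "a", ":WebPage")]
with_disjoint = graph + [(":Toucan", "owl:disjointWith", ":WebPage")]

models = [i for i in interpretations() if is_true(graph, i)]
models_disjoint = [i for i in interpretations() if is_true(with_disjoint, i)]
```

In this toy setting the original graph has satisfying interpretations (those where the thing `:tweety` denotes belongs to both classes), while `models_disjoint` is empty once the disjointness triple is added.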
>
>
>   That makes me wonder whether you consider it in conformance with the
> specs to choose different boundaries?
>
>   For example, would you consider it conforming to apply a different
> interpretation to each statement? Or how about a different
> interpretation for each node of a statement? Do you see anything in
> the specs against doing so?
>
>
>
> Sure it is in conformance with the spec.  An interpretation can be applied
> to any graph or any RDF statement.  And certainly you could determine the
> truth value of N different statements according to N different
> interpretations.  But would it be useful to do so?  Probably not.
> Furthermore, if two statements are true under two different
> interpretations, that would not tell you whether a graph consisting of
> those two statements would be true under a single interpretation.
>
> OTOH, it *is* useful to apply different interpretations to different
> graphs, and one reason is that you may be using those graphs for different
> applications, each app in effect applying its own interpretation.  But the
> fact that those graphs may be true under different interpretations does
> *not* tell you whether the merge of those graphs will be true under a
> single interpretation.
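
A small sketch of the merge caveat, under the same toy assumptions as a simplified model of interpretations: a denotation for `:tweety` and fixed class extensions. The graph names, `as_bird`, `as_page` and `is_true` are all hypothetical and chosen only for illustration.

```python
def is_true(graph, denotes, ext):
    """Check each triple against one interpretation (denotation map + class extensions)."""
    for s, p, o in graph:
        if p == "a" and denotes[s] not in ext[o]:
            return False
        if p == "owl:disjointWith" and ext[s] & ext[o]:
            return False
    return True


# Class extensions shared by both interpretations; only the denotation of :tweety differs.
ext = {":Toucan": {"bird"}, ":WebPage": {"page"}}
as_bird = {":tweety": "bird"}  # interpretation a species-sorting app might choose
as_page = {":tweety": "page"}  # interpretation an authorship app might choose

species_graph = [
    (":tweety", "a", ":Toucan"),
    (":Toucan", "owl:disjointWith", ":WebPage"),
]
authorship_graph = [(":tweety", "a", ":WebPage")]
merged = species_graph + authorship_graph

results = {
    "species graph under as_bird": is_true(species_graph, as_bird, ext),
    "authorship graph under as_page": is_true(authorship_graph, as_page, ext),
    "merged graph under as_bird": is_true(merged, as_bird, ext),
    "merged graph under as_page": is_true(merged, as_page, ext),
}
```

Each graph is true under the interpretation its own application would pick, yet the merged graph is false under both of them, which is the situation described above.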
>
> The RDF Semantics spec only tells you how to compute the truth value of one
> <interpretation, graph> pair at a time, but you can certainly apply it to
> as many <interpretation, graph> pairs as you want -- in full conformance
> with the intent of the spec.  This is the same as if I define a function f
> of two arguments such that f(x,y) = x+y: that function definition only
> tells you how to compute f(x,y) for one pair of numbers at a time, but you
> can certainly apply it to as many pairs as you want, without in any way
> violating the intent of f's definition.
>
> David
>
>
>
>
> --
> IT Project Lead at PanGenX (http://www.pangenx.com)
> The purpose is always improvement
>
>
>


-- 
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker@yale.edu | (203) 785-4436
http://krauthammerlab.med.yale.edu

PhD Student
Tetherless World Constellation
Rensselaer Polytechnic Institute
mccusj@cs.rpi.edu
http://tw.rpi.edu
Received on Wednesday, 27 March 2013 17:39:21 UTC