Re: owl:sameAs - Is it used in a right way? from David Booth on 2013-03-17 (public-semweb-lifesci@w3.org from March 2013)

From: David Booth <david@dbooth.org>
Date: Sun, 17 Mar 2013 00:20:58 -0400
To: Jim McCusker <mccusj@rpi.edu>
CC: Jeremy J Carroll <jjc@syapse.com>, Umutcan ŞİMŞEK <s.umutcan@gmail.com>, Kingsley Idehen <kidehen@openlinksw.com>, w3c semweb HCLS <public-semweb-lifesci@w3.org>
Message-ID: <514544AA.4060807@dbooth.org>

Hi Jim,

On 03/16/2013 12:37 PM, Jim McCusker wrote:
> I'm not terribly interested in a Humpty Dumpty interpretation of the web
> of data.

Well, you'd better get used to it, because that interpretation is 
standard RDF Semantics.  I don't think it's going away any time soon.

> That's part of the motivation for having global identifiers
> like URIs/URLs.

Exactly!  That's why the idea that "a URI identifies one resource" is "a 
good goal, and helpful as a guide to URI users", even though it is not 
actually true.

> There's no point in merging ANY graphs under this view,
> since you have no way of knowing if the referents are the same.

Not true!  Don't throw the baby out with the bath.  When you merge 
graphs, you force the referents to be the same.  Sometimes the merge 
works fine, and sometimes the merge becomes inconsistent.  Just because 
you cannot *always* merge two graphs without causing inconsistency does 
not mean that merging is pointless.  It just means that *some* graphs 
can be merged and others cannot.  That is only a problem if your 
expectations of being able to merge any two graphs are set 
unrealistically high.

> I'm not
> saying that people don't denote different things with the same URI, I'm
> saying that, by using a URI that someone else controls, you are
> accepting their denotation of it.

You're preaching to the choir on that one!  I certainly agree with that 
architecture, but that is only part of the story.  The problem is that 
there is inherent ambiguity about the resource that a URI denotes.  This 
is inescapable.  And it means that two different, well-intentioned RDF 
authors can reasonably interpret a URI's resource identity differently, 
and those differences can cause conflicts to show up when their graphs 
are merged.

As a simple example, suppose Owen, a URI owner, mints a URI :apple to 
denote an apple.  As the URI's owner, he defines the URI's resource 
identity using the following RDF statements:

   # Owen's definition of :apple
   @prefix : <http://example/owen/> .
   :apple a :Apple .

Arthur, a URI author, then publishes his own RDF statements about Owen's 
apple (standard prefix definitions omitted for brevity):

   # Arthur's statements about Owen's apple
   @prefix : <http://example/owen/> .
   :apple a :GreenApple .
   :GreenApple rdfs:subClassOf :Apple .

Note that Arthur's statements are entirely consistent with Owen's 
definition of :apple .

Now Aster, another URI author, also publishes some RDF statements about 
Owen's apple.  She also uses Owen's apple definition, but has no 
knowledge of Arthur's statements.  Aster writes:

   # Aster's statements about Owen's apple
   @prefix : <http://example/owen/> .
   :apple a :RedApple .
   :RedApple rdfs:subClassOf :Apple .
   :RedApple owl:disjointWith :GreenApple .

Note that Aster's statements are also consistent with Owen's definition 
of :apple.

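Finally, Connie, an RDF consumer, discovers Arthur's and Aster's graphs 
and wishes to merge them.  Unfortunately, the merge is inconsistent: it 
asserts that :apple is both a :GreenApple and a :RedApple, and Aster has 
declared those two classes disjoint.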
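
The failure Connie runs into can be reproduced with a small Python 
sketch.  This is only a toy check, not a real OWL reasoner: triples are 
plain string tuples, the merge is a set union (which is only adequate 
because these graphs have no blank nodes), and the helper names 
inferred_types and clashes are made up for illustration.

```python
# Toy consistency check for the apple example; not a real OWL reasoner.
# Triples are plain (subject, predicate, object) tuples of strings.

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DISJOINT = "owl:disjointWith"

arthur = {
    (":apple", TYPE, ":GreenApple"),
    (":GreenApple", SUBCLASS, ":Apple"),
}
aster = {
    (":apple", TYPE, ":RedApple"),
    (":RedApple", SUBCLASS, ":Apple"),
    (":RedApple", DISJOINT, ":GreenApple"),
}


def inferred_types(graph):
    """Return {resource: set of classes}, following rdfs:subClassOf upward."""
    supers = {}
    for s, p, o in graph:
        if p == SUBCLASS:
            supers.setdefault(s, set()).add(o)
    types = {}
    for s, p, o in graph:
        if p == TYPE:
            pending = [o]
            seen = types.setdefault(s, set())
            while pending:
                cls = pending.pop()
                if cls not in seen:
                    seen.add(cls)
                    pending.extend(supers.get(cls, ()))
    return types


def clashes(graph):
    """List (resource, class_a, class_b) where a resource is in two disjoint classes."""
    types = inferred_types(graph)
    found = []
    for a, p, b in graph:
        if p == DISJOINT:
            for resource, classes in types.items():
                if a in classes and b in classes:
                    found.append((resource, a, b))
    return found


merged = arthur | aster  # merging blank-node-free graphs is a set union

print(clashes(arthur))  # []
print(clashes(aster))   # []
print(clashes(merged))  # [(':apple', ':RedApple', ':GreenApple')]
```

Each graph passes the check on its own; only the union trips it.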

It is tempting to assume that someone did something "wrong" here.  For 
example, one might claim that Owen's definition was ambiguous, or that 
Arthur and Aster should not have made assumptions about the color of 
Owen's apple if Owen did not state the color in his definition.  Indeed, 
in this simple example it is easy to see where the conflicting 
assumptions crept in.  In real life, when you're dealing with thousands 
or millions of RDF statements, it is usually far more subtle.

One might also assume that color is an intrinsic property of the apple, 
and hence is somehow different from other properties that one might 
assert.  Imagine instead that Arthur had stated ":apple a :GoodFruit" 
and Aster had stated ":apple a :BadFruit" (assuming :GoodFruit 
owl:disjointWith :BadFruit).  The result would have been the same when 
Connie attempted to merge their graphs.  Since, AFAIK, there is no 
objective way to distinguish between intrinsic properties and 
non-intrinsic properties, the color example should suffice.

The real problem is that *any* time you make an RDF statement about a 
resource, and that statement goes beyond what was said in the resource 
definition, you further constrain the identity of that resource, whether 
you mean to or not.  I.e., you further constrain the set of satisfying 
interpretations.
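
As a rough illustration of "constraining the set of satisfying 
interpretations", here is a toy Python sketch.  It is a deliberate 
oversimplification of RDF model theory: a candidate interpretation is 
reduced to the set of classes that contain :apple, and each author's 
graph is hand-translated into predicates over that set.  The names 
worlds, implies and satisfying are hypothetical.

```python
from itertools import product

CLASSES = (":Apple", ":GreenApple", ":RedApple")

# Each candidate "interpretation" is just the set of classes containing :apple.
worlds = [
    frozenset(c for c, member in zip(CLASSES, bits) if member)
    for bits in product((False, True), repeat=len(CLASSES))
]


def implies(world, sub, sup):
    """rdfs:subClassOf, restricted to :apple: membership in sub forces sup."""
    return sub not in world or sup in world


# Hand-written translations of each author's statements.
owen = [lambda w: ":Apple" in w]
arthur = [
    lambda w: ":GreenApple" in w,
    lambda w: implies(w, ":GreenApple", ":Apple"),
]
aster = [
    lambda w: ":RedApple" in w,
    lambda w: implies(w, ":RedApple", ":Apple"),
    lambda w: not (":RedApple" in w and ":GreenApple" in w),  # disjointWith
]


def satisfying(constraints):
    """Return the candidate worlds that satisfy every constraint."""
    return [w for w in worlds if all(check(w) for check in constraints)]


print(len(satisfying(owen)))                   # 4
print(len(satisfying(owen + arthur)))          # 2
print(len(satisfying(owen + aster)))           # 1
print(len(satisfying(owen + arthur + aster)))  # 0
```

Every added statement shrinks the set of surviving candidates, and the 
full merge leaves none, which is what "inconsistent" means here.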

I submit that neither Owen nor Arthur nor Aster did anything 
fundamentally wrong.  Owen was not wrong, because it is fundamentally 
impossible for Owen to be completely unambiguous about :apple's resource 
identity.  And Arthur and Aster did nothing fundamentally wrong, 
because: (a) they simply made statements about :apple ; and (b) AFAICT 
there is no fundamental difference between statements that constrain a 
resource's identity and any other statements about that resource.  In 
RDF semantics, they all simply add constraints to the possible 
interpretations.

The problem is just that Arthur and Aster happened to (unknowingly) make 
conflicting statements about :apple .  There's no need to cry foul here.  
We just have to learn to live with this possibility.  And one good 
technique is what Jeremy suggested: keep different perspectives in 
different graphs, and only join them if you need to.
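
A minimal Python sketch of that technique, under some assumptions: the 
"dataset" is just a dict of named triple sets, the consistency check 
only looks for a resource typed with two classes declared disjoint, and 
the rdfs:subClassOf triples are left out for brevity.  The names 
dataset, is_consistent and try_merge are hypothetical, not any 
library's API.

```python
# Toy "dataset" of named graphs that are joined only on request.
TYPE, DISJOINT = "rdf:type", "owl:disjointWith"

dataset = {
    "owen": {(":apple", TYPE, ":Apple")},
    "arthur": {(":apple", TYPE, ":GreenApple")},
    "aster": {
        (":apple", TYPE, ":RedApple"),
        (":RedApple", DISJOINT, ":GreenApple"),
    },
}


def is_consistent(graph):
    """Toy check: no resource is typed with two classes declared disjoint."""
    typed = {(s, o) for s, p, o in graph if p == TYPE}
    for a, p, b in graph:
        if p == DISJOINT:
            for s, o in typed:
                if o == a and (s, b) in typed:
                    return False
    return True


def try_merge(dataset, names):
    """Union the named graphs; return the union, or None if it is inconsistent."""
    merged = set().union(*(dataset[n] for n in names))
    return merged if is_consistent(merged) else None


print(try_merge(dataset, ["owen", "arthur"]) is not None)  # True
print(try_merge(dataset, ["owen", "aster"]) is not None)   # True
print(try_merge(dataset, ["arthur", "aster"]) is None)     # True
```

Connie can still use Arthur's view or Aster's view together with Owen's 
definition; she just cannot use both views at once.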

David
Received on Sunday, 17 March 2013 04:21:26 UTC