Re: owl:sameAs - Is it used in a right way? from Pat Hayes on 2013-03-20 (public-semweb-lifesci@w3.org from March 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 19 Mar 2013 23:20:26 -0500
To: David Booth <david@dbooth.org>
Cc: Jim McCusker <mccusj@rpi.edu>, Jeremy J Carroll <jjc@syapse.com>, Umutcan ŞİMŞEK <s.umutcan@gmail.com>, Kingsley Idehen <kidehen@openlinksw.com>, w3c semweb HCLS <public-semweb-lifesci@w3.org>
Message-Id: <B84C22E5-07BA-4125-ABC5-99C730516C9E@ihmc.us>
On Mar 18, 2013, at 5:03 PM, David Booth wrote:

> Hi Pat,
> 
> On 03/18/2013 12:10 AM, Pat Hayes wrote:
> 
>>> On Sun, Mar 17, 2013 at 12:20 AM, David Booth <david@dbooth.org>
>>> . . .   When you merge
>>> graphs, you force the referents to be the same.  Sometimes the
>>> merge works fine, and sometimes the merge becomes inconsistent.
>> 
>> The merge always 'works'. Any set of RDF graphs entails its merge.
>> When the merge is inconsistent, it reveals that the original data was
>> inconsistent.
> 
> By "sometimes the merge works" I meant "sometimes the merge is consistent".  "Works" was shorthand.  Sorry it was unclear.
> 
>>> . . .  two different,
>>> well-intentioned RDF authors can reasonably interpret a URI's
>>> resource identity differently
>> 
>> That also is true, but...
>> 
>>> , and those differences can cause conflicts to show up when their
>>> graphs are merged.
>> 
>> ...if they differ that much, then this goes beyond mere (and
>> unavoidable) ambiguity: it means they genuinely *disagree*, openly
>> enough for this disagreement to be revealed by RDF machinery.
> 
> Yes, but it can be a *consequence* of ambiguity.  

I don't see how ambiguity can produce a contradiction. If the two A guys had been a little more ambiguous, maybe they wouldn't be arguing at this point.

> The point is that Arthur and Aster (in the example below) only disagree with each other. Neither of them disagreed with Owen's definition.

True, but so what? They disagree, is the point, and the inconsistency reveals that disagreement. 

> 
>>> [ . . . ]
>>> Finally, Connie, an RDF consumer, discovers Arthur and Aster's
>>> graphs and wishes to merge them.  Unfortunately, the merge is
>>> inconsistent,
>> 
>> Why unfortunately? Arthur and Aster apparently disagree with each
>> other, and the inconsistency simply reveals that disagreement. That
>> is a useful datum if you are trying to figure out who you might want
>> to believe.
> 
> Yes, it can be useful in that way.  But it is unfortunate for Connie because Connie cannot compute useful entailments from the merge, because the merge is false, and a false premise entails everything.

If Connie has more sense than my cat, she can, for example, pick apart the RDF and store the consistent pieces in a dataset with labels recording their provenance, and maybe try drawing useful conclusions from these and see to what extent they have entailments in common.
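
A minimal sketch of that idea, under loud assumptions: triples are plain Python tuples rather than real RDF, the only "reasoning" is a check for one subject typed with two classes declared owl:disjointWith, and "entailments in common" is approximated by the asserted triples the graphs share. The labels "arthur" and "aster", the class :Fruit, and all function names are made up for illustration; a real setup would use an RDF dataset with named graphs and an actual reasoner.

```python
from itertools import combinations

RDF_TYPE = "rdf:type"
DISJOINT = "owl:disjointWith"

def is_consistent(triples):
    """False if some subject is typed with two classes declared disjoint."""
    disjoint = {frozenset((s, o)) for s, p, o in triples if p == DISJOINT}
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return not any(
        frozenset(pair) in disjoint
        for classes in types.values()
        for pair in combinations(sorted(classes), 2)
    )

def split_by_provenance(dataset, background):
    """Keep each labelled graph that is consistent with the background alone."""
    return {
        label: graph
        for label, graph in dataset.items()
        if is_consistent(background | graph)
    }

def common_triples(graphs):
    """Triples asserted in every graph (a crude stand-in for shared entailments)."""
    graphs = list(graphs)
    return set.intersection(*graphs) if graphs else set()

# Shared background axiom, taken from the GoodFruit/BadFruit variant of the example.
background = {(":GoodFruit", DISJOINT, ":BadFruit")}

# Each graph is stored under a label recording where it came from.
dataset = {
    "arthur": {(":apple", RDF_TYPE, ":Fruit"), (":apple", RDF_TYPE, ":GoodFruit")},
    "aster": {(":apple", RDF_TYPE, ":Fruit"), (":apple", RDF_TYPE, ":BadFruit")},
}

merged_ok = is_consistent(background.union(*dataset.values()))  # the full merge
kept = split_by_provenance(dataset, background)                 # consistent pieces
common = common_triples(kept.values())                          # what they agree on
```

Here the full merge fails the check, each labelled graph passes on its own, and the one triple both sources assert (that :apple is a :Fruit) survives as common ground.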

>> 
>>> It is tempting to assume that someone did something "wrong" here.
>>> For example, one might claim that Owen's definition was ambiguous,
>>> or that Arthur and Aster should not have made assumptions about the
>>> color of Owen's apple if Owen did not state the color in his
>>> definition.  Indeed, in this simple example it is easy to see where
>>> the conflicting assumptions crept in.  In real life, when you're
>>> dealing with thousands or millions of RDF statements, it is usually
>>> far more subtle.
>> 
>> True, but that does not change the essentials.
>>> 
>>> One might also assume that color is an intrinsic property of the
>>> apple, and hence is somehow different from other properties that
>>> one might assert.  Imagine instead that Arthur had stated ":apple a
>>> :GoodFruit" and Aster had stated ":apple a :BadFruit" (assuming
>>> :GoodFruit owl:disjointWith :BadFruit).  The result would have been
>>> the same when Connie attempted to merge their graphs.  Since,
>>> AFAIK, there is no objective way to distinguish between intrinsic
>>> properties and non-intrinsic properties, the color example should
>>> suffice.
>> 
>> You might assert that color is an owl:FunctionalProperty. That would
>> do the trick.
> 
> Yes, it may be possible for *some* properties, but that isn't the problem.  The problem is to provide an algorithm that, given any property p and resource r, determines whether p is an intrinsic property of r.  (First we'd have to define what we mean by "intrinsic property"!  There have been lots of fruitless discussions on the W3C TAG list about what are the "essential characteristics" of a resource.)

The TAG seems to be very good at getting itself stuck in famous philosophical rat-holes. Maybe it should be relocated from MIT to La Brea.

> 
>>> [ . . . ]
>>> I submit that neither Owen nor Arthur nor Aster did anything
>>> fundamentally wrong.  Owen was not wrong, because it is
>>> fundamentally impossible for Owen to be completely unambiguous
>>> about :apple's resource identity.  And Arthur and Aster did nothing
>>> fundamentally wrong, because: (a) they simply made statements about
>>> :apple
>> 
>> Did they know that these statements were true? They can't both have
>> known this.
>> 
>>> ; and (b) AFAICT there is no fundamental difference between
>>> statements that constrain a resource's identity and any other
>>> statements about that resource.  In RDF semantics, they all simply
>>> add constraints to the possible interpretations.
>>> 
>>> The problem is just that Arthur and Aster happened to (unknowingly)
>>> make conflicting statements about :apple .  There's no need to cry
>>> foul here.
>> 
>> It's not logical-RDF-AWWW foul, but it is bad form to make assertions
>> that you are claiming to be true (and other readers might rely on)
>> when you don't have a clue if they are true or not. At least one of
>> Arthur or Aster must be being this careless.
> 
> I think that's an unnecessary value judgement.  In this simple example, it is easy to make that judgement.  But in real life, people often make the best statements they can, making the most accurate statements that they believe true, and they still disagree.  E.g., does God exist or does God not exist?   Even in scientific realms where we think we can be objective, reasonable intelligent people still disagree.

But RDF isn't intended to be used in theology. It is intended for recording data, and most data is pretty mundane stuff about which there is not a lot of factual disagreement.

> 
> I don't think there is any point in being judgemental about it, because the bottom line is that different sets of RDF are useful to different applications.  If an application provides useful value consuming some RDF data that may not be 100% correct in the way it models the world, that is still A Good Thing.  The example that I usually use is RDF data that models the world as flat.  It works fine for car navigation, but would be useless for computing rocket trajectories, and it is *simpler* than RDF data that models the world in 3D.  Would you tell the publisher not to publish that data, because it is incorrect?  I hope not.

I would tell them to publish it using conventions (and associated terminologies) which make their intentions clear, or at least accessible by following URI links. And if they don't do that, I would blame them, yes. 

>  Even if you can suggest good ways by which the publisher can publish data that is both simple and accurate, would you discourage someone from publishing data before it is known to be perfect?

Not perfect, but with every effort made to keep it from being misunderstood, and to be factually accurate, yes. And certainly, if exigencies require speed over quality, then the publisher is still responsible for errors in their published data. And I would make that legally responsible, in cases where it matters.

>  I hope not.
> 
> In the end, if some RDF data is providing useful value at lower cost, that is more important than being "correct" -- assuming we could even agree on what "correct" means.  It seems to me that the most practical way forward is to accept that RDF publishers will have different, conflicting perspectives, and learn to deal with them, without getting huffy if person A's RDF is inconsistent with person B's RDF.

Different perspectives are not likely to produce outright logical contradictions. A more likely symptom is not being able to draw any conclusions at all, because your IRIs are different from mine.
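
To make the contrast concrete, here is a small sketch, assuming graphs are just sets of Python tuples and the only inconsistency check is for disjoint classes. The IRIs ex1:apple and ex2:apple and the function names are invented for illustration. With a shared IRI the disagreement shows up as a clash in the merge; with different IRIs nothing clashes, but nothing joins up either.

```python
RDF_TYPE = "rdf:type"

def clashes(triples, disjoint_pairs):
    """Subjects typed with both members of some disjoint class pair."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return {
        s
        for s, classes in types.items()
        for a, b in disjoint_pairs
        if a in classes and b in classes
    }

def shared_subjects(g1, g2):
    """Subjects that both graphs say something about."""
    return {s for s, _, _ in g1} & {s for s, _, _ in g2}

disjoint_pairs = {(":GoodFruit", ":BadFruit")}

# Same IRI: the disagreement surfaces as a clash in the merge.
arthur = {(":apple", RDF_TYPE, ":GoodFruit")}
aster = {(":apple", RDF_TYPE, ":BadFruit")}
same_iri_clashes = clashes(arthur | aster, disjoint_pairs)

# Different (made-up) IRIs: no clash, and no common subject to reason about.
arthur2 = {("ex1:apple", RDF_TYPE, ":GoodFruit")}
aster2 = {("ex2:apple", RDF_TYPE, ":BadFruit")}
different_iri_clashes = clashes(arthur2 | aster2, disjoint_pairs)
overlap = shared_subjects(arthur2, aster2)
```

In the second case the merge is consistent only because the two graphs never talk about the same thing as far as the machinery can tell, which is the "no conclusions at all" symptom.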

Pat

> 
> David
> 
>> 
>>> We just have to learn to live with this possibility.  And one good
>>> technique is what Jeremy suggested: keep different perspectives in
>>> different graphs, and only join them if you need to.
>> 
>> True, provided you can make sense of what a perspective is.
>> 
>> Pat
>> 
>>> 
>>> David
>>> 
>>> 
>>> 
>>> -- Jim McCusker Programmer Analyst Krauthammer Lab, Pathology
>>> Informatics Yale School of Medicine james.mccusker@yale.edu | (203)
>>> 785-4436 http://krauthammerlab.med.yale.edu
>>> 
>>> PhD Student Tetherless World Constellation Rensselaer Polytechnic
>>> Institute mccusj@cs.rpi.edu http://tw.rpi.edu
>> 
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 20 March 2013 04:20:58 UTC