Re: owl:sameAs - Is it used in a right way? from David Booth on 2013-03-18 (public-semweb-lifesci@w3.org from March 2013)

From: David Booth <david@dbooth.org>
Date: Mon, 18 Mar 2013 18:03:33 -0400
To: Pat Hayes <phayes@ihmc.us>
CC: Jim McCusker <mccusj@rpi.edu>, Jeremy J Carroll <jjc@syapse.com>, Umutcan ŞİMŞEK <s.umutcan@gmail.com>, Kingsley Idehen <kidehen@openlinksw.com>, w3c semweb HCLS <public-semweb-lifesci@w3.org>
Message-ID: <51478F35.9050201@dbooth.org>
Hi Pat,

On 03/18/2013 12:10 AM, Pat Hayes wrote:

>> On Sun, Mar 17, 2013 at 12:20 AM, David Booth <david@dbooth.org>
>> . . .   When you merge
>> graphs, you force the referents to be the same.  Sometimes the
>> merge works fine, and sometimes the merge becomes inconsistent.
>
> The merge always 'works'. Any set of RDF graphs entails its merge.
> When the merge is inconsistent, it reveals that the original data was
> inconsistent.

By "sometimes the merge works" I meant "sometimes the merge is
consistent".  "Works" was shorthand.  Sorry it was unclear.
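To make the distinction concrete: an RDF merge is, roughly, the set union of the graphs' triples after blank nodes have been renamed apart, so the merge always exists; whether it is *consistent* is a separate question. Below is a minimal Python sketch, assuming triples are plain string tuples and that a "_:" prefix marks a blank node. The helper names and the `:likes` triples are hypothetical illustrations, not a real RDF library.

```python
# Sketch of an RDF merge: union the triples after renaming blank nodes apart.
# Terms are plain strings and "_:" marks a blank node; this is a toy
# model, not an RDF library.

Triple = tuple[str, str, str]


def rename_bnodes(graph: set[Triple], tag: str) -> set[Triple]:
    """Give every blank node in `graph` a label unique to this graph."""
    def fix(term: str) -> str:
        return f"_:{tag}_{term[2:]}" if term.startswith("_:") else term
    return {(fix(s), fix(p), fix(o)) for s, p, o in graph}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Union of the graphs with blank nodes standardized apart.

    The result always exists; whether it is consistent is a separate question.
    """
    merged: set[Triple] = set()
    for i, g in enumerate(graphs):
        merged |= rename_bnodes(g, f"g{i}")
    return merged


# Usage with hypothetical data: the two "_:b" nodes stay distinct after merging.
arthur = {(":apple", "rdf:type", ":GoodFruit"), ("_:b", ":likes", ":apple")}
aster = {(":apple", "rdf:type", ":BadFruit"), ("_:b", ":likes", ":apple")}
print(len(merge(arthur, aster)))  # 4
```

Note that URIs such as :apple are shared across the graphs (that is what forces the referents to be the same), while blank nodes are not.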

>> . . .  two different,
>> well-intentioned RDF authors can reasonably interpret a URI's
>> resource identity differently
>
> That also is true, but...
>
>> , and those differences can cause conflicts to show up when their
>> graphs are merged.
>
> ...if they differ that much, then this goes beyond mere (and
> unavoidable) ambiguity: it means they genuinely *disagree*, openly
> enough for this disagreement to be revealed by RDF machinery.

Yes, but it can be a *consequence* of ambiguity.  The point is that 
Arthur and Aster (in the example below) only disagree with each other. 
Neither of them disagreed with Owen's definition.

>> [ . . . ]
>> Finally, Connie, an RDF consumer, discovers Arthur and Aster's
>> graphs and wishes to merge them.  Unfortunately, the merge is
>> inconsistent,
>
> Why unfortunately? Arthur and Aster apparently disagree with each
> other, and the inconsistency simply reveals that disagreement. That
> is a useful datum if you are trying to figure out who you might want
> to believe.

Yes, it can be useful in that way.  But it is unfortunate for Connie,
because she cannot compute useful entailments from the merge: the merge
is inconsistent, and an inconsistent set of premises entails everything.

>
>> It is tempting to assume that someone did something "wrong" here.
>> For example, one might claim that Owen's definition was ambiguous,
>> or that Arthur and Aster should not have made assumptions about the
>> color of Owen's apple if Owen did not state the color in his
>> definition.  Indeed, in this simple example it is easy to see where
>> the conflicting assumptions crept in.  In real life, when you're
>> dealing with thousands or millions of RDF statements, it is usually
>> far more subtle.
>
> True, but that does not change the essentials.
>>
>> One might also assume that color is an intrinsic property of the
>> apple, and hence is somehow different from other properties that
>> one might assert.  Imagine instead that Arthur had stated ":apple a
>> :GoodFruit" and Aster had stated ":apple a :BadFruit" (assuming
>> :GoodFruit owl:disjointWith :BadFruit).  The result would have been
>> the same when Connie attempted to merge their graphs.  Since,
>> AFAIK, there is no objective way to distinguish between intrinsic
>> properties and non-intrinsic properties, the color example should
>> suffice.
>
> You might assert that color is an owl:FunctionalProperty. That would
> do the trick.

Yes, it may be possible for *some* properties, but that isn't the 
problem.  The problem is to provide an algorithm that, given any 
property p and resource r, determines whether p is an intrinsic property 
of r.  (First we'd have to define what we mean by "intrinsic property"!
There have been lots of fruitless discussions on the W3C TAG list
about what the "essential characteristics" of a resource are.)
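For the cases where it *is* possible, here is a rough sketch of how the two patterns above (owl:disjointWith classes, and a color property declared owl:FunctionalProperty) make the merge of Arthur's and Aster's graphs inconsistent. It is a toy check over explicitly asserted triples, not an OWL reasoner. The `:color` property and the "red"/"green" values are hypothetical, since the actual colors in the example are elided above. One caveat, assumed here: with IRI values instead of literals, OWL has no unique name assumption, so a functional property would lead a reasoner to infer that the two values are owl:sameAs rather than to report an inconsistency.

```python
# Toy clash detector for the two OWL patterns discussed above. It only
# inspects explicitly asserted triples; it is NOT an OWL reasoner.

Triple = tuple[str, str, str]


def clashes(graph: set[Triple]) -> list[str]:
    found = []
    # Pattern 1: one individual typed with two classes declared owl:disjointWith.
    disjoint = {(s, o) for s, p, o in graph if p == "owl:disjointWith"}
    types: dict[str, set[str]] = {}
    for s, p, o in graph:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    for a, b in disjoint:
        for ind, classes in types.items():
            if a in classes and b in classes:
                found.append(f"{ind} is in disjoint classes {a} and {b}")
    # Pattern 2: an owl:FunctionalProperty with two different literal values.
    # Assumes plain string literals ('"..."'), where different lexical forms
    # are different values. With IRI values there is no clash: OWL has no
    # unique name assumption, so a reasoner would infer owl:sameAs instead.
    functional = {s for s, p, o in graph
                  if p == "rdf:type" and o == "owl:FunctionalProperty"}
    values: dict[tuple[str, str], set[str]] = {}
    for s, p, o in graph:
        if p in functional and o.startswith('"'):
            values.setdefault((s, p), set()).add(o)
    for (s, p), vals in values.items():
        if len(vals) > 1:
            found.append(f"{s} has {len(vals)} values for functional {p}")
    return sorted(found)


# Hypothetical data: the :color property and its values are illustrative only.
schema = {(":GoodFruit", "owl:disjointWith", ":BadFruit"),
          (":color", "rdf:type", "owl:FunctionalProperty")}
arthur = {(":apple", "rdf:type", ":GoodFruit"), (":apple", ":color", '"red"')}
aster = {(":apple", "rdf:type", ":BadFruit"), (":apple", ":color", '"green"')}
print(clashes(schema | arthur))               # []
print(len(clashes(schema | arthur | aster)))  # 2
```

Each graph is fine on its own together with the schema; only the merge clashes, which is the situation Connie is in.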

>> [ . . . ]
>> I submit that neither Owen nor Arthur nor Aster did anything
>> fundamentally wrong.  Owen was not wrong, because it is
>> fundamentally impossible for Owen to be completely unambiguous
>> about :apple's resource identity.  And Arthur and Aster did nothing
>> fundamentally wrong, because: (a) they simply made statements about
>> :apple
>
> Did they know that these statements were true? They can't both have
> known this.
>
>> ; and (b) AFAICT there is no fundamental difference between
>> statements that constrain a resource's identity and any other
>> statements about that resource.  In RDF semantics, they all simply
>> add constraints to the possible interpretations.
>>
>> The problem is just that Arthur and Aster happened to (unknowingly)
>> make conflicting statements about :apple .  There's no need to cry
>> foul here.
>
> It's not logical-RDF-AWWW foul, but it is bad form to make assertions
> that you are claiming to be true (and other readers might rely on)
> when you don't have a clue if they are true or not. At least one of
> Arthur or Aster must be being this careless.

I think that's an unnecessary value judgement.  In this simple example,
it is easy to make that judgement.  But in real life, people often make
the most accurate statements that they believe to be true, and they
still disagree.  E.g., does God exist or does God not exist?  Even in
scientific realms where we think we can be objective, reasonable,
intelligent people still disagree.

I don't think there is any point in being judgemental about it, because 
the bottom line is that different sets of RDF are useful to different 
applications.  If an application provides useful value consuming some 
RDF data that may not be 100% correct in the way it models the world, 
that is still A Good Thing.  The example that I usually use is RDF data 
that models the world as flat.  It works fine for car navigation, but 
would be useless for computing rocket trajectories, and it is *simpler* 
than RDF data that models the world in 3D.  Would you tell the publisher 
not to publish that data, because it is incorrect?  I hope not.  Even if
you can suggest good ways by which the publisher can publish data that is
both simple and accurate, would you discourage someone from publishing 
data before it is known to be perfect?  I hope not.

In the end, if some RDF data is providing useful value at lower cost, 
that is more important than being "correct" -- assuming we could even 
agree on what "correct" means.  It seems to me that the most practical
way forward is to accept that RDF publishers will have different, 
conflicting perspectives, and learn to deal with them, without getting 
huffy if person A's RDF is inconsistent with person B's RDF.
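One way to deal with them is Jeremy's suggestion (quoted below): keep each perspective in its own graph and join graphs only when an application needs them. A minimal sketch, assuming a hypothetical in-memory store that maps a graph name to a set of triples and checking only the disjointness pattern from the apple example:

```python
# Sketch: keep each publisher's perspective in its own graph and union
# only the ones a given application needs. The store layout and the
# single disjointness check are hypothetical simplifications.
from typing import Optional

Triple = tuple[str, str, str]


def disjoint_clash(graph: set[Triple]) -> bool:
    """True if some individual is typed with two classes declared owl:disjointWith."""
    pairs = {(s, o) for s, p, o in graph if p == "owl:disjointWith"}
    typed = {(s, o) for s, p, o in graph if p == "rdf:type"}
    return any((i, a) in typed and (i, b) in typed
               for a, b in pairs for i, _ in typed)


def join(store: dict[str, set[Triple]], names: list[str]) -> Optional[set[Triple]]:
    """Union only the named graphs; return None if that union shows a clash."""
    merged: set[Triple] = set().union(*(store[n] for n in names))
    return None if disjoint_clash(merged) else merged


store = {
    "schema": {(":GoodFruit", "owl:disjointWith", ":BadFruit")},
    "arthur": {(":apple", "rdf:type", ":GoodFruit")},
    "aster": {(":apple", "rdf:type", ":BadFruit")},
}
print(join(store, ["schema", "arthur"]) is not None)       # True
print(join(store, ["schema", "arthur", "aster"]) is None)  # True
```

An application that only needs Arthur's perspective never sees the conflict, and the conflict surfaces only when someone asks to combine graphs that disagree.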

David

>
>> We just have to learn to live with this possibility.  And one good
>> technique is what Jeremy suggested: keep different perspectives in
>> different graphs, and only join them if you need to.
>
> True, provided you can make sense of what a perspective is.
>
> Pat
>
>>
>> David
>>
>>
>>
>> -- Jim McCusker Programmer Analyst Krauthammer Lab, Pathology
>> Informatics Yale School of Medicine james.mccusker@yale.edu | (203)
>> 785-4436 http://krauthammerlab.med.yale.edu
>>
>> PhD Student Tetherless World Constellation Rensselaer Polytechnic
>> Institute mccusj@cs.rpi.edu http://tw.rpi.edu
>
> ------------------------------------------------------------
> IHMC                        (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.        (850)202 4416   office
> Pensacola                   (850)202 4440   fax
> FL 32502                    (850)291 0667   mobile
> phayesAT-SIGNihmc.us        http://www.ihmc.us/users/phayes
Received on Monday, 18 March 2013 22:04:02 UTC