Re: owl:sameAs - Harmful to provenance? from David Booth on 2013-04-04 (public-semweb-lifesci@w3.org from April 2013)

From: David Booth <david@dbooth.org>
Date: Wed, 03 Apr 2013 21:58:44 -0400
To: Alan Ruttenberg <alanruttenberg@gmail.com>
CC: Pat Hayes <phayes@ihmc.us>, Jim McCusker <mccusj@rpi.edu>, public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-ID: <515CDE54.50808@dbooth.org>
On 04/02/2013 05:02 PM, Alan Ruttenberg wrote:
> On Tuesday, April 2, 2013, David Booth wrote:
>     On 03/27/2013 10:56 PM, Pat Hayes wrote:
>         On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote:
>
>             If only owl:sameAs were used correctly...
>
>         Well, I agree that is a problem, but don't draw the conclusion that
>         there is something wrong with sameAs, just because people keep using
>         it wrong.
>
>     Agreed.  And furthermore, don't draw the conclusion that someone has
>     used owl:sameAs wrong just because you get garbage when you merge
>     two graphs that individually worked just fine.  Those two graphs may
>     have been written assuming different sets of interpretations.
>
> In that case I would certainly conclude that they have used it wrong.
> Have you not been reading what Pat and I have been writing?

I've read lots of what you and Pat have written.  And I've learned a lot
from it -- particularly about ambiguity, from Pat.  And I'm
in full agreement that owl:sameAs is *often* misused.

But I don't believe that getting garbage when merging two graphs that 
individually worked fine *necessarily* indicates that owl:sameAs was 
misused -- even when it appears on the surface to be causing the
problem.  Here's a simple example to illustrate.

For brevity, I'll use the following prefixes throughout:

   @prefix :    <http://example/owen/> .
   @prefix owl: <http://www.w3.org/2002/07/owl#> .

Suppose that Owen is the URI owner of :x, :y and :z, and Owen
defines them as follows:

   # Owen's URI definition for :x, :y and :z
   :x a :Something .
   :y a :Something .
   :z a :Something .

That's all.  That's Owen's entire definition of those URIs.
Obviously this definition is "ambiguous" in some sense.  But as
we know, ambiguity is ultimately inescapable anyway, so I have
merely chosen an example that makes the ambiguity obvious.
As the RDF Semantics spec puts it: "It is usually impossible
to assert enough in any language to completely constrain the
interpretations to a single possible world".

Arthur, an RDF author, publishes the following graph, G1,
making certain assumptions about the interpretations that will
be applied to it:

   # G1
   :x owl:sameAs :y .

Aster, another RDF author, publishes the following graph, G2,
making certain other assumptions about the interpretations
that will be applied to it:

   # G2
   :x owl:differentFrom :z .

Alfred, a third RDF author, publishes the following graph, G3,
making still other assumptions about the interpretations that
will be applied to it:

   # G3
   :y owl:sameAs :z .

Note that G1, G2 and G3 are all individually consistent with
Owen's URI definition.  Furthermore, G1, G2 and G3 are all
pair-wise consistent: there exists at least one satisfying
interpretation for the merge of each pair.  But the merge
of G1, G2 and G3 is not consistent: Arthur, Aster and Alfred
made different assumptions about the set of interpretations
that would be applied to their graphs, and the intersection
of those sets was empty.
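The consistency claims above can be checked mechanically. What follows is a minimal brute-force sketch in Python, not a real OWL reasoner. It rests on several simplifications. It models only owl:sameAs and owl:differentFrom among the three names. It ignores Owen's `a :Something` triples, which any interpretation can satisfy by putting every individual in the extension of :Something. The helper names (`satisfies`, `consistent`) are hypothetical. G3 is encoded as `:y owl:sameAs :z`. If G3 instead said `:y owl:differentFrom :z`, the three-way merge would be satisfiable, because :x = :y together with :x != :z already makes :y different from :z.

```python
from itertools import product

# Hypothetical encoding: each triple is (subject, predicate, object) using the
# local names from the example. Only owl:sameAs and owl:differentFrom are modeled.
G1 = {("x", "sameAs", "y")}
G2 = {("x", "differentFrom", "z")}
G3 = {("y", "sameAs", "z")}

NAMES = ("x", "y", "z")


def satisfies(assignment, graph):
    """True if mapping each name to an individual makes every triple true."""
    for s, p, o in graph:
        same = assignment[s] == assignment[o]
        if p == "sameAs" and not same:
            return False
        if p == "differentFrom" and same:
            return False
    return True


def consistent(*graphs):
    """Brute-force search for a satisfying interpretation of the merged graphs.

    With three names, a domain of three individuals is enough to realize
    every possible pattern of equalities among them.
    """
    merged = set().union(*graphs)
    domain = range(len(NAMES))
    for values in product(domain, repeat=len(NAMES)):
        if satisfies(dict(zip(NAMES, values)), merged):
            return True
    return False


if __name__ == "__main__":
    for label, graphs in [
        ("G1+G2", (G1, G2)),
        ("G1+G3", (G1, G3)),
        ("G2+G3", (G2, G3)),
        ("G1+G2+G3", (G1, G2, G3)),
    ]:
        print(label, consistent(*graphs))
```

Run as a script, it prints True for each of the three pairwise merges and False for G1+G2+G3.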

Did Arthur misuse owl:sameAs?   What if Aster never
published G2?  How could Aster's graph possibly affect the
question of whether *Arthur* misused owl:sameAs?  It would
be nonsensical to assume that it could.  What if Owen later
said that Arthur was correct, that :x == :y ?  What if he
later said the opposite?  Again, it would seem rather bizarre
to say that the determination of whether Arthur had misused
owl:sameAs could be changed -- long after Arthur had written
G1 -- by Owen's later statements.

One might claim that Arthur misused owl:sameAs because Owen
had not specified whether :x was the same as or different from
:y or :z, and therefore Arthur had improperly *guessed* about
the value of :x's owl:sameAs property.

But by that logic, Arthur would not be able to assert *anything*
new about :x.  I.e., Arthur would not be allowed to assert
any property whose value was not already entailed by Owen's
definition!  And that would render RDF rather pointless.
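The dilemma turns on the difference between an assertion being *consistent with* Owen's definition and being *entailed by* it. Below is a hypothetical brute-force sketch in Python, not a real reasoner. It models only equality among :x, :y and :z. It also assumes that Owen's `a :Something` triples impose no equality constraints, since :Something's extension can always be chosen to contain every individual. Under those assumptions, neither `:x owl:sameAs :y` nor `:x owl:differentFrom :y` is entailed by Owen's definition, although each is consistent with it. So a rule that only entailed assertions are allowed would forbid Arthur from asserting either one.

```python
from itertools import product

NAMES = ("x", "y", "z")

# Owen's definition only types :x, :y and :z as :Something. Under the assumed
# simplification above, that places no constraint on equality, so it is
# modeled here as an empty set of sameAs/differentFrom constraints.
OWEN = set()


def holds(assignment, triple):
    """Truth of one sameAs/differentFrom triple under a name-to-individual map."""
    s, p, o = triple
    same = assignment[s] == assignment[o]
    return same if p == "sameAs" else not same


def models(graph):
    """Yield every assignment over a 3-individual domain that satisfies graph."""
    for values in product(range(len(NAMES)), repeat=len(NAMES)):
        assignment = dict(zip(NAMES, values))
        if all(holds(assignment, t) for t in graph):
            yield assignment


def consistent_with(graph, triple):
    """Some model of graph also makes triple true."""
    return any(holds(m, triple) for m in models(graph))


def entailed_by(graph, triple):
    """Every model of graph makes triple true."""
    return all(holds(m, triple) for m in models(graph))


if __name__ == "__main__":
    for triple in [("x", "sameAs", "y"), ("x", "differentFrom", "y")]:
        print(triple, consistent_with(OWEN, triple), entailed_by(OWEN, triple))
```

For both triples, the script prints True for "consistent with" and False for "entailed by".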

Maybe someone can see a way to avoid this dilemma.  Maybe
someone can figure out a way to distinguish between the
"essential" properties that serve to identify a resource, and
other "inessential" properties that the resource might have.
If so, and the number of "essential" properties is finite,
then indeed this problem could be avoided by requiring every
URI owner to define all of the "essential" properties of the
URI's denoted resource, or by prohibiting anyone but the URI
owner from asserting any new "essential" properties of the
resource (beyond those the URI owner had defined).  Or maybe
there is another way around this dilemma.

Unless some way around this dilemma is found, it seems
unreasonably judgemental to accuse Arthur of misusing
owl:sameAs in this case, since he didn't assert anything
that was inconsistent with Owen's URI definition.

David
Received on Thursday, 4 April 2013 01:59:17 UTC