Re: owl:sameAs - Harmful to provenance? from Pat Hayes on 2013-04-04 (public-semweb-lifesci@w3.org from April 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 3 Apr 2013 23:03:50 -0700
To: Peter Ansell <ansell.peter@gmail.com>, David Booth <david@dbooth.org>
Cc: Alan Ruttenberg <alanruttenberg@gmail.com>, Jim McCusker <mccusj@rpi.edu>, public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-Id: <6D108556-D98E-48B3-9629-BEEF29444F3C@ihmc.us>
On Apr 3, 2013, at 9:00 PM, Peter Ansell wrote:

> On 4 April 2013 11:58, David Booth <david@dbooth.org> wrote:
> On 04/02/2013 05:02 PM, Alan Ruttenberg wrote:
> On Tuesday, April 2, 2013, David Booth wrote:
>     On 03/27/2013 10:56 PM, Pat Hayes wrote:
>         On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote:
> 
>             If only owl:sameAs were used correctly...
> 
>         Well, I agree that is a problem, but don't draw the conclusion that
>         there is something wrong with sameAs, just because people keep using
>         it wrong.
> 
>     Agreed.  And furthermore, don't draw the conclusion that someone has
>     used owl:sameAs wrong just because you get garbage when you merge
>     two graphs that individually worked just fine.  Those two graphs may
>     have been written assuming different sets of interpretations.
> 
> In that case I would certainly conclude that they have used it wrong.
> Have you not been reading what Pat and I have been writing?
> 
> I've read lots of what you and Pat have written.  And I've learned a lot from it -- particularly in learning about ambiguity from Pat.  And I'm in full agreement that owl:sameAs is *often* misused.
> 
> But I don't believe that getting garbage when merging two graphs that individually worked fine *necessarily* indicates that owl:sameAs was misused -- even when it appears on the surface to be causing the problem.

I agree with that, but not with your example or your analysis of it.

>  Here's a simple example to illustrate.
> 
> Using the following prefixes throughout, for brevity:
> 
>   @prefix :    <http://example/owen/> .
>   @prefix owl: <http://www.w3.org/2002/07/owl#> .
> 
> Suppose that Owen is the URI owner of :x, :y and :z, and Owen
> defines them as follows:
> 
>   # Owen's URI definition for :x, :y and :z
>   :x a :Something .
>   :y a :Something .
>   :z a :Something .
> 
> That's all.  That's Owen's entire definition of those URIs.
> Obviously this definition is "ambiguous" in some sense.  But as
> we know, ambiguity is ultimately inescapable anyway, so I have
> merely chosen an example that makes the ambiguity obvious.
> As the RDF Semantics spec puts it: "It is usually impossible
> to assert enough in any language to completely constrain the
> interpretations to a single possible world".

Yes, but by making the ambiguity this "obvious", you have rendered the example pointless. There is *no* content here *at all*, so Owen has not really published anything. This is not typical of published content, even in RDF. Typically, in fact, alongside some nontrivial actual RDF content there is some kind of explanation, perhaps in natural language, of what the *intended* content of the formal RDF is supposed to be. While an RDF engine cannot of course make use of such intuitive explanations, other authors of RDF can, and should, make use of them to try to ensure that they do not make assertions which would run counter to the referential intentions of the original authors. For example, the Dublin Core URIs were published with almost no formal RDF axioms, but with quite elaborate natural language glosses which enable them to be used in formal RDF with considerable success. The fact that formal (and even informal) data is inherently ambiguous does not mean that it is inherently, or even typically, vacuous.

> Arthur, an RDF author, publishes the following graph, G1,
> making certain assumptions about the interpretations that will
> be applied to it:
> 
>   # G1
>   :x owl:sameAs :y .

On what basis does Arthur make this assertion? The URIs were coined by Owen, and Owen says nothing that would sanction this assumption.  

> Aster, another RDF author, publishes the following graph, G2,
> making certain other assumptions about the interpretations
> that will be applied to it:
> 
>   # G2
>   :x owl:differentFrom :z .
> 
> Alfred, a third RDF author, publishes the following graph, G3,
> making still other assumptions about the interpretations that
> will be applied to it:
> 
>   # G3
>   :y owl:differentFrom :z .

Similarly for the other two. They are making assertions using names that belong to, and were coined by, another author without having any possible source of justification for these nontrivial claims. This should not be regarded as good practice, to put it mildly. 

> Note that G1, G2 and G3 are all individually consistent with
> Owen's URI definition.  Furthermore, G1, G2 and G3 are all
> pair-wise consistent: there exists at least one satisfying
> interpretation for the merge of each pair.  But the merge
> of G1, G2 and G3 is not consistent:

This kind of behavior is of course quite typical in any assertional language. 
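
As a minimal sketch of this behavior (not an OWL reasoner; it understands only owl:sameAs and owl:differentFrom), the Python below merges names asserted to co-refer and then looks for a differentFrom between two merged names. The function name is_consistent and the variant graph G3_ALT are illustrative assumptions, not part of the thread. One thing the sketch makes visible: with the three triples exactly as quoted, the full merge is still satisfiable (:x and :y denote one thing, :z another). The pairwise-consistent but jointly-inconsistent pattern appears if G3 instead says ":y owl:sameAs :z", which is what G3_ALT encodes.

```python
from itertools import combinations

SAME, DIFF = "owl:sameAs", "owl:differentFrom"

# The three graphs as quoted, written as (subject, predicate, object) triples.
G1 = {(":x", SAME, ":y")}
G2 = {(":x", DIFF, ":z")}
G3 = {(":y", DIFF, ":z")}
# Assumed variant of G3 (not in the thread) that makes the full merge inconsistent.
G3_ALT = {(":y", SAME, ":z")}


def is_consistent(triples):
    """Return True if some assignment of referents satisfies every triple.

    Only owl:sameAs and owl:differentFrom are understood.
    """
    parent = {}

    def find(term):
        # Union-find lookup with path halving.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    # Merge every pair of names asserted to co-refer.
    for s, p, o in triples:
        if p == SAME:
            parent[find(s)] = find(o)

    # A differentFrom between two names in the same class is a contradiction.
    return all(find(s) != find(o) for s, p, o in triples if p == DIFF)


def pairwise_consistent(graphs):
    """Return True if the merge of every pair of graphs is consistent."""
    return all(is_consistent(a | b) for a, b in combinations(graphs, 2))


if __name__ == "__main__":
    print("as quoted, full merge consistent:", is_consistent(G1 | G2 | G3))
    print("variant, pairwise consistent:", pairwise_consistent([G1, G2, G3_ALT]))
    print("variant, full merge consistent:", is_consistent(G1 | G2 | G3_ALT))
```

The check says nothing about which of the authors is right; it only reports whether any interpretation at all can satisfy the merged triples.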

> Arthur, Aster and Alfred
> made different assumptions about the set of interpretations
> that would be applied to their graphs, and the intersection
> of those sets was empty.
> 
> Did Arthur misuse owl:sameAs?   What if Aster never
> published G2?  How could Aster's graph possibly affect the
> question of whether *Arthur* misused owl:sameAs?  It would
> be nonsensical to assume that it could.

Why? Surely if Aster had more reliable access to the primary source of information about these enigmatic thingies than Arthur did, then it might well be the case that Aster's publication could reveal errors in Arthur's, by contradicting him.

>  What if Owen later
> said that Arthur was correct, that :x == :y ?  What if he
> later said the opposite?  Again, it would seem rather bizarre
> to say that the determination of whether Arthur had misused
> owl:sameAs could be changed -- long after Arthur had written
> G1 -- by Owen's later statements.

Again, I don't find this bizarre in the least. It might be, if there were no truth of the matter concerning all this stuff, so that all these assertions were made independently with equal (or equal lack of) authority as to their actual truth. But that is so implausible and artificial an assumption that I don't see why we need to even discuss it.

> One might claim that Arthur misused owl:sameAs because Owen
> had not specified whether :x was the same or different from
> :y or :z, and therefore Arthur had improperly *guessed* about
> the value of :x's owl:sameAs property.
> 
> But by that logic, Arthur would not be able to assert *anything*
> new about :x.  I.e., Arthur would not be allowed to assert
> any property whose value was not already entailed by Owen's
> definition!  

Arthur may add information, of course. But Arthur is responsible for the truth of what he asserts, and part of that responsibility, in practice, is to take care to ascertain the intended referents of any URIs published by others that Arthur then uses in his assertions. For example, if I (as I recently did) wish to assert that something is red in color, I might use the URI

http://linkedopencolors.moreways.net/color/rgb/ff0000.html

rather than, say, 

http://linkedopencolors.moreways.net/color/rgb/00ff00.html

because I know, using my color vision (not available to RDF engines), that the first one refers to red and the second one to green, which (I also know) is not red. I *could* use the second URI and insist that I intended it to denote the color red, but that would be stupid, since readers of my RDF will (and indeed should) misunderstand me. If I were to assert that

http://linkedopencolors.moreways.net/color/rgb/00ff00.html 
owl:sameAs 
http://linkedopencolors.moreways.net/color/css/red.html  .

then I would be saying something false. And yes, in that case, it *is* my error, even if what I have said is formally consistent (which it in fact is) with the published RDF "definition" of these URIs (which is in fact empty).
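
To make the gap between "formally consistent" and "true" concrete, here is a small Python sketch. It stands in for the knowledge a human reader brings: a hypothetical lookup table, KNOWN_REFERENT, mapping each URI to the RGB value it is taken to denote. The table and the function name same_as_is_true are assumptions for illustration; the RGB values for the rgb/ URIs are read off their paths, and CSS "red" is #ff0000. Nothing here is published at those URIs, which is exactly why an RDF engine alone cannot catch the error.

```python
# Hypothetical referent table: what a human reader takes each URI to denote.
KNOWN_REFERENT = {
    "http://linkedopencolors.moreways.net/color/rgb/ff0000.html": (255, 0, 0),
    "http://linkedopencolors.moreways.net/color/rgb/00ff00.html": (0, 255, 0),
    "http://linkedopencolors.moreways.net/color/css/red.html": (255, 0, 0),
}


def same_as_is_true(subject, obj, referents=KNOWN_REFERENT):
    """Return True if both URIs denote the same color per the referent table."""
    return referents[subject] == referents[obj]


if __name__ == "__main__":
    green = "http://linkedopencolors.moreways.net/color/rgb/00ff00.html"
    red_rgb = "http://linkedopencolors.moreways.net/color/rgb/ff0000.html"
    red_css = "http://linkedopencolors.moreways.net/color/css/red.html"
    # The sameAs statement above: consistent with an empty definition, yet false.
    print(same_as_is_true(green, red_css))
    # The statement a careful author would make instead.
    print(same_as_is_true(red_rgb, red_css))
```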

> And that would render RDF rather pointless.

Why would it render it pointless? The point of RDF is not to make completely unjustified statements about nothing in particular. 

> Maybe someone can see a way to avoid this dilemma.  Maybe
> someone can figure out a way to distinguish between the
> "essential" properties that serve to identify a resource, and
> other "inessential" properties that the resource might have.
> If so, and the number of "essential" properties is finite,
> then indeed this problem could be avoided by requiring every
> URI owner to define all of the "essential" properties of the
> URI's denoted resource, or by prohibiting anyone but the URI
> owner from asserting any new "essential" properties of the
> resource (beyond those the URI owner had defined).  Or maybe
> there is another way around this dilemma.

What do you see the "dilemma" here as being, exactly? It seems to me that this is not about RDF as such at all. It is about data, however that data is recorded. People can publish data about things. They do so by making assertions. In an ideal world, everyone is responsible for the assertions they make. Other people can put together information from various sources, but the reliability of the result is hostage to the reliability of all the sources that are used. All this is kind of obvious, but what else is being said in this thread?

> 
> Unless some way around this dilemma is found, it seems
> unreasonably judgemental to accuse Arthur of misusing
> owl:sameAs in this case,

Possibly, yes, but not because...

> since he didn't assert anything
> that was inconsistent with Owen's URI definition

Consistency is not the point. If I make completely unfounded assertions about a topic that you have introduced, then the fact that they might be logically consistent with what you have said is neither here nor there. What matters is whether I have the authority to make the assertions I do, or whether I am lying, fabricating or simply fantasizing using Owen's vocabulary.

Pat

> 
> +1
> 
> This is a great explanation of the issue.
> 
> Peter
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 4 April 2013 06:04:26 UTC