RE: owl:sameAs - Harmful to provenance? from Michael Miller on 2013-04-08 (public-semweb-lifesci@w3.org from April 2013)

From: Michael Miller <Michael.Miller@systemsbiology.org>
Date: Mon, 8 Apr 2013 11:06:28 -0700
To: Phillip Lord <phillip.lord@newcastle.ac.uk>, Oliver Ruebenacker <curoli@gmail.com>
Cc: David Booth <david@dbooth.org>, Pat Hayes <phayes@ihmc.us>, Peter Ansell <ansell.peter@gmail.com>, Alan Ruttenberg <alanruttenberg@gmail.com>, public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-ID: <61c1c9e51d5ddaa4db25046d2cb6212f@mail.gmail.com>
hi all,

phillip,  not to mention a name (like mine!) is not particularly unique.

cheers,
michael

Michael Miller
Software Engineer
Institute for Systems Biology

> -----Original Message-----
> From: Phillip Lord [mailto:phillip.lord@newcastle.ac.uk]
> Sent: Monday, April 08, 2013 9:53 AM
> To: Oliver Ruebenacker
> Cc: David Booth; Pat Hayes; Peter Ansell; Alan Ruttenberg; public-semweb-lifesci
> Subject: Re: owl:sameAs - Harmful to provenance?
>
>
> And it is this bit -- "before we can do anything useful" that is utterly
> wrong.
>
> Recently I have spent a lot of time looking at Dublin Core creator fields.
> You could not believe how many different ways they are used. String
> literals ("Phillip Lord"), last-first ("Lord, Phillip"), with abbrevs
> ("P. Lord"), multi-author ("Phillip Lord; Lindsay Marshall"), with
> titles ("Dr Phillip Lord") and so on.
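
A minimal sketch, not taken from the thread, of how a consumer might cope with the creator-field variants listed above. The function name `split_creators`, the `TITLES` set and the splitting heuristics are all assumptions made for illustration; they handle only the five forms quoted here and will get many real names wrong, which is rather the point being made.

```python
# Hypothetical list of titles to strip; real data would need far more.
TITLES = {"dr", "prof", "mr", "mrs", "ms"}

def split_creators(field):
    """Split a dc:creator literal into rough (given, family) pairs."""
    people = []
    for chunk in field.split(";"):          # multi-author: "A; B"
        chunk = chunk.strip()
        if not chunk:
            continue
        if "," in chunk:
            # last-first form: "Lord, Phillip"
            family, given = [part.strip() for part in chunk.split(",", 1)]
        else:
            words = chunk.split()
            # drop a leading title such as "Dr" or "Dr."
            if len(words) > 1 and words[0].rstrip(".").lower() in TITLES:
                words = words[1:]
            # naive guess: the last word is the family name
            given, family = " ".join(words[:-1]), words[-1]
        people.append((given, family))
    return people

if __name__ == "__main__":
    for literal in ["Phillip Lord", "Lord, Phillip", "P. Lord",
                    "Phillip Lord; Lindsay Marshall", "Dr Phillip Lord"]:
        print(repr(literal), "->", split_creators(literal))
```

Even in this toy, "P. Lord" and "Phillip Lord" come out as different pairs, and nothing in the string says whether two identical names belong to the same person.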
>
> So, is everyone using Dublin Core wrong? It is useless till everyone
> uses it the same way? Emphatically no, it is not useless.
>
> Would it be better if everybody did use it the same way? The answer is
> probably not. Names are incredibly complex, and representing them is, in
> turn, difficult. Any specification which did full justice to
> all the different name forms in existence would be incredibly
> long-winded. Many people using the specification would get it wrong; or
> you could have a mechanism for ensuring people always used it correctly.
> Then I am sure that both people who ended up using this form of spec
> would have great fun integrating their tiny datasets.
>
> In the example, we have a number of sets of assertions which
> individually fulfil their creators' use cases. Then, when they are brought
> together, the assertions become inconsistent, telling you up front that
> there is work to be done. And you ask in what way is this useful?
>
> Perfection is the enemy of Good.
>
>
>
> Oliver Ruebenacker <curoli@gmail.com> writes:
> >   So what most people here are saying is that before we can do anything
> > useful, we need to make sure that if two assertions use the same reference,
> > they mean the same thing.
> >
> >   To which you respond that you will accept assertions without assuming
> > that same references mean same things. You will just keep them separate.
> > There is no rule against that.
> >
> >   But in what way is this useful?
> >
> >      Take care
> >      Oliver
> >
> > On Mon, Apr 8, 2013 at 10:07 AM, David Booth <david@dbooth.org> wrote:
> >
> >> Hi Pat,
> >>
> >>
> >> On 04/04/2013 02:03 AM, Pat Hayes wrote:
> >>
> >>>
> >>> On Apr 3, 2013, at 9:00 PM, Peter Ansell wrote:
> >>>
> >>>  On 4 April 2013 11:58, David Booth <david@dbooth.org> wrote:
> >>>> On 04/02/2013 05:02 PM, Alan Ruttenberg wrote:
> >>>> On Tuesday, April 2, 2013, David Booth wrote:
> >>>> On 03/27/2013 10:56 PM, Pat Hayes wrote:
> >>>> On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote:
> >>>>
> >>>> If only owl:sameAs were used correctly...
> >>>>
> >>>> Well, I agree that is a problem, but don't draw the conclusion
> >>>> that there is something wrong with sameAs, just because people keep
> >>>> using it wrong.
> >>>>
> >>>> Agreed.  And furthermore, don't draw the conclusion that someone
> >>>> has used owl:sameAs wrong just because you get garbage when you
> >>>> merge two graphs that individually worked just fine.  Those two
> >>>> graphs may have been written assuming different sets of
> >>>> interpretations.
> >>>>
> >>>> In that case I would certainly conclude that they have used it
> >>>> wrong. Have you not been reading what Pat and I have been writing?
> >>>>
> >>>> I've read lots of what you and Pat have written.  And I've learned
> >>>> a lot from it -- particularly in learning about ambiguity from Pat.
> >>>> And I'm in full agreement that owl:sameAs is *often* misused.
> >>>>
> >>>> But I don't believe that getting garbage when merging two graphs
> >>>> that individually worked fine *necessarily* indicates that
> >>>> owl:sameAs was misused -- even when it appears on the surface to be
> >>>> causing the problem.
> >>>>
> >>>
> >>> I agree, but not with your example and your analysis of it.
> >>>
> >>>  Here's a simple example to illustrate.
> >>>>
> >>>> Using the following prefixes throughout, for brevity:
> >>>>
> >>>> @prefix :    <http://example/owen/> .
> >>>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
> >>>>
> >>>> Suppose that Owen is the URI owner of :x, :y and :z, and Owen
> >>>> defines them as follows:
> >>>>
> >>>> # Owen's URI definition for :x, :y and :z
> >>>> :x a :Something .
> >>>> :y a :Something .
> >>>> :z a :Something .
> >>>>
> >>>> That's all.  That's Owen's entire definition of those URIs.
> >>>> Obviously this definition is "ambiguous" in some sense.  But as we
> >>>> know, ambiguity is ultimately inescapable anyway, so I have merely
> >>>> chosen an example that makes the ambiguity obvious. As the RDF
> >>>> Semantics spec puts it: "It is usually impossible to assert enough
> >>>> in any language to completely constrain the interpretations to a
> >>>> single possible world".
> >>>>
> >>>
> >>> Yes, but by making the ambiguity this "obvious", you have rendered
> >>> the example pointless. There is *no* content here *at all*, so Owen
> >>> has not really published anything. This is not typical of published
> >>> content, even in RDF. Typically, in fact, there is, as well as some
> >>> nontrivial actual RDF content, some kind of explanation, perhaps in
> >>> natural language, of what the *intended* content of the formal RDF is
> >>> supposed to be. While an RDF engine cannot of course make use of such
> >>> intuitive explanations, other authors of RDF can, and should, make
> >>> use of it to try to ensure that they do not make assertions which
> >>> would be counter to the referential intentions of the original
> >>> authors. For example, the Dublin Core URIs were published with almost
> >>> no formal RDF axioms, but quite elaborate natural language glosses
> >>> which enable them to be used in formal RDF with considerable success.
> >>> The fact that formal (and even informal) data is inherently ambiguous
> >>> does not mean that it is inherently, or even typically, vacuous.
> >>>
> >>
> >> This seems to suggest that natural language can somehow eliminate
> >> ambiguity, where formal languages cannot.  I don't buy that.  Presumably
> >> whatever definition one expressed in natural language could be expressed in
> >> a formal language -- in principle at least.  And certainly the goal of the
> >> semantic web is to have such information expressed in a formal language
> >> that is amenable to machine processing.
> >>
> >> More precisely, the basic assumption I am making is that for (almost) any
> >> definition there exists a property such that neither that property nor its
> >> negation is entailed by the definition.  I.e., there is always more that
> >> can be said about the thing whose identity is defined.  Maybe that
> >> assumption is wrong; I don't know.  If you think it's wrong, I'd be
> >> interested in hearing why.
> >>
> >> The example may not be "realistic", but it is *not* pointless.  The whole
> >> point of choosing such a simple example is to expose the fundamental issues
> >> outright, rather than obscuring them in complexity that we cannot fully
> >> understand.  If there is some fundamental reason why you think this problem
> >> cannot happen in a more "realistic" example, then please explain what
> >> mechanism would come into play to prevent it.
> >>
> >>
> >>
> >>>  Arthur, an RDF author, publishes the following graph, G1, making
> >>>> certain assumptions about the interpretations that will be applied
> >>>> to it:
> >>>>
> >>>> # G1
> >>>> :x owl:sameAs :y .
> >>>>
> >>>
> >>> On what basis does Arthur make this assertion? The URIs were coined
> >>> by Owen, and Owen says nothing that would sanction this assumption.
> >>>
> >>
> >> Why Arthur or anyone else chooses to assert whatever they choose to assert
> >> is their business.  It is irrelevant to this analysis.
> >>
> >>
> >>
> >>>  Aster, another RDF author, publishes the following graph, G2,
> >>>> making certain other assumptions about the interpretations that
> >>>> will be applied to it:
> >>>>
> >>>> # G2
> >>>> :x owl:differentFrom :z .
> >>>>
> >>>> Alfred, a third RDF author, publishes the following graph, G3,
> >>>> making still other assumptions about the interpretations that will
> >>>> be applied to it:
> >>>>
> >>>> # G3
> >>>> :y owl:sameAs :z .
> >>>>
> >>>
> >>> Similarly for the other two. They are making assertions using names
> >>> that belong to, and were coined by, another author without having any
> >>> possible source of justification for these nontrivial claims. This
> >>> should not be regarded as good practice, to put it mildly.
> >>>
> >>
> >> Ditto.  If you are claiming that an RDF author needs some sort of
> >> "justification" to make assertions, then please explain exactly what you
> >> mean -- preferably in formal terms -- by "justification".  E.g., does
> >> "justification" mean that Arthur may only make assertions that are entailed
> >> by Owen's definition?  I already discussed that possibility below.
> >>
> >>
> >>
> >>>  Note that G1, G2 and G3 are all individually consistent with Owen's
> >>>> URI definition.  Furthermore, G1, G2 and G3 are all pair-wise
> >>>> consistent: there exists at least one satisfying interpretation for
> >>>> the merge of each pair.  But the merge of G1, G2 and G3 is not
> >>>> consistent:
> >>>>
> >>>
> >>> This kind of behavior is of course quite typical in any assertional
> >>> language.
> >>>
> >>
> >> Yes.
> >>
> >>
> >>
> >>>  Arthur, Aster and Alfred made different assumptions about the set
> >>>> of interpretations that would be applied to their graphs, and the
> >>>> intersection of those sets was empty.
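
A minimal sketch, not taken from the thread, of the consistency claims above. It enumerates every way of mapping :x, :y and :z onto at most three things and checks which mappings satisfy each graph. G3 is taken here to assert that :y and :z are the same, which is what the stated inconsistency of the three-way merge requires. The Python names (`G1`, `models`, `consistent`) are illustrative only, and Owen's definition is left out because it places no constraint on identity.

```python
from itertools import combinations, product

NAMES = ("x", "y", "z")

# Each graph is a list of (subject, relation, object) statements.
G1 = [("x", "same", "y")]   # Arthur: :x owl:sameAs :y
G2 = [("x", "diff", "z")]   # Aster:  :x owl:differentFrom :z
G3 = [("y", "same", "z")]   # Alfred: assumed :y owl:sameAs :z (see lead-in)

def interpretations():
    """Yield every mapping of the three names onto things numbered 0..2."""
    for things in product(range(len(NAMES)), repeat=len(NAMES)):
        yield dict(zip(NAMES, things))

def satisfies(interp, graph):
    """True if the mapping makes every statement in the graph true."""
    return all((interp[s] == interp[o]) == (rel == "same")
               for s, rel, o in graph)

def models(*graphs):
    """All mappings that satisfy the merge of the given graphs."""
    merged = [stmt for g in graphs for stmt in g]
    return [i for i in interpretations() if satisfies(i, merged)]

def consistent(*graphs):
    """A merge is consistent if at least one mapping satisfies it."""
    return bool(models(*graphs))

if __name__ == "__main__":
    graphs = {"G1": G1, "G2": G2, "G3": G3}
    for a, b in combinations(graphs, 2):
        print(a, "+", b, "consistent:", consistent(graphs[a], graphs[b]))
    print("G1 + G2 + G3 consistent:", consistent(G1, G2, G3))
```

Under these assumptions every graph has satisfying mappings on its own and in each pair, but no single mapping satisfies all three, which is the "empty intersection" described above.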
> >>>>
> >>>> Did Arthur misuse owl:sameAs?   What if Aster never published G2?
> >>>> How could Aster's graph possibly affect the question of whether
> >>>> *Arthur* misused owl:sameAs?  It would be nonsensical to assume
> >>>> that it could.
> >>>>
> >>>
> >>> Why? Surely if Aster had a more reliable access to the primary source
> >>> of information about these enigmatic thingies than Arthur did, then
> >>> it might well be the case that Aster's publication could reveal
> >>> errors in Arthur's, by contradicting him.
> >>>
> >>
> >> What do you mean by "more reliable"?  Both Arthur and Aster had access to
> >> the exact same URI definition from Owen.  Are you suggesting that Arthur
> >> and/or Aster should have used a *different* URI definition?  If so, what
> >> definition and why?
> >>
> >>
> >>>  What if Owen later said that Arthur was correct, that :x == :y ?
> >>>> What if he later said the opposite?  Again, it would seem rather
> >>>> bizarre to say that the determination of whether Arthur had
> >>>> misused owl:sameAs could be changed -- long after Arthur had
> >>>> written G1 -- by Owen's later statements.
> >>>>
> >>>
> >>> Again, I don't find this bizarre in the least. It might be, if there
> >>> was no truth of the matter concerning all this stuff, so that all
> >>> these assertions were made independently with equal (or equal lack
> >>> of) authority as to their actual truth. But that is so implausible
> >>> and artificial an assumption that I don't see why we need to even
> >>> discuss it.
> >>>
> >>
> >> The RDF Semantics is explicitly agnostic about interpretations and "actual
> >> truth".
> >>
> >> Owen published a URI definition, and Arthur, Aster and Alfred published a
> >> bunch of assertions.  Whether anyone "believes" any of those assertions,
> >> whether those assertions have any bearing on the "real world", and whether
> >> they are at all useful to anyone's applications, are entirely different
> >> questions.  AFAICT those questions are irrelevant to the technical question
> >> of whether Arthur "misused" owl:sameAs.
> >>
> >>
> >>
> >>>  One might claim that Arthur misused owl:sameAs because Owen had not
> >>>> specified whether :x was the same or different from :y or :z, and
> >>>> therefore Arthur had improperly *guessed* about the value of :x's
> >>>> owl:sameAs property.
> >>>>
> >>>> But by that logic, Arthur would not be able to assert *anything*
> >>>> new about :x.  I.e., Arthur would not be allowed to assert any
> >>>> property whose value was not already entailed by Owen's
> >>>> definition!
> >>>>
> >>>
> >>> Arthur may add information, of course. But Arthur is responsible for
> >>> the truth of what he asserts, and part of that responsibility, in
> >>> practice, is to take care to ascertain what the intended referents
> >>> are of any URIs published by others, that Arthur then uses in his
> >>> assertions.
> >>>
> >>
> >> But Arthur, Aster and Alfred were each fully diligent in ensuring that
> >> their assertions were consistent with all information that Owen provided.
> >> What more could they do?
> >>
> >>
> >>>  For example, if I (as I recently did) wish to assert that
> >>> something was red in color, I might use the URI
> >>>
> >>> http://linkedopencolors.moreways.net/color/rgb/ff0000.html
> >>>
> >>> rather than, say,
> >>>
> >>> http://linkedopencolors.moreways.net/color/rgb/00ff00.html
> >>>
> >>> because I know, using my color vision (not available to RDF engines)
> >>> that the first one refers to red and the second one to green, which
> >>> (I also know) is not red. I *could* use the second URI and insist
> >>> that I intended it to denote the color red, but that would be stupid,
> >>> since readers of my RDF will (and indeed should) misunderstand me. If
> >>> I were to assert that
> >>>
> >>> http://linkedopencolors.moreways.net/color/rgb/00ff00.html
> >>> owl:sameAs
> >>> http://linkedopencolors.moreways.net/color/css/red.html
> >>> .
> >>>
> >>> then I would be saying something false. And yes, in that case, it
> >>> *is* my error, even if what I have said is formally consistent (which
> >>> it in fact is) with the published RDF "definition" of these URIs
> >>> (which is in fact empty.)
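
A minimal sketch, not taken from the thread, of the kind of out-of-band check described above: a proposed owl:sameAs link between two colour URIs can be formally consistent and still be rejected using knowledge the RDF does not carry. The URI path patterns are inferred from the three URIs quoted in the message, and the `CSS_HEX` table and function names are assumptions made for illustration.

```python
import re

# Knowledge held outside the (empty) RDF definitions:
# the CSS colour keyword "red" is #ff0000 and "lime" is #00ff00.
CSS_HEX = {"red": "ff0000", "lime": "00ff00"}

def colour_hex(uri):
    """Return the RGB hex a colour URI appears to name, or None if unknown."""
    rgb = re.search(r"/color/rgb/([0-9a-f]{6})\.html$", uri)
    if rgb:
        return rgb.group(1)
    css = re.search(r"/color/css/([a-z]+)\.html$", uri)
    if css:
        return CSS_HEX.get(css.group(1))
    return None

def same_as_is_plausible(uri_a, uri_b):
    """True only if both URIs resolve to a known, identical RGB value."""
    a, b = colour_hex(uri_a), colour_hex(uri_b)
    return a is not None and a == b

GREEN_RGB = "http://linkedopencolors.moreways.net/color/rgb/00ff00.html"
RED_RGB = "http://linkedopencolors.moreways.net/color/rgb/ff0000.html"
RED_CSS = "http://linkedopencolors.moreways.net/color/css/red.html"

if __name__ == "__main__":
    print("00ff00 sameAs css/red plausible:", same_as_is_plausible(GREEN_RGB, RED_CSS))
    print("ff0000 sameAs css/red plausible:", same_as_is_plausible(RED_RGB, RED_CSS))
```

The check says nothing about logical consistency; it only encodes the extra knowledge that a human author brings when choosing which URI to use.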
> >>>
> >>
> >> In that example there were additional constraints that were not expressed
> >> formally -- such as the fact that red and green are different colors, and
> >> what wavelengths correspond to which colors, etc.  But unless you are
> >> claiming that assertions expressed in natural language can somehow avoid
> >> ambiguity where formal assertions cannot, then for the sake of analysis we
> >> can assume that all assertions have been expressed formally.
> >>
> >> I am also assuming that in the vast majority of cases, a URI's resource
> >> identity will be defined by a description, rather than by ostension
> >> http://plato.stanford.edu/entries/identity/
> >> so I am focusing on that case.
> >>
> >>
> >>
> >>>  And that would render RDF rather pointless.
> >>>>
> >>>
> >>> Why would it render it pointless? The point of RDF is not to make
> >>> completely unjustified statements about nothing in particular.
> >>>
> >>
> >> RDF is designed to allow anyone to say anything about anything.  If
> >> someone chooses to make completely unjustified statements about nothing in
> >> particular, that is their business.  AFAICT that is completely irrelevant
> >> to the technical question of whether owl:sameAs was used incorrectly.
> >>
> >>
> >>
> >>>  Maybe someone can see a way to avoid this dilemma.  Maybe someone
> >>>> can figure out a way to distinguish between the "essential"
> >>>> properties that serve to identify a resource, and other
> >>>> "inessential" properties that the resource might have. If so, and
> >>>> the number of "essential" properties is finite, then indeed this
> >>>> problem could be avoided by requiring every URI owner to define all
> >>>> of the "essential" properties of the URI's denoted resource, or by
> >>>> prohibiting anyone but the URI owner from asserting any new
> >>>> "essential" properties of the resource (beyond those the URI owner
> >>>> had defined).  Or maybe there is another way around this dilemma.
> >>>>
> >>>
> >>> What do you see the "dilemma" here as being, exactly? It seems to me
> >>> that this is not about RDF as such at all. It is about data, however
> >>> that data is recorded. People can publish data about things. They do
> >>> so by making assertions. In an ideal world, everyone is responsible
> >>> for the assertions they make. Other people can put together
> >>> information from various sources, but the reliability of the result
> >>> is hostage to the reliability of all the sources that are used. All
> >>> this is kind of obvious, but what else is being said in this thread?
> >>>
> >>
> >> The dilemma is that we would like each URI to always denote the same thing
> >> in all RDF datasets, so that when we merge RDF datasets, the merge will
> >> make sense: the merge will be consistent and an application that worked
> >> properly on an individual RDF dataset will also work properly on the merge
> >> of that dataset with other datasets.  But because URI definitions are
> >> inherently ambiguous, different RDF authors will interpret them
> >> differently, and this leads to inconsistencies when datasets are merged --
> >> even when all parties have acted in good faith and have done all that they
> >> could reasonably have been expected to do to avoid such conflicts.
> >>
> >> Key assumptions:
> >>
> >>  1. Owen's URI definition will always be ambiguous.  There will always
> >> exist a property p such that neither p nor its negation is entailed by the
> >> URI definition.
> >>
> >>  2. Owen cannot be expected to forever refine his URI definition by adding
> >> disambiguation at the request of every RDF author who uses his URIs.  At
> >> some point, Owen will reach the point of saying "that's all the
> >> disambiguation you get".  (This is the point at which the example that I
> >> gave begins.)
> >>
> >>
> >>
> >>>
> >>>> Unless some way around this dilemma is found, it seems unreasonably
> >>>> judgemental to accuse Arthur of misusing owl:sameAs in this case,
> >>>>
> >>>
> >>> Possibly, yes, but not because...
> >>>
> >>>  since he didn't assert anything that was inconsistent with Owen's
> >>>> URI definition
> >>>>
> >>>
> >>> Consistency is not the point. If I make completely unfounded
> >>> assertions about a topic that you have introduced, then the fact they
> >>> might be logically consistent with what you have said is neither here
> >>> nor there. What matters is whether I have the authority to make the
> >>> assertions I do, or whether I am lying, fabricating or simply
> >>> fantasizing using Owen's vocabulary.
> >>>
> >>
> >> Can you translate that into more objective technical terms?  What exactly
> >> does "unfounded" mean?  And what do you mean by "authority"? What objective
> >> technical criteria are you suggesting?  And why is it relevant to the
> >> question of whether Arthur misused owl:sameAs, given that the RDF Semantics
> >> is explicitly agnostic about interpretations?
> >>
> >> David Booth
> >>
> >>
>
> --
> Phillip Lord,                           Phone: +44 (0) 191 222 7827
> Lecturer in Bioinformatics,             Email: phillip.lord@newcastle.ac.uk
> School of Computing Science,            http://homepages.cs.ncl.ac.uk/phillip.lord
> Room 914 Claremont Tower,               skype: russet_apples
> Newcastle University,                   twitter: phillord
> NE1 7RU
Received on Monday, 8 April 2013 18:06:54 UTC