RE: owl:sameAs - Harmful to provenance?

Dear David,

You wrote:

	  1. Owen's URI definition will always be
ambiguous.  There will always exist a property p
such that neither p nor its negation are entailed
by the URI definition.

While true, this leaves out the subjective part:
Aster might believe, without the addition of a new
property p, that Owen's URI means one thing, while
Albert holds a different interpretation of the
same URI.  While adding a new property (which can
always be done, IMHO) makes the meaning
mathematically clear, I would like to emphasize
that each individual Observer (Aster, Albert,
Algernon, Argentium, or whoever) also forms an
individual interpretation, which can differ from
that of any other Observer.

I believe the history of group actions taken on
"standards" shows that the individual is the
source of most divergence in interpretations.  

-Rich

Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2

-----Original Message-----
From: David Booth [mailto:david@dbooth.org] 
Sent: Monday, April 08, 2013 7:07 AM
To: Pat Hayes
Cc: Peter Ansell; Alan Ruttenberg; public-semweb-lifesci
Subject: Re: owl:sameAs - Harmful to provenance?

Hi Pat,

On 04/04/2013 02:03 AM, Pat Hayes wrote:
>
> On Apr 3, 2013, at 9:00 PM, Peter Ansell wrote:
>
>> On 4 April 2013 11:58, David Booth <david@dbooth.org> wrote:
>> On 04/02/2013 05:02 PM, Alan Ruttenberg wrote:
>> On Tuesday, April 2, 2013, David Booth wrote:
>> On 03/27/2013 10:56 PM, Pat Hayes wrote:
>> On Mar 27, 2013, at 7:32 PM, Jim McCusker wrote:
>>
>> If only owl:sameAs were used correctly...
>>
>> Well, I agree that is a problem, but don't draw the conclusion
>> that there is something wrong with sameAs, just because people keep
>> using it wrong.
>>
>> Agreed.  And furthermore, don't draw the conclusion that someone
>> has used owl:sameAs wrong just because you get garbage when you
>> merge two graphs that individually worked just fine.  Those two
>> graphs may have been written assuming different sets of
>> interpretations.
>>
>> In that case I would certainly conclude that they have used it
>> wrong. Have you not been reading what Pat and I have been writing?
>>
>> I've read lots of what you and Pat have written.  And I've learned
>> a lot from it -- particularly in learning about ambiguity from Pat.
>> And I'm in full agreement that owl:sameAs is *often* misused.
>>
>> But I don't believe that getting garbage when merging two graphs
>> that individually worked fine *necessarily* indicates that
>> owl:sameAs was misused -- even when it appears on the surface to be
>> causing the problem.
>
> I agree, but not with your example and your analysis of it.
>
>> Here's a simple example to illustrate.
>>
>> Using the following prefixes throughout, for brevity:
>>
>> @prefix :    <http://example/owen/> .
>> @prefix owl: <http://www.w3.org/2002/07/owl#> .
>>
>> Suppose that Owen is the URI owner of :x, :y and :z, and Owen
>> defines them as follows:
>>
>> # Owen's URI definition for :x, :y and :z
>> :x a :Something .
>> :y a :Something .
>> :z a :Something .
>>
>> That's all.  That's Owen's entire definition of those URIs.
>> Obviously this definition is "ambiguous" in some sense.  But as we
>> know, ambiguity is ultimately inescapable anyway, so I have merely
>> chosen an example that makes the ambiguity obvious. As the RDF
>> Semantics spec puts it: "It is usually impossible to assert enough
>> in any language to completely constrain the interpretations to a
>> single possible world".
>
> Yes, but by making the ambiguity this "obvious", you have rendered
> the example pointless. There is *no* content here *at all*, so Owen
> has not really published anything. This is not typical of published
> content, even in RDF. Typically, in fact, there is, as well as some
> nontrivial actual RDF content, some kind of explanation, perhaps in
> natural language, of what the *intended* content of the formal RDF is
> supposed to be. While an RDF engine cannot of course make use of such
> intuitive explanations, other authors of RDF can, and should, make
> use of it to try to ensure that they do not make assertions which
> would be counter to the referential intentions of the original
> authors. For example, the Dublin Core URIs were published with almost
> no formal RDF axioms, but quite elaborate natural language glosses
> which enable them to be used in formal RDF with considerable success.
> The fact that formal (and even informal) data is inherently ambiguous
> does not mean that it is inherently, or even typically, vacuous.

This seems to suggest that natural language can
somehow eliminate 
ambiguity, where formal languages cannot.  I don't
buy that.  Presumably 
whatever definition one expressed in natural
language could be expressed 
in a formal language -- in principle at least.
And certainly the goal 
of the semantic web is to have such information
expressed in a formal 
language that is amenable to machine processing.

More precisely, the basic assumption I am making is that for (almost)
any definition there exists a property such that neither that property
nor its negation is entailed by the definition.  I.e., there is always
more that can be said about the thing whose identity is defined.  Maybe
that assumption is wrong; I don't know.  If you think it's wrong, I'd be
interested in hearing why.
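
To make that assumption concrete, here is a minimal sketch in Python.
It is only an illustration, not RDF Semantics proper: it assumes a
hypothetical three-element universe, treats every element as a
:Something, and models an interpretation as a plain mapping from the
names x, y, z to elements.  All function and variable names are made up
for the example.  Under those assumptions, "x is the same as y" is
neither entailed nor refuted by Owen's definition.

```python
from itertools import product

DOMAIN = (0, 1, 2)        # hypothetical universe, assumed for illustration
NAMES = ("x", "y", "z")   # stand-ins for :x, :y and :z


def interpretations():
    """Yield every mapping of the three names onto the domain."""
    for values in product(DOMAIN, repeat=len(NAMES)):
        yield dict(zip(NAMES, values))


def owen_definition(interp):
    # Owen only says that each name denotes a :Something.  Every domain
    # element counts as a :Something here, so nothing is ruled out.
    return True


def entailed(definition, prop):
    """True if prop holds in every interpretation satisfying definition."""
    return all(prop(i) for i in interpretations() if definition(i))


def same_xy(interp):
    return interp["x"] == interp["y"]


def different_xy(interp):
    return interp["x"] != interp["y"]


# Neither the property nor its negation follows from the definition.
undecided = (not entailed(owen_definition, same_xy)
             and not entailed(owen_definition, different_xy))
print(undecided)
```

Adding more statements to owen_definition shrinks the set of surviving
interpretations, but as long as at least two of them disagree on some
property, that property stays undecided in the same way.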

The example may not be "realistic", but it is
*not* pointless.  The 
whole point of choosing such a simple example is
to expose the 
fundamental issues outright, rather than obscuring
them in complexity 
that we cannot fully understand.  If there is some
fundamental reason 
why you think this problem cannot happen in a more
"realistic" example, 
then please explain what mechanism would come into
play to prevent it.

>
>> Arthur, an RDF author, publishes the following graph, G1, making
>> certain assumptions about the interpretations that will be applied
>> to it:
>>
>> # G1
>> :x owl:sameAs :y .
>
> On what basis does Arthur make this assertion? The URIs were coined
> by Owen, and Owen says nothing that would sanction this assumption.

Why Arthur or anyone else chooses to assert
whatever they choose to 
assert is their business.  It is irrelevant to
this analysis.

>
>> Aster, another RDF author, publishes the following graph, G2,
>> making certain other assumptions about the interpretations that
>> will be applied to it:
>>
>> # G2
>> :x owl:differentFrom :z .
>>
>> Alfred, a third RDF author, publishes the following graph, G3,
>> making still other assumptions about the interpretations that will
>> be applied to it:
>>
>> # G3
>> :y owl:sameAs :z .
>
> Similarly for the other two. They are making assertions using names
> that belong to, and were coined by, another author without having any
> possible source of justification for these nontrivial claims. This
> should not be regarded as good practice, to put it mildly.

Ditto.  If you are claiming that an RDF author
needs some sort of 
"justification" to make assertions, then please
explain exactly what you 
mean -- preferably in formal terms -- by
"justification".  E.g., does 
"justification" mean that Arthur may only make
assertions that are 
entailed by Owen's definition?  I already
discussed that possibility below.

>
>> Note that G1, G2 and G3 are all individually consistent with Owen's
>> URI definition.  Furthermore, G1, G2 and G3 are all pair-wise
>> consistent: there exists at least one satisfying interpretation for
>> the merge of each pair.  But the merge of G1, G2 and G3 is not
>> consistent:
>
> This kind of behavior is of course quite typical in any assertional
> language.

Yes.
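
For anyone who wants to check the pair-wise versus three-way behavior
mechanically, here is a rough Python sketch.  It is not an OWL reasoner:
it only understands owl:sameAs and owl:differentFrom, merges sameAs
terms with a small union-find, and then looks for a differentFrom pair
that ended up in the same class.  It takes G3 to be ":y owl:sameAs :z",
which is the reading under which the three graphs clash (with
differentFrom in G3 all three would be jointly satisfiable).  The
function name and the tuple encoding are assumptions made for the
example.

```python
SAME, DIFF = "owl:sameAs", "owl:differentFrom"

# The three graphs as (subject, predicate, object) tuples.
G1 = [(":x", SAME, ":y")]
G2 = [(":x", DIFF, ":z")]
G3 = [(":y", SAME, ":z")]   # assumed reading of G3, see the note above


def consistent(triples):
    """Check a merged graph for sameAs/differentFrom clashes only."""
    parent = {}

    def find(term):
        # Union-find lookup with path halving.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    # First pass: merge everything that is asserted to be the same.
    for s, p, o in triples:
        if p == SAME:
            parent[find(s)] = find(o)

    # Second pass: a differentFrom pair inside one class is a clash.
    return all(find(s) != find(o) for s, p, o in triples if p == DIFF)


print(consistent(G1 + G2), consistent(G1 + G3), consistent(G2 + G3))
print(consistent(G1 + G2 + G3))
```

Each pair passes because no differentFrom statement ever meets both of
the sameAs statements; only the full merge puts :x and :z into one class
while G2 still says they differ.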

>
>> Arthur, Aster and Alfred made different assumptions about the set
>> of interpretations that would be applied to their graphs, and the
>> intersection of those sets was empty.
>>
>> Did Arthur misuse owl:sameAs?   What if Aster never published G2?
>> How could Aster's graph possibly affect the question of whether
>> *Arthur* misused owl:sameAs?  It would be nonsensical to assume
>> that it could.
>
> Why? Surely if Aster had a more reliable access to the primary source
> of information about these enigmatic thingies than Arthur did, then
> it might well be the case that Aster's publication could reveal
> errors in Arthur's, by contradicting him.

What do you mean by "more reliable"?  Both Arthur
and Aster had access 
to the exact same URI definition from Owen.  Are
you suggesting that 
Arthur and/or Aster should have used a *different*
URI definition?  If 
so, what definition and why?

>
>> What if Owen later said that Arthur was correct, that :x == :y ?
>> What if he later said the opposite?  Again, it would seem rather
>> bizarre to say that the determination of whether Arthur had
>> misused owl:sameAs could be changed -- long after Arthur had
>> written G1 -- by Owen's later statements.
>
> Again, I don't find this bizarre in the least. It might be, if there
> was no truth of the matter concerning all this stuff, so that all
> these assertions were made independently with equal (or equal lack
> of) authority as to their actual truth. But that is so implausible
> and artificial an assumption that I don't see why we need to even
> discuss it.

The RDF Semantics is explicitly agnostic about
interpretations and 
"actual truth".

Owen published a URI definition, and Arthur, Aster
and Alfred published 
a bunch of assertions.  Whether anyone "believes"
any of those 
assertions, whether those assertions have any
bearing on the "real 
world", and whether they are at all useful to
anyone's applications, are 
entirely different questions.  AFAICT those
questions are irrelevant to 
the technical question of whether Arthur "misused"
owl:sameAs.

>
>> One might claim that Arthur misused owl:sameAs because Owen had not
>> specified whether :x was the same or different from :y or :z, and
>> therefore Arthur had improperly *guessed* about the value of :x's
>> owl:sameAs property.
>>
>> But by that logic, Arthur would not be able to assert *anything*
>> new about :x.  I.e., Arthur would not be allowed to assert any
>> property whose value was not already entailed by Owen's
>> definition!
>
> Arthur may add information, of course. But Arthur is responsible for
> the truth of what he asserts, and part of that responsibility, in
> practice, is to take care to ascertain what the intended referents
> are of any URIs published by others, that Arthur then uses in his
> assertions.

But Arthur, Aster and Alfred were each fully
diligent in ensuring that 
their assertions were consistent with all
information that Owen 
provided.  What more could they do?

> For example, if I (as I recently did) wish to assert that
> something was red in color, I might use the URI
>
> http://linkedopencolors.moreways.net/color/rgb/ff0000.html
>
> rather than, say,
>
> http://linkedopencolors.moreways.net/color/rgb/00ff00.html
>
> because I know, using my color vision (not available to RDF engines)
> that the first one refers to red and the second one to green, which
> (I also know) is not red. I *could* use the second URI and insist
> that I intended it to denote the color red, but that would be stupid,
> since readers of my RDF will (and indeed should) misunderstand me. If
> I were to assert that
>
> http://linkedopencolors.moreways.net/color/rgb/00ff00.html
> owl:sameAs http://linkedopencolors.moreways.net/color/css/red.html
> .
>
> then I would be saying something false. And yes, in that case, it
> *is* my error, even if what I have said is formally consistent (which
> it in fact is) with the published RDF "definition" of these URIs
> (which is in fact empty.)

In that example there were additional constraints
that were not 
expressed formally -- such as the fact that red
and green are different 
colors, and what wavelengths correspond to which
colors, etc.  But 
unless you are claiming that assertions expressed
in natural language 
can somehow avoid ambiguity where formal
assertions cannot, then for the 
sake of analysis we can assume that all assertions
have been expressed 
formally.

I am also assuming that in the vast majority of
cases, a URI's resource 
identity will be defined by a description, rather
than by ostension
http://plato.stanford.edu/entries/identity/
so I am focusing on that case.

>
>> And that would render RDF rather pointless.
>
> Why would it render it pointless? The point of RDF is not to make
> completely unjustified statements about nothing in particular.

RDF is designed to allow anyone to say anything
about anything.  If 
someone chooses to make completely unjustified
statements about nothing 
in particular, that is their business.  AFAICT
that is completely 
irrelevant to the technical question of whether
owl:sameAs was used 
incorrectly.

>
>> Maybe someone can see a way to avoid this dilemma.  Maybe someone
>> can figure out a way to distinguish between the "essential"
>> properties that serve to identify a resource, and other
>> "inessential" properties that the resource might have. If so, and
>> the number of "essential" properties is finite, then indeed this
>> problem could be avoided by requiring every URI owner to define all
>> of the "essential" properties of the URI's denoted resource, or by
>> prohibiting anyone but the URI owner from asserting any new
>> "essential" properties of the resource (beyond those the URI owner
>> had defined).  Or maybe there is another way around this dilemma.
>
> What do you see the "dilemma" here as being, exactly? It seems to me
> that this is not about RDF as such at all. It is about data, however
> that data is recorded. People can publish data about things. They do
> so by making assertions. In an ideal world, everyone is responsible
> for the assertions they make. Other people can put together
> information from various sources, but the reliability of the result
> is hostage to the reliability of all the sources that are used. All
> this is kind of obvious, but what else is being said in this thread?

The dilemma is that we would like each URI to
always denote the same 
thing in all RDF datasets, so that when we merge
RDF datasets, the merge 
will make sense: the merge will be consistent and
an application that 
worked properly on an individual RDF dataset will
also work properly on 
the merge of that dataset with other datasets.
But because URI 
definitions are inherently ambiguous, different
RDF authors will 
interpret them differently, and this leads to
inconsistencies when 
datasets are merged -- even when all parties have
acted in good faith 
and have done all that they could reasonably have
been expected to do to 
avoid such conflicts.

Key assumptions:

  1. Owen's URI definition will always be ambiguous.  There will always
exist a property p such that neither p nor its negation is entailed by
the URI definition.

  2. Owen cannot be expected to forever refine his
URI definition by 
adding disambiguation at the request of every RDF
author who uses his 
URIs.  At some point, Owen will reach the point of
saying "that's all 
the disambiguation you get".  (This is the point
at which the example 
that I gave begins.)

>
>>
>> Unless some way around this dilemma is found, it seems unreasonably
>> judgemental to accuse Arthur of misusing owl:sameAs in this case,
>
> Possibly, yes, but not because...
>
>> since he didn't assert anything that was inconsistent with Owen's
>> URI definition
>
> Consistency is not the point. If I make completely unfounded
> assertions about a topic that you have introduced, then the fact they
> might be logically consistent with what you have said is neither here
> nor there. What matters is whether I have the authority to make the
> assertions I do, or whether I am lying, fabricating or simply
> fantasizing using Owen's vocabulary.

Can you translate that into more objective
technical terms?  What 
exactly does "unfounded" mean?  And what do you
mean by "authority"? 
What objective technical criteria are you
suggesting?  And why is it 
relevant to the question of whether Arthur misused
owl:sameAs, given 
that the RDF Semantics is explicitly agnostic
about interpretations?

David Booth

Received on Monday, 8 April 2013 18:09:01 UTC