RE: How do you deprecate URIs? Re: OWL-DL and linked data

Hi Richard,

It sounds like we may be talking about different problems, or perhaps different use cases.  I'll try to clarify what I (and I think Martin) meant in my comments about owl:sameAs.

> From: Richard H. McCullough [mailto:rhm@PioneerCA.com]
>
> I haven't been following the "deprecate URIs" thread, so
> forgive me if I'm being repetitious.
> 1. everything is contextual.  But that's no excuse for being
> sloppy with meanings.

I agree that sloppiness is bad, and did not mean to imply that it should be sanctioned.  But "sloppiness" is also a value judgement that depends on the application -- one person's simplicity is another's sloppiness -- and it's important to have strategies for dealing with it when it does occur.

> 2. ambiguity is not inevitable -- it is avoided by clearly identifying
> context.

It depends what you mean.  If you are talking about determining the real world referent of a statement (step 2 in slides 5-8 of http://dbooth.org/2008/irsw/slides.ppt ), then as Pat Hayes has pointed out several times, completely nailing that down is almost always impossible.  And trying to pin it down by clearly identifying context won't help: that merely begs the question of how to unambiguously identify the context.  See Pat Hayes' and Harry Halpin's paper "In Defense of Ambiguity":
http://www.ibiblio.org/hhalpin/homepage/publications/indefenseofambiguity.html

On the other hand, if you are talking about determining the set of assertions that constrain a statement's meaning within semantic web architecture (step 1 in slides 5-8 of  http://dbooth.org/2008/irsw/slides.ppt ), then I agree that can be unambiguous.

> 2. OWL:SameAs (like mKR:is) means identical -- two names
> (aliases) which
> mean the same thing.  Let's not corrupt the meaning of this term.

Agreed.  We should not change the semantics of owl:sameAs.

> 3. there are other terms which can be used to express varying
> degrees of similarity.

But owl:sameAs is *exactly* the term needed to indicate that both a:a and b:b denote the same resource in a statement like this in File1:

    a:a owl:sameAs b:b .

or that b:b and c:c denote the same resource in a statement like this in File2:

    b:b owl:sameAs c:c .

However, that does *not* mean that an application X wishing to combine the data from File1 and File2 must treat a:a and c:c as denoting the same resource.  Indeed, doing so may cause a logical contradiction.
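
To make this concrete, here is a minimal sketch in Python of how such a contradiction can arise. It is an illustration only, not the technique from the slides. The owl:differentFrom assertion attributed to application X is a hypothetical addition (nothing above says what X itself asserts), and the function names are invented for the example. Triples are plain tuples, and owl:sameAs is treated as an equivalence relation by merging names with a small union-find.

```python
def same_as_classes(triples):
    """Return a find() function mapping each name to a representative
    of its owl:sameAs equivalence class (simple union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        root_s, root_o = find(s), find(o)
        if p == "owl:sameAs":
            parent[root_s] = root_o  # merge the two classes
    return find


def is_consistent(triples):
    """False if any owl:differentFrom pair ends up in the same sameAs class."""
    triples = list(triples)
    find = same_as_classes(triples)
    return all(find(s) != find(o)
               for s, p, o in triples if p == "owl:differentFrom")


file1 = [("a:a", "owl:sameAs", "b:b")]
file2 = [("b:b", "owl:sameAs", "c:c")]
# Hypothetical: application X's own data distinguishes a:a from c:c.
x_own = [("a:a", "owl:differentFrom", "c:c")]

print(is_consistent(file1 + x_own))          # File1 alone is fine for X
print(is_consistent(file2 + x_own))          # File2 alone is fine for X
print(is_consistent(file1 + file2 + x_own))  # together they contradict X
```

Under these assumptions each file is usable by X on its own, and the contradiction appears only when both are merged with X's data.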

What's going on?  From the File1 author's perspective, a:a and b:b denoted the same resource, and from the File2 author's perspective, b:b and c:c denoted the same resource, but from X's perspective, a:a and c:c may not.  One may ask, "what resources are a:a, b:b and c:c *supposed* to denote?", but their definitions may well admit multiple interpretations, and multiple interpretations are permitted in the RDF semantics:
http://www.w3.org/TR/rdf-mt/#interp
[[
The basic intuition of model-theoretic semantics is that asserting a sentence makes a claim about the world: it is another way of saying that the world is, in fact, so arranged as to be an interpretation which makes the sentence true. In other words, an assertion amounts to stating a constraint on the possible ways the world might be. Notice that there is no presumption here that any assertion contains enough information to specify a single unique interpretation. It is usually impossible to assert enough in any language to completely constrain the interpretations to a single possible world, so there is no such thing as 'the' unique interpretation of an RDF graph.
]]

One may be tempted to claim that the File1 and File2 authors overstepped their authority in further constraining the permissible interpretations of a:a, b:b and c:c to the extent that they were then able to assert them as owl:sameAs each other.  In other words, it may be tempting to claim that those authors should not have further constrained the interpretations of a:a, b:b and c:c beyond the terms' original definitions.  But the fact is that virtually *every* assertion involving a term, beyond the logical entailments of that term's definition, further constrains the permissible interpretations for that term.
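
The way each assertion narrows the permissible interpretations can be shown with a toy enumeration. This is a sketch under strong simplifying assumptions: the universe has only two resources (0 and 1), an interpretation is just a mapping from the three names to those resources, and the owl:differentFrom assertion is hypothetical. Real RDF interpretations range over arbitrary universes, so the counts below illustrate the direction of the effect and nothing more.

```python
from itertools import product

NAMES = ("a:a", "b:b", "c:c")
DOMAIN = (0, 1)  # assumption: a toy universe with two resources


def interpretations():
    """Yield every mapping of NAMES onto DOMAIN."""
    for values in product(DOMAIN, repeat=len(NAMES)):
        yield dict(zip(NAMES, values))


def satisfies(interp, assertion):
    """Check one (subject, predicate, object) assertion against a mapping."""
    s, p, o = assertion
    if p == "owl:sameAs":
        return interp[s] == interp[o]
    if p == "owl:differentFrom":
        return interp[s] != interp[o]
    raise ValueError(f"unsupported predicate: {p}")


def models(assertions):
    """All interpretations that make every assertion true."""
    return [i for i in interpretations()
            if all(satisfies(i, a) for a in assertions)]


assertions = [
    ("a:a", "owl:sameAs", "b:b"),         # from File1
    ("b:b", "owl:sameAs", "c:c"),         # from File2
    ("a:a", "owl:differentFrom", "c:c"),  # hypothetical, from application X
]

# Count surviving interpretations as assertions are added one at a time.
counts = [len(models(assertions[:n])) for n in range(len(assertions) + 1)]
print(counts)  # [8, 4, 2, 0]
```

Each added assertion removes interpretations, and once none remain the combined assertions are mutually incompatible.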

So the problem is not that owl:sameAs has been abused, nor is it that the assertions in File1 or File2 are "wrong".  The problem is that the models of the world embodied by the assertions in File1 and File2 are mutually incompatible: they cannot be used together in application X without some surgery.  And the point of slides 15-17 in
http://dbooth.org/2008/irsw/slides.ppt
is to describe one technique for performing such surgery when it is needed.

So if indeed "It is usually impossible to assert enough in any language to completely constrain the interpretations to a single possible world", as stated in the RDF Semantics, then the logical consequence is that ambiguity is inevitable, so we may as well get used to dealing with it.

One consequence of this is that there is a practical trade-off between reusability and precision: the more precise a term, the more constrained it is, and hence the more "likely" it is to be incompatible with other assertions.  Of course, people do not choose assertions at random, so we cannot really view this as a simple probability of incompatibility, but the trade-off is nonetheless real: all other things being equal, more constraints mean less reusability (without requiring surgery, at least).

On the other hand, having too few constraints makes a term useless in a different way, when nobody can figure out what it means.  So defining good, reusable terms is a balancing act: the best terms are those that are constrained enough (and in the right ways) to be useful, but not so tightly as to preclude too many applications.  There is no substitute for good judgement.



David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Statements made herein represent the views of the author and do not necessarily represent the official views of HP unless explicitly so stated.

Received on Thursday, 10 July 2008 01:30:41 UTC