Re: owl:sameAs - Harmful to provenance?

David Booth <david@dbooth.org> writes:
> Maybe someone can see a way to avoid this dilemma.  Maybe
> someone can figure out a way to distinguish between the
> "essential" properties that serve to identify a resource, and
> other "inessential" properties that the resource might have.
> If so, and the number of "essential" properties is finite,
> then indeed this problem could be avoided by requiring every
> URI owner to define all of the "essential" properties of the
> URI's denoted resource, or by prohibiting anyone but the URI
> owner from asserting any new "essential" properties of the
> resource (beyond those the URI owner had defined).  Or maybe
> there is another way around this dilemma.
>
> Unless some way around this dilemma is found, it seems
> unreasonably judgemental to accuse Arthur of misusing
> owl:sameAs in this case, since he didn't assert anything
> that was inconsistent with Owen's URI definition.


I think your analysis is good. My solution to avoiding the horns of the
dilemma is to take a different tack entirely, and to think about the
social aspects of how these graphs came to be produced. 

Owen has produced some data. Then Arthur, Aster and Alfred have each
extended it in ways which turn out to be incompatible, and yet they all
seem to be doing things that fulfil their respective use cases. So, in
one sense, there is no problem here at all. In each case, Arthur, Aster
and Alfred get everything to work, and everybody is happy. 

The problem comes when you try to integrate their work; now it breaks.
So, how to avoid this? There are two key ways: the first is to say, well,
okay, so now the graphs break, so let's get together and sort the problem
out. There are lots of ways you could change the graphs here so that the
problem goes away.
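
To make "each graph works, the merge breaks" concrete, here is a minimal
sketch in plain Python. Everything in it is an assumption made up for
illustration: the property :colour, the nodes :a and :b, the literal
values, and the choice to treat :colour as functional. None of it comes
from the graphs discussed in this thread, and it is not how an OWL
reasoner is implemented.

```python
# Hypothetical triples; every name below is invented for illustration.
SAME_AS = "owl:sameAs"

owen = {(":x", ":label", "x")}
arthur = owen | {(":x", SAME_AS, ":a"), (":a", ":colour", "red")}
aster = owen | {(":x", SAME_AS, ":b"), (":b", ":colour", "blue")}


def canonical(graph):
    """Return a function mapping a node to a representative of its
    owl:sameAs equivalence class (a small union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for s, p, o in graph:
        if p == SAME_AS:
            rs, ro = find(s), find(o)
            if rs != ro:
                # Pick the lexically smaller name as representative.
                parent[max(rs, ro)] = min(rs, ro)
    return find


def conflicts(graph, functional=(":colour",)):
    """List (node, property) pairs where a property assumed to be
    functional ends up with more than one distinct literal value."""
    find = canonical(graph)
    values = {}
    for s, p, o in graph:
        if p in functional:
            values.setdefault((find(s), p), set()).add(o)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


print(conflicts(arthur))          # no conflict on its own
print(conflicts(aster))           # no conflict on its own
print(conflicts(arthur | aster))  # the merged graph has a clash
```

Under those assumptions, neither extension is wrong in isolation; the
clash only exists once somebody puts the two graphs together, which is
the point at which it becomes worth fixing.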

The other solution is to argue that if everybody follows a standard
rigidly, then this problem won't happen in the first place. The
difficulty here is not, for example, understanding the set-theoretic
interpretation, but knowing how to apply it to whatever it is that you
are trying to model.

My experience has been that the former has significant costs:
integrating post-hoc is expensive and time-consuming. However, my
experience of the latter approach is that it is highly unscalable, and
results in very long and obscure philosophical debates. Essentially,
with the former you pay the cost of integration as you need it; in the
latter you pay the cost of integration all the time, whether you need it
or not. 

So, are the people in your example misusing owl:sameAs? Not if they are
answering the questions they need to answer. Should they fix the problem with
integration? If they need to, to get better answers. But not until then. 


> But by that logic, Arthur would not be able to assert *anything*
> new about :x.  I.e., Arthur would not be allowed to assert
> any property whose value was not already entailed by Owen's
> definition!  And that would render RDF rather pointless.


Absolutely; the whole point of integrating data is that you want to say
things about knowledge that comes from other people. Otherwise, you
don't have integration, you just have a bunch of triples in the same
bucket. 

Phil

Received on Thursday, 4 April 2013 08:53:09 UTC