sameAs proliferation (was Visualizing LOD Linkage) from Hugh Glaser on 2008-08-06 (public-lod@w3.org from August 2008)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Wed, 6 Aug 2008 11:15:14 +0100
To: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <C4BF3842.263EF%hg@ecs.soton.ac.uk>

On 06/08/2008 09:54, "Yves Raimond" <yves.raimond@gmail.com> wrote:

>
>
> Hello!
>
...
>
> I am not sure if I interpret it correctly - do you mean that you could
> link to two URIs which are in fact sameAs in the target dataset?
> Indeed, in that case, the measure would be slightly higher than what
> it should be. However, I would think that it is rarely (if not never)
> the case.
>
...
I think not.
In our set of KBs this (or similar) is already the case, and vice versa.
And it is about to get worse on a world-wide scale.

Consider:
I have URIa, and I have done a lot of work to conclude that URIb and URIc
(both from another source) should be considered the same as URIa.
(In fact, one of the things that gives me confidence is that I found this
other KB, one I have some trust in, that made the same assertion.)
Clearly I could just assert URIa owl:sameAs URIb.
But then my knowledge that URIa owl:sameAs URIc becomes fragile; it depends
on my users finding the other KB, on that KB continuing to exist, and on
that KB not changing its mind.
The only safe thing to do (unless I want to risk losing the knowledge, and
the work I put in to glean it) is to assert URIa owl:sameAs URIc myself.
Now the situation you describe has happened.

The inverse is a little more robust. The KB with URIb and URIc had worked
out that URIa is the same. They can just assert URIb owl:sameAs URIa and
trust to the continued knowledge of the sameness of URIb and URIc. But the
really safe thing to do is also assert URIc owl:sameAs URIa, so the
knowledge will be preserved if knowledge of URIb changes or indeed the URI
is removed or somehow deprecated.
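
To make the fragility concrete, here is a minimal sketch in Python. It assumes
that sameAs links are treated as symmetric edges and that "knowing" two URIs
are the same means one is reachable from the other. The function names and
the placeholder strings URIa, URIb and URIc are illustrative only; they are
not taken from any real KB or library.

```python
from collections import defaultdict, deque

def same_as_closure(links, start):
    """Return every URI reachable from `start` via sameAs links.

    Assumption: links are (subject, object) pairs treated as symmetric.
    """
    graph = defaultdict(set)
    for s, o in links:
        graph[s].add(o)
        graph[o].add(s)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def without_uri(links, uri):
    """Drop every link that mentions `uri`, as if that URI had been removed."""
    return [(s, o) for s, o in links if uri not in (s, o)]

# My KB asserts only URIa sameAs URIb; the other KB asserts URIb sameAs URIc.
chained = [("URIa", "URIb"), ("URIb", "URIc")]
# My KB also asserts URIa sameAs URIc directly.
redundant = chained + [("URIa", "URIc")]

print("URIc" in same_as_closure(chained, "URIa"))                          # True
print("URIc" in same_as_closure(without_uri(chained, "URIb"), "URIa"))     # False
print("URIc" in same_as_closure(without_uri(redundant, "URIb"), "URIa"))   # True
```

With only the chained links, removing URIb loses the connection between URIa
and URIc; with the extra direct assertion it survives.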

Of course this argument can be extended as more URIs are discovered.
This is the nature of using a binary relation in this way, and results in an
O(n^2) graph: n URIs known to be the same need up to n(n-1)/2 pairwise
links. You can take architectural steps to reduce it,
introducing canons and things like that, but the fundamental big O problem
is still there.
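
The growth can be sketched by counting links, again in Python. This is a rough
illustration that assumes one undirected link per pair of URIs, and it models
a "canon" as a single chosen URI that every other URI points to. Both function
names are hypothetical.

```python
from itertools import combinations

def pairwise_links(uris):
    """One sameAs link per unordered pair of URIs: n*(n-1)/2 links."""
    return list(combinations(uris, 2))

def canon_links(uris):
    """Each URI links only to the first URI, used here as the canon: n-1 links."""
    canon, *rest = uris
    return [(u, canon) for u in rest]

for n in (3, 10, 100):
    uris = [f"URI{i}" for i in range(n)]
    print(n, len(pairwise_links(uris)), len(canon_links(uris)))
# 3 3 2
# 10 45 9
# 100 4950 99
```

The canon variant has far fewer links, but every equivalence then depends on
the canon itself surviving. A publisher who does not want that dependency is
back to asserting the pairs directly, which is one way to read the point that
the quadratic problem does not go away.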

Best
Hugh

--
Hugh Glaser,  Reader
              Dependable Systems & Software Engineering
              School of Electronics and Computer Science,
              University of Southampton,
              Southampton SO17 1BJ
Work: +44 (0)23 8059 3670, Fax: +44 (0)23 8059 3045
Mobile: +44 (0)78 9422 3822, Home: +44 (0)23 8061 5652
http://www.ecs.soton.ac.uk/~hg/

"If we have a correct theory but merely prate about it, pigeonhole it, and
do not put it into practice, then the theory, however good, is of no
significance."

Received on Wednesday, 6 August 2008 10:16:05 UTC