Re: sameAs proliferation (was Visualizing LOD Linkage) from Yves Raimond on 2008-08-06 (public-lod@w3.org from August 2008)

From: Yves Raimond <yves.raimond@gmail.com>
Date: Wed, 6 Aug 2008 11:34:46 +0100
To: "Hugh Glaser" <hg@ecs.soton.ac.uk>
Cc: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <82593ac00808060334q5ed4d4dm3028452ad97a051@mail.gmail.com>

On Wed, Aug 6, 2008 at 11:15 AM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
>
> On 06/08/2008 09:54, "Yves Raimond" <yves.raimond@gmail.com> wrote:
>
>>
>>
>> Hello!
>>
> ...
>>
>> I am not sure if I interpret it correctly - do you mean that you could
>> link to two URIs which are in fact sameAs in the target dataset?
>> Indeed, in that case, the measure would be slightly higher than what
>> it should be. However, I would think that it is rarely (if not never)
>> the case.
>>
> ...
> I think not.
> In our set of KBs this (or similar) is already the case, and vice versa.
> And it is about to get worse on a world-wide scale.
>
> Consider:
> I have URIa and I have done a lot of work to discover that I think that URIb
> and URIc (from another source) should be considered the same.
> (In fact, one of the things that gives me confidence is that I found this
> other KB I had some trust of that made the same assertion.)
> Clearly I could just assert URIa owl:sameAs URIb.
> But then my knowledge that URIa owl:sameAs URIc becomes fragile; it depends
> on my users finding the other KB, on that KB continuing to exist, and that
> the other KB does not change its mind.
> The only safe thing to do (unless I want to risk losing the knowledge, and
> the work I put in to glean it) is assert URIa owl:sameAs URIc myself.
> Now the situation you describe has happened.

I completely agree with you, but let's put that back into context. My
goal is just to have a uniform measure for outbound links. If in my
dataset I have

URIa owl:sameAs URIb, URIc.

Should I count one outbound link, or two? However, I think the
"jitter" introduced by this sort of issue is much lower than the
difference you can introduce by applying the transforms I mentioned at
the beginning of my thread (going from foaf:based_near outbound links
to owl:sameAs, for example).

For example, none of the stats available on dbtune change when
counting one outbound link instead of two in that case. Whereas
applying the transform mentioned above to Jamendo, for example, makes
it go from 3244 to 289!
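
To make the "one or two?" question concrete, here is a minimal Python
sketch of both counting conventions. The function name count_outbound,
the in-memory triple tuples and the list of target-side equivalences
are all assumptions made for illustration; this is not how the dbtune
stats were actually computed.

```python
OWL_SAME_AS = "owl:sameAs"

def find(parent, x):
    """Return the representative of x in a union-find forest."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def count_outbound(links, target_equivalences=()):
    """Count outbound links two ways.

    links: (subject, predicate, object) tuples from my dataset.
    target_equivalences: (uri, uri) pairs known to be sameAs in the
    target dataset (hypothetical input).
    Returns (raw, collapsed): raw counts every distinct link, collapsed
    counts links to equivalent targets only once.
    """
    parent = {}
    for a, b in target_equivalences:
        parent[find(parent, a)] = find(parent, b)
    raw = len(set(links))
    collapsed = len({(s, p, find(parent, o)) for s, p, o in links})
    return raw, collapsed

links = [("URIa", OWL_SAME_AS, "URIb"), ("URIa", OWL_SAME_AS, "URIc")]
raw, collapsed = count_outbound(links, [("URIb", "URIc")])
print(raw, collapsed)
```

With the two statements above and URIb, URIc known to be the same on
the target side, the raw count is two and the collapsed count is one.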

>
> The inverse is a little more robust. The KB with URIb and URIc had worked
> out that URIa is the same. They can just assert URIb owl:sameAs URIa and
> trust to the continued knowledge of the sameness of URIb and URIc. But the
> really safe thing to do is also assert URIc owl:sameAs URIa, so the
> knowledge will be preserved if knowledge of URIb changes or indeed the URI
> is removed or somehow deprecated.
>
> Of course this argument can be extended as more URIs are discovered.
> This is the nature of using a binary relation in this way, and results in an
> O(n squared) graph. You can take architectural steps to reduce it,
> introducing canons and things like that, but the fundamental big O problem
> is still there.

Agreed. This is a really fundamental problem...
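
As a rough illustration of the growth, here is a small Python sketch
that counts the statements needed when every URI asserts owl:sameAs
to every other one, next to a canon-style layout where each URI only
points at one chosen URI. The function names and the choice of URIa as
the canon are hypothetical.

```python
from itertools import permutations

def pairwise_assertions(uris):
    """Every URI asserts sameAs to every other URI: n * (n - 1) triples."""
    return [(a, "owl:sameAs", b) for a, b in permutations(uris, 2)]

def canon_assertions(uris, canon):
    """Every non-canon URI asserts sameAs to the canon: n - 1 triples."""
    return [(u, "owl:sameAs", canon) for u in uris if u != canon]

uris = ["URIa", "URIb", "URIc"]
print(len(pairwise_assertions(uris)))       # 3 * 2 = 6
print(len(canon_assertions(uris, "URIa")))  # 3 - 1 = 2

bigger = [f"URI{i}" for i in range(10)]
print(len(pairwise_assertions(bigger)))     # 10 * 9 = 90
```

Note that the canon layout only shrinks what is stored. The set of
sameAs pairs it implies is still quadratic, and it relies on the canon
staying available, which is exactly the fragility you describe.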

Best,
y

>
> Best
> Hugh
>

Received on Wednesday, 6 August 2008 10:35:24 UTC