Re: Visualizing LOD Linkage from Hugh Glaser on 2008-08-06 (public-lod@w3.org from August 2008)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Wed, 6 Aug 2008 11:56:22 +0100
To: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <C4BF41E6.263F8%hg@ecs.soton.ac.uk>
D'accord.

My bit:
We suspect that links have different "value", and would like to capture that
in some way. But that would be lots of numbers etc.
And we would like to capture inbound links, but that is a full research
project.

But I do think I would like to know two things separately: sameAs links, and
the use of URIs that go to linked data elsewhere.
So, for sameAs (or similar, including seeAlso), a measure of how many of my
URIs are sameAs other people's (so the answer to your question below is 1).
And a separate number that indicates the number of simple outgoing links.
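
A rough sketch of the two numbers described above, in Python. It assumes
triples are plain (subject, predicate, object) string tuples with URI objects
only, and that "mine" versus "elsewhere" can be decided by a namespace prefix.
The function name, the prefix test, and the example.org URIs are illustrative
assumptions, not anything agreed in this thread.

```python
# Predicates treated as "sameAs or similar" (assumed set; adjust as needed).
EQUIVALENCE_PREDICATES = {"owl:sameAs", "rdfs:seeAlso"}


def link_measures(triples, local_prefix):
    """Return (sameas_subjects, simple_outgoing_links).

    sameas_subjects: number of distinct local URIs with at least one
        equivalence-style link to a URI outside local_prefix.
    simple_outgoing_links: number of other triples that point from a
        local URI to a URI outside local_prefix.
    """
    linked_subjects = set()
    simple_links = 0
    for subject, predicate, obj in triples:
        # Only links from my dataset to someone else's are of interest.
        if not subject.startswith(local_prefix) or obj.startswith(local_prefix):
            continue
        if predicate in EQUIVALENCE_PREDICATES:
            linked_subjects.add(subject)  # counted once per local URI
        else:
            simple_links += 1
    return len(linked_subjects), simple_links


triples = [
    ("http://example.org/me/URIa", "owl:sameAs", "http://example.org/other/URIb"),
    ("http://example.org/me/URIa", "owl:sameAs", "http://example.org/other/URIc"),
    ("http://example.org/me/URIa", "foaf:based_near", "http://example.org/geo/place1"),
    ("http://example.org/me/URIa", "foaf:knows", "http://example.org/me/URId"),
]
print(link_measures(triples, "http://example.org/me/"))  # (1, 1)
```

With URIa sameAs both URIb and URIc, the first number is 1 rather than 2,
because it counts my URIs and not the individual sameAs statements.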

[Of course, I can do all sorts of stuff to fiddle things, but devising
benchmarks that are not open to abuse is always a challenge, and I suspect
that we cannot solve that problem here - leaving it to the social world is
my preference at the moment.]

By the way, if we want numeric measures, can I suggest logarithmic please?
Numbers such as 1,4,5,6 will be much easier to see and compare on a diagram
than 10, 10000, 234712, 2437145.
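
As a small illustration of the suggestion, here is a Python sketch that maps
the raw counts above onto a base-10 logarithmic scale. The function name and
the choice to map empty datasets to 0 are assumptions made for the example.

```python
import math


def log_link_count(count):
    """Base-10 logarithm of a raw link count.

    Counts of zero or less are mapped to 0.0 (an assumed convention, so
    that empty datasets can still be drawn on the diagram).
    """
    if count <= 0:
        return 0.0
    return math.log10(count)


raw_counts = [10, 10000, 234712, 2437145]
scaled = [round(log_link_count(c)) for c in raw_counts]
print(scaled)  # [1, 4, 5, 6]
```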


On 06/08/2008 11:34, "Yves Raimond" <yves.raimond@gmail.com> wrote:

> On Wed, Aug 6, 2008 at 11:15 AM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
>>
>> On 06/08/2008 09:54, "Yves Raimond" <yves.raimond@gmail.com> wrote:
>>
>>>
>>>
>>> Hello!
>>>
>> ...
>>>
>>> I am not sure if I interpret it correctly - do you mean that you could
>>> link to two URIs which are in fact sameAs in the target dataset?
>>> Indeed, in that case, the measure would be slightly higher than what
>>> it should be. However, I would think that it is rarely (if ever)
>>> the case.
>>>
>> ...
>> I think not.
>> In our set of KBs this (or similar) is already the case, and vice versa.
>> And it is about to get worse on a world-wide scale.
>>
>> Consider:
>> I have URIa and I have done a lot of work to discover that I think that URIb
>> and URIc (from another source) should be considered the same.
>> (In fact, one of the things that gives me confidence is that I found this
>> other KB, which I had some trust in, that made the same assertion.)
>> Clearly I could just assert URIa owl:sameAs URIb.
>> But then my knowledge that URIa owl:sameAs URIc becomes fragile; it depends
>> on my users finding the other KB, on that KB continuing to exist, and that
>> the other KB does not change its mind.
>> The only safe thing to do (unless I want to risk losing the knowledge, and
>> the work I put in to glean it) is assert URIa owl:sameAs URIc myself.
>> Now the situation you describe has happened.
>
> I completely agree with you, but let's put that back into context. My
> goal is just to have a uniform measure for outbound links. If in my
> dataset I have
>
> URIa owl:sameAs URIb, URIc.
>
> Should I count one outbound link, or two? However, I think the
> "jitter" introduced by this sort of issue is much lower than the
> difference you can introduce by applying the transforms I mentioned in
> the beginning of my thread (going from foaf:based_near outbound links
> to owl:sameAs, for example)
>
> For example, all the stats available on dbtune don't change by
> counting one outbound link instead of two in that case. Whereas
> applying the transform mentioned above to Jamendo, for example, makes
> it go from 3244 to 289!
>
>>
>> The inverse is a little more robust. The KB with URIb and URIc had worked
>> out that URIa is the same. They can just assert URIb owl:sameAs URIa and
>> trust to the continued knowledge of the sameness of URIb and URIc. But the
>> really safe thing to do is also assert URIc owl:sameAs URIa, so the
>> knowledge will be preserved if knowledge of URIb changes or indeed the URI
>> is removed or somehow deprecated.
>>
>> Of course this argument can be extended as more URIs are discovered.
>> This is the nature of using a binary relation in this way, and results in an
>> O(n squared) graph. You can take architectural steps to reduce it,
>> introducing canons and things like that, but the fundamental big O problem
>> is still there.
>
> Agreed. This is a really fundamental problem...
>
> Best,
> y
>
>>
>> Best
>> Hugh
>>
>> --
>> Hugh Glaser,  Reader
>>              Dependable Systems & Software Engineering
>>              School of Electronics and Computer Science,
>>              University of Southampton,
>>              Southampton SO17 1BJ
>> Work: +44 (0)23 8059 3670, Fax: +44 (0)23 8059 3045
>> Mobile: +44 (0)78 9422 3822, Home: +44 (0)23 8061 5652
>> http://www.ecs.soton.ac.uk/~hg/
>>
>> "If we have a correct theory but merely prate about it, pigeonhole it, and
>> do not put it into practice, then the theory, however good, is of no
>> significance."
>>
>>
>>
>
Received on Wednesday, 6 August 2008 10:57:11 UTC