
Re: Visualizing LOD Linkage

From: Yves Raimond <yves.raimond@gmail.com>
Date: Wed, 6 Aug 2008 10:26:52 +0100
Message-ID: <82593ac00808060226y69fd9c7bpaa63042403a64027@mail.gmail.com>
To: "Peter Ansell" <ansell.peter@gmail.com>
Cc: "Kingsley Idehen" <kidehen@openlinksw.com>, public-lod@w3.org, "Richard Cyganiak" <richard@cyganiak.de>


>> Hello!
>>
>> > It depends on whether you know that the external references are
>> > distinct just based on the URI string. If someone links out to
>> > multiple formats using external resource links then they would have to
>> > be counted as multiple links as you have no way of knowing that they
>> > are different, except in the case where you reserve these types of
>> > links as RDF literals.
>>
>> I am not sure if I interpret it correctly - do you mean that you could
>> link to two URIs which are in fact sameAs in the target dataset?
>> Indeed, in that case, the measure would be slightly higher than what
>> it should be. However, I would think that it is rarely (if ever)
>> the case.
> I personally don't put sameAs on URIs which relate to the same thing but are really just different representations, i.e., the HTML version doesn't get sameAs the RDF version.

Oh god, I wasn't suggesting that *at all*!!! Replace "outbound links"
with "outbound links to *things* in other LOD datasets" in what I said
before, if that makes it clearer.

>> > I think a more effective way is to measure both the number of
>> > outbound and inbound links, something which is only possible if you
>> > have a way of determining in the rest of the web who links back to a
>> > particular resource in your own scheme. If you count inbound links,
>> > like Google do with PageRank, then you get a better idea of who is
>> > actually being used as opposed to who is just linking into all the
>> > others it can find. See [1] for a description of it in the Bio2RDF
>> > project.
>>
>> I think this measure of inbound links is more related to a
>> "usefulness" metric, as highlighted by Richard. Again, I do not
>> attempt to propose such a measure.
>> I just want the raw number of "outbound links to other datasets" to be
>> *comparable* amongst datasets.
> I am not completely sure myself what the value in just measuring outbound links is. As you say, it is easy to manipulate, and it depends heavily on the particular domain the database is used in. To get an equaliser, I think you need to include inbound links somehow, or possibly scale the number of outbound links against the size of the database.

Yes, I would personally go for the latter. But still, you need
something to normalise that is consistently measured across datasets.
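
To make the idea concrete, here is a minimal sketch of one way such a count could be defined and then scaled by dataset size. It is only an illustration under stated assumptions, not a measure any dataset currently publishes: the function name, the `Literal` wrapper, the in-memory triple format and the choice of "total triples" as the size are all hypothetical. It counts distinct (subject, object) pairs where the subject is local and the object is a URI under the namespace of another LOD dataset, so literals and links to plain web pages are ignored.

```python
from typing import Iterable, NamedTuple, Sequence, Tuple, Union


class Literal(NamedTuple):
    """Hypothetical wrapper marking an object as an RDF literal, not a URI."""
    value: str


# Assumed in-memory form: (subject URI, predicate, object URI or Literal).
Triple = Tuple[str, str, Union[str, Literal]]


def outbound_link_ratio(
    triples: Iterable[Triple],
    local_prefix: str,
    other_dataset_prefixes: Sequence[str],
) -> Tuple[int, float]:
    """Return (outbound link count, count divided by total triples).

    An outbound link is a distinct (subject, object) pair whose subject is
    under local_prefix and whose object is a URI under one of the
    other_dataset_prefixes. Both definitions are assumptions of this sketch.
    """
    targets = tuple(other_dataset_prefixes)
    total = 0
    outbound = set()
    for subject, _predicate, obj in triples:
        total += 1
        if isinstance(obj, Literal):
            continue  # literals are never counted as links
        if subject.startswith(local_prefix) and obj.startswith(targets):
            outbound.add((subject, obj))  # the set drops duplicate pairs
    ratio = len(outbound) / total if total else 0.0
    return len(outbound), ratio
```

Whatever the exact definition, the point is that every dataset would have to apply the same one for the published numbers to be comparable.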

> Either way, you have to discover these things interactively; just talking about them could lead to us missing essential aspects that come up frequently and could be discounted if that is the aim of the ranking system. Has anyone done a primitive ranking based solely on external outbound link frequency across a few linked databases (other than the Bio2RDF data detailed at the link below, which gives reasonable results for a combined ranking based on inbound links, outbound links, and overall database size)?

I wouldn't use this raw measure for a ranking system. I am just saying
that the statistics for outbound links published by the different LOD
datasets so far are not consistent, that's all.


>>> [1] http://dx.doi.org/10.1007/978-3-540-69828-9_15
>> (link appears to be broken)
> Springer Link DOI system must be broken. Try the following
> http://www.springerlink.com/content/w611j82w7v4672r3/fulltext.pdf
> The following image link is also quite interesting:
> http://bio2rdf.wiki.sourceforge.net/space/showimage/bio2rdfmap_blanc.png
> Cheers,
> Peter
Received on Wednesday, 6 August 2008 09:27:29 UTC
