
Re: Visualizing LOD Linkage

From: Peter Ansell <ansell.peter@gmail.com>
Date: Wed, 6 Aug 2008 08:02:42 +1000 (EST)
To: Yves Raimond <yves.raimond@gmail.com>
Cc: Kingsley Idehen <kidehen@openlinksw.com>, public-lod@w3.org, Richard Cyganiak <richard@cyganiak.de>
Message-ID: <1949880.1831217973759609.JavaMail.peter@Macintosh-2.local>


----- "Yves Raimond" <yves.raimond@gmail.com> wrote:

> Hello!
> 
> >
> > Is this really a problem? Why not just keep in mind that triple numbers are
> > a purely mechanical measure and are no indication of quality or usefulness?
> >
> > A raw triple count is just that, a raw triple count. It doesn't mean
> > anything else. And it is useful for anyone who wants to
> > store/index/postprocess a dataset/linkset, because for storage and querying
> > the number of triples matters.
> >
> 
> I wasn't talking about triple counts actually (as you said, a triple
> count is just that). But about quantifying the number of interlinks in
> a way that is consistent across datasets (just that, no notion of
> usefulness :-) ) - e.g. in a way where you can say "oh, this dataset
> indeed has more interlinks than this one". I was arguing that the
> current way of doing it (counting triples that mention an `external'
> resource) is not consistent, as you can easily make that number higher
> or lower by applying simple transformations to your data.
> 
> I think a consistent way of measuring the interlinking is to just
> count the number of distinct `external' resources in the dataset
> (which will give the lowest number you get by applying such
> transformations).
> 
> See for example http://dbtune.org/musicbrainz/,
> http://dbtune.org/bbc/peel/ or http://dbtune.org/jamendo/
> 

That depends on whether you can tell that the external references are distinct from the URI strings alone. If someone links out to the same resource in multiple formats using separate external resource links, those would have to be counted as multiple links, because you have no way of knowing that they are just different representations of the same thing, except where links of this type are stored as RDF literals rather than as resources.
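
To make the difference between the two counts concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the `example.org` host, the sample triples, and the helper names are assumptions, and it is not how any of the datasets mentioned above compute their numbers. It compares "triples that mention an external resource" with "distinct external URI strings", and shows that two format-specific URIs still count as two resources when only the strings are compared.

```python
from urllib.parse import urlparse

# Assumed host of the "local" dataset (hypothetical).
LOCAL_HOST = "example.org"

# Hypothetical (subject, predicate, object) triples as plain strings.
triples = [
    ("http://example.org/artist/1", "owl:sameAs", "http://dbpedia.org/resource/Foo"),
    ("http://example.org/artist/1", "rdfs:seeAlso", "http://dbpedia.org/resource/Foo"),
    ("http://example.org/artist/1", "rdfs:seeAlso", "http://dbpedia.org/data/Foo.rdf"),
    ("http://example.org/artist/2", "foaf:made", "http://example.org/record/9"),
]

def is_external(term):
    """True if the term is an HTTP(S) URI whose host is not the local one."""
    parsed = urlparse(term)
    return parsed.scheme in ("http", "https") and parsed.netloc != LOCAL_HOST

def external_triple_count(triples):
    """Count triples whose subject or object is an external resource."""
    return sum(1 for s, _, o in triples if is_external(s) or is_external(o))

def distinct_external_resources(triples):
    """Distinct external URI strings seen in subject or object position."""
    found = set()
    for s, _, o in triples:
        found.update(term for term in (s, o) if is_external(term))
    return found

print(external_triple_count(triples))             # 3
print(len(distinct_external_resources(triples)))  # 2
```

Dropping the redundant `rdfs:seeAlso` triple would lower the first number to 2 but leave the second unchanged. The second number still treats `/resource/Foo` and `/data/Foo.rdf` as two resources, since nothing in the strings says they describe the same thing.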

I think a more effective way is to measure both outbound and inbound links, which is only possible if you have a way of finding out who in the rest of the web links back to a particular resource in your own scheme. If you count inbound links, as Google does with PageRank, you get a better idea of who is actually being used, as opposed to who is just linking to every other dataset it can find. See [1] for a description of this in the Bio2RDF project.
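
As a rough illustration of counting links in both directions, the following Python sketch tallies outbound and inbound links per dataset, using the URI host as a stand-in for "dataset". The hosts and link pairs are invented for the example, and this is plain link counting, not PageRank and not the method described in [1].

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical links between resources: (source URI, target URI).
links = [
    ("http://a.example/x1", "http://b.example/y1"),
    ("http://a.example/x2", "http://b.example/y2"),
    ("http://a.example/x3", "http://c.example/z1"),
    ("http://c.example/z1", "http://b.example/y1"),
    ("http://a.example/x1", "http://a.example/x2"),  # internal link, ignored
]

def link_degrees(links):
    """Return (outbound, inbound) Counters keyed by dataset host."""
    outbound, inbound = Counter(), Counter()
    for source, target in links:
        src_host = urlparse(source).netloc
        dst_host = urlparse(target).netloc
        if src_host == dst_host:
            continue  # only links between datasets are counted
        outbound[src_host] += 1
        inbound[dst_host] += 1
    return outbound, inbound

outbound, inbound = link_degrees(links)
for host in sorted(set(outbound) | set(inbound)):
    print(host, "out:", outbound[host], "in:", inbound[host])
```

Here `a.example` links out three times but nobody links back to it, while `b.example` links to nothing and is linked to three times. That is the distinction between "linking into all the others" and "actually being used".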

Cheers,

Peter

[1] http://dx.doi.org/10.1007/978-3-540-69828-9_15
Received on Tuesday, 5 August 2008 22:03:25 UTC
