Re: Visualizing LOD Linkage

From: Oktie Hassanzadeh <oktie@cs.toronto.edu>
Date: Sat, 02 Aug 2008 11:58:55 -0400
Message-ID: <4894843F.1090405@cs.toronto.edu>
To: Yves Raimond <yves.raimond@gmail.com>, public-lod@w3.org
Yves Raimond wrote:
> Hello!
>> I would like to suggest that publishers of new linked data spaces that plug
>> into the growing LOD include the following:
>> 1. cross-link information
> I would also suggest we find a better measure for interlinkage than a
> raw number of triples linking one dataset to another.
> For example, http://dbtune.org/musicbrainz/ creates its own identifier
> for languages (http://dbtune.org/musicbrainz/directory/language),
> which are owl:sameAs'ed to the corresponding languages in Lingvoj when
> applicable, whereas linkedmdb directly links to the Lingvoj
> identifiers. In the latter case, the raw number of interlinks will be
> higher, but could be reduced a lot by creating identifiers for
> language and use sameAs.
> The same applies for geographic locations, for example. Some datasets
> use foaf:based_near to link to Geonames, some others create their own
> identifiers, and then link to the corresponding Geonames locations
> through owl:sameAs. For the same dataset, this two methodologies will
> lead to completely different numbers.
> To boost the statistics of a dataset, we could simply link each person
> or group in them to http://dbpedia.org/class/yago/Entity100001740
> through rdf:type :-D
> So I think we should agree on what we count as "interlinks" before
> publishing such statistics, so that we can actually use these values?
> My recommendation would be to always go for the lowest value - the one
> you'd obtain by creating your own identifiers and using owl:sameAs
> (which would be equivalent to the number of distinct external URIs
> mentioned in your dataset).
> What do you think?
> Cheers!
> y

I totally agree! Some interlinks are not as valuable as others. That's
why we report the number of links by type and target, and why we also
store and publish data about the linkage methodology. I also believe we
should be honest about the value of the interlinks.
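
To make the difference between the two measures concrete, here is a
minimal sketch in Python. It is only an illustration under stated
assumptions: the triples, hosts and predicate names are made up, and
interlink_stats and distinct_external_uris are hypothetical helpers, not
the code behind LinkedMDB's statistics page. It counts raw link triples
per (link type, target dataset) and, next to that, the number of
distinct external URIs, which is the lower value Yves recommends.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical local dataset host and sample triples (not real data).
LOCAL_HOST = "data.example.org"

TRIPLES = [
    ("http://data.example.org/film/1", "ex:language", "http://languages.example.net/lang/en"),
    ("http://data.example.org/film/2", "ex:language", "http://languages.example.net/lang/en"),
    ("http://data.example.org/film/3", "ex:language", "http://languages.example.net/lang/fr"),
    ("http://data.example.org/film/1", "owl:sameAs", "http://encyclopedia.example.com/resource/Film_1"),
    ("http://data.example.org/film/1", "ex:sequel", "http://data.example.org/film/2"),
]


def host(uri):
    """Return the host part of a URI, or '' for anything that is not a URI."""
    return urlparse(uri).netloc


def interlink_stats(triples, local_host):
    """Count links per (predicate, target host).

    Returns two dicts keyed by (predicate, target host):
    the raw number of link triples, and the number of distinct
    external URIs those triples point to.
    """
    raw = Counter()
    targets = {}
    for _subject, predicate, obj in triples:
        target_host = host(obj)
        # Skip literals and links that stay inside the local dataset.
        if not target_host or target_host == local_host:
            continue
        key = (predicate, target_host)
        raw[key] += 1
        targets.setdefault(key, set()).add(obj)
    distinct = {key: len(uris) for key, uris in targets.items()}
    return dict(raw), distinct


def distinct_external_uris(triples, local_host):
    """Number of distinct external URIs mentioned as objects in the dataset."""
    return len({o for _s, _p, o in triples if host(o) and host(o) != local_host})


if __name__ == "__main__":
    raw, distinct = interlink_stats(TRIPLES, LOCAL_HOST)
    for key in sorted(raw):
        print(key, "raw:", raw[key], "distinct:", distinct[key])
    print("distinct external URIs:", distinct_external_uris(TRIPLES, LOCAL_HOST))
```

With this sample data, the three language triples point at only two
distinct language URIs, so the raw count for that link type is 3 while
the distinct count is 2. Direct linking inflates the first number but
not the second.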

Apart from the links to languages and geographic locations, another
example of such "easy" links is the links from LinkedMDB to the authors
of books in RDF Book Mashup, which are created based only on the names
of the authors. Compare these with the links to the books related to
the movies, for which we have to match the titles and find the ISBNs of
the books. I just changed LinkedMDB's statistics [1] to show two
different numbers for these links.

Regarding languages, I was not sure which is the right way: to link
directly to lingvoj or to have our own entities for languages. After
reading some discussions like [2], we decided to link directly to lingvoj.


[1] http://www.linkedmdb.org:8080/Main/Statistics
[2] http://esw.w3.org/topic/Languages_as_RDF_Resources
Received on Saturday, 2 August 2008 15:59:46 UTC
