
Contd: Visualizing LOD Linkage

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sat, 02 Aug 2008 12:23:26 -0400
Message-ID: <489489FE.9070201@openlinksw.com>
To: Oktie Hassanzadeh <oktie@cs.toronto.edu>
CC: Yves Raimond <yves.raimond@gmail.com>, public-lod@w3.org

Oktie Hassanzadeh wrote:
> Yves Raimond wrote:
>> Hello!
>>
>>   
>>> I would like to suggest that publishers of new linked data spaces that plug
>>> into the growing LOD include the following:
>>>
>>> 1. cross-link information
>>>     
>>
>> I would also suggest we find a better measure for interlinkage than a
>> raw number of triples linking one dataset to another.
>> For example, http://dbtune.org/musicbrainz/ creates its own identifier
>> for languages (http://dbtune.org/musicbrainz/directory/language),
>> which are owl:sameAs'ed to the corresponding languages in Lingvoj when
>> applicable, whereas linkedmdb directly links to the Lingvoj
>> identifiers. In the latter case, the raw number of interlinks will be
>> higher, but could be reduced a lot by creating identifiers for
>> languages and using sameAs.
>>
>> The same applies for geographic locations, for example. Some datasets
>> use foaf:based_near to link to Geonames, some others create their own
>> identifiers, and then link to the corresponding Geonames locations
>> through owl:sameAs. For the same dataset, these two methodologies will
>> lead to completely different numbers.
>>
>> To boost the statistics of a dataset, we could simply link each person
>> or group in them to http://dbpedia.org/class/yago/Entity100001740
>> through rdf:type :-D
>>
>> So I think we should agree on what we count as "interlinks" before
>> publishing such statistics, so that we can actually use these values?
>>
>> My recommendation would be to always go for the lowest value - the one
>> you'd obtain by creating your own identifiers and using owl:sameAs
>> (which would be equivalent to the number of distinct external URIs
>> mentioned in your dataset).
>>
>> What do you think?
>>
>> Cheers!
>> y
>>
>>   
>
> I totally agree! Some interlinks are not as valuable as others. That's 
> why we report the number of links based on their type and target and 
> also we store and publish data about the linkage methodology. I also 
> believe we should be honest about the value of the interlinks.
>
> Apart from the links to languages and geographic locations, another 
> example of such "easy" links is the set of links from LinkedMDB to the 
> authors of books in RDF Book Mashup, which are based only on the 
> authors' names. Compare these with the links to the books related to 
> the movies, for which we have to match the titles and find the ISBN of 
> the books. I just changed LinkedMDB's statistics [1] to show two 
> different numbers for these links.
>
> Regarding languages, I was not sure which is the right way: to link 
> directly to lingvoj or to have our own entities for languages. But 
> after reading some discussions like [2], we decided to link directly 
> to lingvoj.
>
>
> Regards,
> Oktie
>
> [1] http://www.linkedmdb.org:8080/Main/Statistics
> [2] http://esw.w3.org/topic/Languages_as_RDF_Resources
>
Oktie,

Re. sample entities, could you include a few sample entity URIs 
from your data space?  For instance, a third column with a drop-down 
should do the trick.
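
For reference, the two counts Yves contrasts above (raw link triples versus distinct external URIs) could be sketched roughly as follows. This is only an illustration: the triples, the `LOCAL_PREFIX` namespace, and the helper names are all made up for the example and are not taken from any of the data sets discussed.

```python
# Hypothetical namespace of "our" data set (an assumption for this sketch).
LOCAL_PREFIX = "http://example.org/data/"

# Made-up (subject, predicate, object) triples; three films link directly to
# the same external language URI, one links to a different one.
triples = [
    ("http://example.org/data/film/1", "http://example.org/vocab/language",
     "http://external.example/lang/en"),
    ("http://example.org/data/film/2", "http://example.org/vocab/language",
     "http://external.example/lang/en"),
    ("http://example.org/data/film/3", "http://example.org/vocab/language",
     "http://external.example/lang/en"),
    ("http://example.org/data/film/4", "http://example.org/vocab/language",
     "http://external.example/lang/fr"),
    # A literal object: not a link at all, so it is ignored by both counts.
    ("http://example.org/data/film/1", "http://example.org/vocab/title",
     "Some Film"),
]


def is_external(value, local_prefix=LOCAL_PREFIX):
    """True if value looks like an HTTP URI outside the local namespace."""
    return value.startswith("http") and not value.startswith(local_prefix)


def raw_link_count(triples, local_prefix=LOCAL_PREFIX):
    """Count every triple with a local subject and an external URI object."""
    return sum(
        1
        for s, _p, o in triples
        if s.startswith(local_prefix) and is_external(o, local_prefix)
    )


def distinct_external_uris(triples, local_prefix=LOCAL_PREFIX):
    """Collect the distinct external URIs that local subjects point at."""
    return {
        o
        for s, _p, o in triples
        if s.startswith(local_prefix) and is_external(o, local_prefix)
    }


print("raw link triples:      ", raw_link_count(triples))
print("distinct external URIs:", len(distinct_external_uris(triples)))
```

With this sample the raw count is 4 while the distinct-URI count is 2, which is the gap between linking directly and minting local identifiers with one owl:sameAs each.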

-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Saturday, 2 August 2008 16:24:07 UTC
