Re: Visualizing LOD Linkage

On Sat, Aug 2, 2008 at 5:17 PM, Kingsley Idehen <kidehen@openlinksw.com> wrote:
> Yves Raimond wrote:
>>
>> Hello!
>>
>>
>>>
>>> I would like to suggest that publishers of new linked data spaces that
>>> plug into the growing LOD include the following:
>>>
>>> 1. cross-link information
>>>
>>
>> I would also suggest we find a better measure for interlinkage than a
>> raw number of triples linking one dataset to another.
>> For example, http://dbtune.org/musicbrainz/ creates its own identifier
>> for languages (http://dbtune.org/musicbrainz/directory/language),
>> which are owl:sameAs'ed to the corresponding languages in Lingvoj when
>> applicable, whereas linkedmdb directly links to the Lingvoj
>> identifiers. In the latter case, the raw number of interlinks will be
>> higher, but could be reduced a lot by creating identifiers for
>> languages and using owl:sameAs.
>>
>> The same applies for geographic locations, for example. Some datasets
>> use foaf:based_near to link to Geonames, some others create their own
>> identifiers, and then link to the corresponding Geonames locations
>> through owl:sameAs. For the same dataset, these two methodologies will
>> lead to completely different numbers.
>>
>> To boost the statistics of a dataset, we could simply link each person
>> or group in them to http://dbpedia.org/class/yago/Entity100001740
>> through rdf:type :-D
>>
>
> Amen!
>
> And it also means we start to expose the fact that LOD is not an "instance
> level only" linked data space (a sad misconception).
>
>> So I think we should agree on what we count as "interlinks" before
>> publishing such statistics, so that we can actually use these values?
>>
>
> We should basically express linkages across instance and schema/data
> dictionary vectors. This also helps those looking to build LOD applications.
>
> Of course there is more to come re. the injection of "data dictionary /
> schema" linkage aspects of LOD, but no harm in getting our thoughts in order
> re. "best practices" for the growing cloud :-)
>>
>> My recommendation would be to always go for the lowest value - the one
>> you'd obtain by creating your own identifiers and using owl:sameAs
>> (which would be equivalent to the number of distinct external URIs
>> mentioned in your dataset).
>>
>> What do you think?
>>
>
> Good idea, so share your page as a nice example :-)
>

I just gave it a shot on Jamendo, counting the results of a SELECT
DISTINCT query, and this is indeed a bit depressing.
http://dbtune.org/jamendo/
For example, the Geonames interlinking drops from 3244 to 289 :-)
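
To make the two counts concrete, here is a minimal sketch in Python. It is
not the query that was run on Jamendo: the URIs, the LOCAL prefix and the
helper names are all made up for illustration, and triples are plain tuples
rather than a real RDF store. It models the same four artists twice, once
linking directly to an external gazetteer and once through local place
identifiers plus owl:sameAs, and compares the raw number of link triples
with the number of distinct external URIs.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespaces, used only for this illustration.
LOCAL = "http://example.org/data/"
GEO = "http://geo.example.net/place/"


def is_external(uri: str, local_prefix: str) -> bool:
    """True if the URI is an http(s) URI outside the local dataset."""
    return uri.startswith("http") and not uri.startswith(local_prefix)


def raw_link_count(triples: Iterable[Triple], local_prefix: str) -> int:
    """Count every triple whose object is an external URI."""
    return sum(1 for _, _, o in triples if is_external(o, local_prefix))


def distinct_link_count(triples: Iterable[Triple], local_prefix: str) -> int:
    """Count the distinct external URIs that appear as objects."""
    return len({o for _, _, o in triples if is_external(o, local_prefix)})


# Modelling 1: each artist links directly to the external place.
direct = [
    (LOCAL + "artist/1", "foaf:based_near", GEO + "1"),
    (LOCAL + "artist/2", "foaf:based_near", GEO + "1"),
    (LOCAL + "artist/3", "foaf:based_near", GEO + "1"),
    (LOCAL + "artist/4", "foaf:based_near", GEO + "2"),
]

# Modelling 2: artists link to local places, which are owl:sameAs'ed once.
via_same_as = [
    (LOCAL + "artist/1", "foaf:based_near", LOCAL + "place/a"),
    (LOCAL + "artist/2", "foaf:based_near", LOCAL + "place/a"),
    (LOCAL + "artist/3", "foaf:based_near", LOCAL + "place/a"),
    (LOCAL + "artist/4", "foaf:based_near", LOCAL + "place/b"),
    (LOCAL + "place/a", "owl:sameAs", GEO + "1"),
    (LOCAL + "place/b", "owl:sameAs", GEO + "2"),
]

for name, data in (("direct", direct), ("via sameAs", via_same_as)):
    print(name, raw_link_count(data, LOCAL), distinct_link_count(data, LOCAL))
```

In this toy data the raw count depends on the modelling choice, while the
distinct count is the same for both, which is why it looks like the fairer
number to publish.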

Some similar statistics from Musicbrainz at
http://dbtune.org/musicbrainz/ , which I'll publish when I get some
time to figure out how to tweak d2r templates :-)

Distinct DBpedia albums - 22426
Distinct DBpedia artists - 39877
Distinct MySpace artists (on http://dbtune.org/myspace/) - 14668
Distinct DBpedia countries - 245
Distinct Lingvoj languages - 185

Cheers!
y


>
> Kingsley
>>
>> Cheers!
>> y
>>
>>
>>
>>>
>>> 2. cross-link visual derived from the LOD cloud diagram.
>>>
>>> The Linked Movies Database has nice examples of both [1].
>>>
>>> Links:
>>>
>>> 1. http://www.linkedmdb.org:8080/Main/Interlinking
>>>
>>> --
>>>
>>>
>>> Regards,
>>>
>>> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
>>> President & CEO OpenLink Software     Web: http://www.openlinksw.com
>>>
>>
>>
>
>

Received on Saturday, 2 August 2008 17:19:11 UTC