Re: Contd: Visualizing LOD Linkage from Giovanni Tummarello on 2008-08-02 (public-lod@w3.org from August 2008)

From: Giovanni Tummarello <giovanni.tummarello@deri.org>
Date: Sat, 2 Aug 2008 19:27:44 +0100
To: public-lod@w3.org
Message-ID: <210271540808021127p3de7b6b3v627d33a373a63bef@mail.gmail.com>
Hi guys,

I think visualizing the linkage between datasets is an interesting
goal. To this end we're going to start some experiments within
Sindice to try to come up with a complete, automatically generated map
of such links.

For once it's not just the "cool" factor we're after: such a map
should help application implementers understand what kinds of queries
they can perform and what they can hope to obtain.

The technologies involved in such analysis are pretty cool: a Hadoop
job reads over our HBase repositories of RDF and microformats and
should compose the map. Think of how much more could be done
in terms of analysis with such technologies and the 28-core cluster we
have now (and hopefully with the 800-core cluster we might have in 2-3
months, more news later).
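
To give a rough idea of what composing such a map involves, here is a
minimal map/reduce-style sketch in plain Python. It is not the actual
Sindice job: the function names, the in-memory list of triples, and the
use of the URI host name as a stand-in for "dataset" are all assumptions
made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def map_triple(subject, predicate, obj):
    """Map step: emit ((source_host, target_host), 1) for a cross-host link.

    Assumption: a dataset is approximated by the host name of its URIs.
    """
    src = urlparse(subject).netloc
    dst = urlparse(obj).netloc
    # Plain literals have no host; links within one host are not interlinks.
    if src and dst and src != dst:
        yield (src, dst), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per (source_host, target_host)."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def build_link_map(triples):
    """Run the map and reduce steps over an iterable of (s, p, o) strings."""
    emitted = (kv for triple in triples for kv in map_triple(*triple))
    return reduce_counts(emitted)
```

Each key of the resulting counter is one edge of the map, and its value
is the raw number of triples behind that edge.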

Fancy writing such Hadoop jobs and delivering cool large-scale APIs
to the linked data communities? Why not come over for an internship or
a visiting research period? The infrastructure is available and
we're open to sharing it. More stable positions are also available.

Giovanni

P.S. In the meantime we should soon be able to show a query that
lists all the owl:sameAs connections from one dataset to another,
hopefully in a few days.
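
As an illustration of what such a listing computes, here is a minimal
Python sketch. It is not the Sindice query itself: the function name,
the in-memory list of (subject, predicate, object) strings, and the
matching of datasets by URI host name are assumptions.

```python
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_links(triples, source_host, target_host):
    """Return (subject, object) pairs for owl:sameAs links between two hosts.

    Assumption: each dataset is identified by the host name of its URIs.
    """
    return [
        (s, o)
        for s, p, o in triples
        if p == OWL_SAME_AS
        and urlparse(s).netloc == source_host
        and urlparse(o).netloc == target_host
    ]
```

Only links whose subject is in the source dataset are returned, so the
listing is directional: swapping the two host names gives the links in
the opposite direction.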


On Sat, Aug 2, 2008 at 5:23 PM, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>
> Oktie Hassanzadeh wrote:
>>
>> Yves Raimond wrote:
>>>
>>> Hello!
>>>
>>>
>>>>
>>>> I would like to suggest that publishers of new linked data spaces that plug
>>>> into the growing LOD include the following:
>>>>
>>>> 1. cross-link information
>>>>
>>>
>>> I would also suggest we find a better measure for interlinkage than a
>>> raw number of triples linking one dataset to another.
>>> For example, http://dbtune.org/musicbrainz/ creates its own identifier
>>> for languages (http://dbtune.org/musicbrainz/directory/language),
>>> which are owl:sameAs'ed to the corresponding languages in Lingvoj when
>>> applicable, whereas linkedmdb directly links to the Lingvoj
>>> identifiers. In the latter case, the raw number of interlinks will be
>>> higher, but could be reduced a lot by creating identifiers for
>>> languages and using sameAs.
>>>
>>> The same applies for geographic locations, for example. Some datasets
>>> use foaf:based_near to link to Geonames, some others create their own
>>> identifiers, and then link to the corresponding Geonames locations
>>> through owl:sameAs. For the same dataset, these two methodologies will
>>> lead to completely different numbers.
>>>
>>> To boost the statistics of a dataset, we could simply link each person
>>> or group in it to http://dbpedia.org/class/yago/Entity100001740
>>> through rdf:type :-D
>>>
>>> So I think we should agree on what we count as "interlinks" before
>>> publishing such statistics, so that we can actually use these values?
>>>
>>> My recommendation would be to always go for the lowest value - the one
>>> you'd obtain by creating your own identifiers and using owl:sameAs
>>> (which would be equivalent to the number of distinct external URIs
>>> mentioned in your dataset).
>>>
>>> What do you think?
>>>
>>> Cheers!
>>> y
>>>
>>>
>>
>> I totally agree! Some interlinks are not as valuable as others. That's why
>> we report the number of links based on their type and target and also we
>> store and publish data about the linkage methodology. I also believe we
>> should be honest about the value of the interlinks.
>>
>> Apart from the links to languages and geographic locations, another
>> example of such "easy" links is the set of links we have in LinkedMDB to the
>> authors of books in RDF Book Mashup, which are created based only on the
>> authors' names. Compare these with the links to the books related to the
>> movies, for which we have to match the titles and find the ISBNs of the
>> books. I just changed LinkedMDB's statistics [1] to show two different
>> numbers for these links.
>>
>> Regarding languages, I was not sure which is the right way, to link
>> directly to lingvoj or to have our own entities for languages, but after
>> reading some discussions like [2], we decided to link directly to lingvoj.
>>
>>
>> Regards,
>> Oktie
>>
>> [1] http://www.linkedmdb.org:8080/Main/Statistics
>> [2] http://esw.w3.org/topic/Languages_as_RDF_Resources
>>
> Oktie,
>
> Re. sample entities, could you sprinkle out a few sample entity URIs from
> your data space?  For instance, a third column with a drop down should do
> the trick.
>
> --
>
>
> Regards,
>
> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
> President & CEO OpenLink Software     Web: http://www.openlinksw.com
>
>
>
>
Received on Saturday, 2 August 2008 18:28:21 UTC