- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Sat, 02 Aug 2008 12:23:26 -0400
- To: Oktie Hassanzadeh <oktie@cs.toronto.edu>
- CC: Yves Raimond <yves.raimond@gmail.com>, public-lod@w3.org
Oktie Hassanzadeh wrote:
> Yves Raimond wrote:
>> Hello!
>>
>>> I would like to suggest that publishers of new linked data spaces that plug
>>> into the growing LOD include the following:
>>>
>>> 1. cross-link information
>>>
>>
>> I would also suggest we find a better measure for interlinkage than a
>> raw number of triples linking one dataset to another.
>> For example, http://dbtune.org/musicbrainz/ creates its own identifier
>> for languages (http://dbtune.org/musicbrainz/directory/language),
>> which are owl:sameAs'ed to the corresponding languages in Lingvoj when
>> applicable, whereas linkedmdb directly links to the Lingvoj
>> identifiers. In the latter case, the raw number of interlinks will be
>> higher, but could be reduced a lot by creating identifiers for
>> languages and using sameAs.
>>
>> The same applies for geographic locations, for example. Some datasets
>> use foaf:based_near to link to Geonames, some others create their own
>> identifiers, and then link to the corresponding Geonames locations
>> through owl:sameAs. For the same dataset, these two methodologies will
>> lead to completely different numbers.
>>
>> To boost the statistics of a dataset, we could simply link each person
>> or group in them to http://dbpedia.org/class/yago/Entity100001740
>> through rdf:type :-D
>>
>> So I think we should agree on what we count as "interlinks" before
>> publishing such statistics, so that we can actually use these values?
>>
>> My recommendation would be to always go for the lowest value - the one
>> you'd obtain by creating your own identifiers and using owl:sameAs
>> (which would be equivalent to the number of distinct external URIs
>> mentioned in your dataset).
>>
>> What do you think?
>>
>> Cheers!
>> y
>>
>
> I totally agree! Some interlinks are not as valuable as others. That's
> why we report the number of links based on their type and target and
> also we store and publish data about the linkage methodology. I also
> believe we should be honest about the value of the interlinks.
>
> Apart from the links to languages and geographic locations, another
> example of such "easy" links is the links we have in LinkedMDB to the
> authors of books in RDF Book Mashup, which are made based only on the
> names of the authors, compared with the links to the books related to
> the movies, for which we have to match the titles and find the ISBN of
> the books. I just changed LinkedMDB's statistics [1] to show two
> different numbers for these links.
>
> Regarding languages, I was not sure which is the right way, to link
> directly to lingvoj or to have our own entities for languages, but
> after reading some discussions like [2], we decided to link directly
> to lingvoj.
>
>
> Regards,
> Oktie
>
> [1] http://www.linkedmdb.org:8080/Main/Statistics
> [2] http://esw.w3.org/topic/Languages_as_RDF_Resources
>

Oktie,

Re. sample entities, could you sprinkle out a few sample entity URIs
from your data space? For instance, a third column with a drop down
should do the trick.

-- 
Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software     Web: http://www.openlinksw.com
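
As a rough illustration of the measure Yves recommends in the quoted text (the number of distinct external URIs mentioned in a dataset, rather than the raw number of linking triples), here is a minimal Python sketch. Everything in it is assumed for illustration and does not come from any of the datasets discussed: the hosts `data.example.org` and `lingvoj.example`, the helper names, and the shortcut of treating any object that starts with `http://` or `https://` as a URI.

```python
from urllib.parse import urlparse


def is_external(term, local_host):
    """True if term looks like an http(s) URI hosted outside local_host.

    Simplification (assumption): anything not starting with http:// or
    https:// is treated as a literal and ignored.
    """
    if not term.startswith(("http://", "https://")):
        return False
    return urlparse(term).hostname != local_host


def raw_external_links(triples, local_host):
    """Raw count: every triple whose object is an external URI."""
    return sum(1 for _s, _p, o in triples if is_external(o, local_host))


def distinct_external_uris(triples, local_host):
    """Proposed measure: the set of distinct external URIs used as objects."""
    return {o for _s, _p, o in triples if is_external(o, local_host)}


# Hypothetical data: three films in English, modelled in the two styles.
LOCAL = "data.example.org"
EN = "http://lingvoj.example/lang/en"            # stand-in external URI
LOCAL_EN = "http://data.example.org/language/en"  # locally minted URI

# Style 1: each film links directly to the external language URI.
direct = [
    (f"http://data.example.org/film/{i}", "dc:language", EN) for i in range(3)
]

# Style 2: films link to a local language URI, which is owl:sameAs'ed once.
indirect = [
    (f"http://data.example.org/film/{i}", "dc:language", LOCAL_EN)
    for i in range(3)
]
indirect.append((LOCAL_EN, "owl:sameAs", EN))

for name, triples in (("direct", direct), ("own ids + sameAs", indirect)):
    print(
        name,
        raw_external_links(triples, LOCAL),
        len(distinct_external_uris(triples, LOCAL)),
    )
```

In this toy data the raw count differs between the two styles (3 versus 1), while the distinct-URI count is 1 in both. Under the same measure, typing every resource with a single external class would add only one to the total.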
Received on Saturday, 2 August 2008 16:24:07 UTC