- From: Aldo Bucchi <aldo.bucchi@gmail.com>
- Date: Sat, 2 Aug 2008 17:35:22 -0400
- To: "Giovanni Tummarello" <giovanni.tummarello@deri.org>, "public-lod@w3.org" <public-lod@w3.org>
Hi Giovanni,

On Sat, Aug 2, 2008 at 2:27 PM, Giovanni Tummarello
<giovanni.tummarello@deri.org> wrote:
>
> Hi guys,
>
> i think visualizing the linkage between datasets is an interesting
> goal. To this end we're going to be starting some experiments within
> sindice to try to come up with a complete and automatic map of such
> linkings.
>
> for once it's not just the "cool" factor we're after: such a map
> should help application implementers understand what kind of query
> they can perform and what they can hope to obtain.

Sounds great. Sindice is going to become a key piece in moving this
forward.

IMO, even better than a static visualization (which would not scale
well, and would probably become too complex to interpret), what is
needed is a tool.

Are you planning on opening some sort of VoiD introspection API for
the indexed content in Sindice? This "tool" sounds like a place for
community work and a lot of trial and error, so it would be nice to
have a solid API to start from.

Moving forward a bit, I imagine that SPARQL query builders like
OpenLink's could eventually become integrated with such an API to
allow some introspection into the available datasets. Off the top of
my head: auto-complete when composing, activate/deactivate datasets
when executing, etc.

And just a guess... this API shouldn't belong to Sindice. It should be
part of the semweb... introspection. Agents like Sindice should
announce themselves and be discoverable (this is obviously something
old). One new DNS-ish layer. And it should be RDF too.

>
> The technologies involved in such analysis are pretty cool, a hadoop
> job reads over our hbase repositories of rdf and microformats and
> should compose the map.. think of how many things could be done more
> in terms of analysis with such technologies and the 28 core cluster we
> have now (and hopefully with the 800 core cluster we might have in 2-3
> months, more news later).
>
> .. fancy to write such hadoop jobs and deliver cool large scale apis
> to the linked data communities? why not come over for an internship or
> a visiting research period. This infrastructure is available so.. and
> we're open to share it. More stable positions also available.
>
> Giovanni
>
> p.s. in the meanwhile we should be able to show soon a query that
> lists all the sameAs connections from a dataset to another, hopefully
> in a few days.
>
>
> On Sat, Aug 2, 2008 at 5:23 PM, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>>
>> Oktie Hassanzadeh wrote:
>>>
>>> Yves Raimond wrote:
>>>>
>>>> Hello!
>>>>
>>>>
>>>>>
>>>>> I would like to suggest that publishers of new linked data spaces that
>>>>> plug into the growing LOD include the following:
>>>>>
>>>>> 1. cross-link information
>>>>>
>>>>
>>>> I would also suggest we find a better measure for interlinkage than a
>>>> raw number of triples linking one dataset to another.
>>>> For example, http://dbtune.org/musicbrainz/ creates its own identifier
>>>> for languages (http://dbtune.org/musicbrainz/directory/language),
>>>> which are owl:sameAs'ed to the corresponding languages in Lingvoj when
>>>> applicable, whereas linkedmdb directly links to the Lingvoj
>>>> identifiers. In the latter case, the raw number of interlinks will be
>>>> higher, but could be reduced a lot by creating identifiers for
>>>> languages and using sameAs.
>>>>
>>>> The same applies for geographic locations, for example. Some datasets
>>>> use foaf:based_near to link to Geonames, some others create their own
>>>> identifiers, and then link to the corresponding Geonames locations
>>>> through owl:sameAs. For the same dataset, these two methodologies will
>>>> lead to completely different numbers.
>>>>
>>>> To boost the statistics of a dataset, we could simply link each person
>>>> or group in them to http://dbpedia.org/class/yago/Entity100001740
>>>> through rdf:type :-D
>>>>
>>>> So I think we should agree on what we count as "interlinks" before
>>>> publishing such statistics, so that we can actually use these values?
>>>>
>>>> My recommendation would be to always go for the lowest value - the one
>>>> you'd obtain by creating your own identifiers and using owl:sameAs
>>>> (which would be equivalent to the number of distinct external URIs
>>>> mentioned in your dataset).
>>>>
>>>> What do you think?
>>>>
>>>> Cheers!
>>>> y
>>>>
>>>>
>>>
>>> I totally agree! Some interlinks are not as valuable as others. That's why
>>> we report the number of links based on their type and target and also we
>>> store and publish data about the linkage methodology. I also believe we
>>> should be honest about the value of the interlinks.
>>>
>>> Apart from the links to languages and geographic locations, another
>>> example of such "easy" links is the links we have in LinkedMDB to the
>>> Authors of books in RDF Book Mashup, which are made based only on the
>>> name of the authors, compared with the links to the books related to the
>>> movies, for which we have to match the titles and find the ISBN of the
>>> books. I just changed LinkedMDB's statistics [1] to show two different
>>> numbers for these links.
>>>
>>> Regarding languages, I was not sure which is the right way, to link
>>> directly to lingvoj or to have our own entities for languages, but after
>>> reading some discussions like [2], we decided to link directly to lingvoj.
>>>
>>>
>>> Regards,
>>> Oktie
>>>
>>> [1] http://www.linkedmdb.org:8080/Main/Statistics
>>> [2] http://esw.w3.org/topic/Languages_as_RDF_Resources
>>>
>> Oktie,
>>
>> Re. sample entities, could you sprinkle out a few sample entity URIs from
>> your data space? For instance, a third column with a drop down should do
>> the trick.
>>
>> --
>>
>>
>> Regards,
>>
>> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
>> President & CEO OpenLink Software     Web: http://www.openlinksw.com
>>
>>
>>
>>
>
>

--
:::: Aldo Bucchi ::::
+56 9 7623 8653
skype:aldo.bucchi
http://aldobucchi.com/
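
As a rough illustration of the interlink measure Yves proposes in the quoted thread (the number of distinct external URIs mentioned in a dataset, instead of the raw number of linking triples), here is a minimal Python sketch. The tuple representation of triples, the `interlink_counts` function, the example.org namespace and the sample data are all assumptions made for illustration, and the Lingvoj URI is only a plausible placeholder; none of this comes from Sindice, LinkedMDB or DBTune.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as plain strings


def interlink_counts(triples: Iterable[Triple], local_prefix: str) -> Tuple[int, int]:
    """Return (raw_link_triples, distinct_external_uris) for one dataset.

    A triple counts as a link when its subject is local and its object
    is an http(s) URI outside the local namespace.
    """
    raw = 0
    external = set()
    for s, _p, o in triples:
        is_uri = o.startswith(("http://", "https://"))
        if s.startswith(local_prefix) and is_uri and not o.startswith(local_prefix):
            raw += 1
            external.add(o)
    return raw, len(external)


# Hypothetical dataset namespace and illustrative external identifier.
LOCAL = "http://example.org/films/"
LINGVOJ_EN = "http://www.lingvoj.org/lang/en"
DC_LANGUAGE = "http://purl.org/dc/terms/language"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

# Methodology A: every film links straight to the external language URI.
direct = [(f"{LOCAL}film/{i}", DC_LANGUAGE, LINGVOJ_EN) for i in range(100)]

# Methodology B: films link to a local language URI, which is sameAs'ed once.
indirect = [(f"{LOCAL}film/{i}", DC_LANGUAGE, f"{LOCAL}language/en") for i in range(100)]
indirect.append((f"{LOCAL}language/en", OWL_SAMEAS, LINGVOJ_EN))

direct_counts = interlink_counts(direct, LOCAL)      # (100, 1)
indirect_counts = interlink_counts(indirect, LOCAL)  # (1, 1)
print(direct_counts, indirect_counts)
```

The raw triple count differs by a factor of 100 between the two methodologies, while the distinct-external-URI count is 1 in both cases, which is the property that makes it comparable across datasets.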
Received on Saturday, 2 August 2008 21:35:59 UTC