- From: Michael Hausenblas <michael.hausenblas@deri.org>
- Date: Thu, 12 Feb 2009 08:43:18 +0000
- To: Bernhard Haslhofer <bernhard.haslhofer@univie.ac.at>
- CC: Linked Data community <public-lod@w3.org>
Bernhard, All,

I agree that the process should be automated, or at least automatable; however, I guess we need humans in the loop (as so often ;). Now, thinking about this some more, I'm actually unsure whether this issue should be addressed on the 'descriptive' level (that is, via voiD ;). To broaden the discussion, I've put my thoughts together at [1]; the two main 'design' criteria for the solution, IMHO, are:

1. someone (machine or human) who *uses* the data *reports* it, and
2. the dataset publisher (rather than a centralised service) *fixes* it.

Cheers,
Michael

[1] http://webofdata.wordpress.com/2009/02/12/how-to-deal-with-broken-data-links/

--
Dr. Michael Hausenblas
DERI - Digital Enterprise Research Institute
National University of Ireland, Lower Dangan,
Galway, Ireland, Europe
Tel. +353 91 495730
http://sw-app.org/about.html


> From: Bernhard Haslhofer <bernhard.haslhofer@univie.ac.at>
> Date: Thu, 12 Feb 2009 09:25:07 +0100
> To: Michael Hausenblas <michael.hausenblas@deri.org>
> Cc: Linked Data community <public-lod@w3.org>
> Subject: Re: Broken Links in LOD Data Sets
>
> Morning,
>
> first of all, thanks for your input on this issue. I started this
> thread because it is always one of the first questions I get from
> potential content providers, especially from institutions that must
> guarantee a certain level of quality in their data, such as libraries.
> If, for instance, they link to a LOD-published concept in a thesaurus
> or a DBPedia resource, and these resources change or disappear over
> time, it is difficult to provide that kind of quality.
>
> I partly agree with Kingsley's answer "You have to test for Null
> Pointers (URIs) when programming for the Linked Data Web too" - this
> is of course true. But imagine programming against a DB which does not
> provide referential integrity - a nightmare. Besides that, I am not
> sure that this is the answer those institutions might expect.
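[Editor's note: Kingsley's "test for Null Pointers (URIs)" advice can be made concrete: before trusting an outbound link, a client can probe it over HTTP. A minimal sketch, in Python; the HEAD probe, the timeout, and the 4xx/5xx cutoff are illustrative choices, not anything prescribed in the thread.]

```python
import urllib.request
import urllib.error

def link_status(uri, timeout=10):
    """Return the HTTP status code for `uri`, or None if unreachable."""
    req = urllib.request.Request(uri, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status (404, 410, ...).
        return e.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout: no status at all.
        return None

def is_broken(status):
    """Treat unreachable hosts and 4xx/5xx answers as broken."""
    return status is None or status >= 400
```

Note that redirects are followed transparently by `urlopen`, so a 303 See Other (the usual way LOD sources serve descriptions of non-information resources) counts as live as long as the chain ends in a 2xx.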
> Of course,
> in an open world we cannot provide the "quality" features a DBMS
> provides, but we can at least offer some mechanism that helps solve
> broken-link issues.
>
> Michael, in my opinion this should be an automated process which can
>
> 1.) discover broken links - this could be the LOD source itself,
> SINDICE, or any other client. If an LOD source "knows" that all its
> links/references are OK, it could publish that info using voiD -
> @Michael: do you think that makes sense? Maybe introduce
> "void:numberOfDanglingLinks" in the dataset statistics?
>
> 2.) notify other clients / data sources about broken links - here I
> thought about a kind of inotify [1] service for LOD sources.
>
> 3.) fix the problem, if possible, by directing to alternative link
> targets, if there are any
>
> Since we are already working on a service which should provide (2) and
> (3), I would be happy to contribute to a kind of "REVIVAL" thing, or
> whatever you call it ;-)
>
> Best,
> Bernhard
>
> [1] http://en.wikipedia.org/wiki/Inotify
>
>
> On Feb 12, 2009, at 8:08 AM, Michael Hausenblas wrote:
>
>>
>> Bernhard, All,
>>
>> So, another take on how to deal with broken links: a couple of days ago I
>> reported two broken links in a TAG finding [1], which was (quickly and
>> pragmatically - bravo, TAG!) addressed [2] recently.
>>
>> Let's abstract this away and apply it to data rather than documents. The
>> mechanism could work as follows:
>>
>> 1. A *human* (e.g., through a built-in feature in a Web of Data browser
>> such as Tabulator) encounters a broken link and reports it to the
>> respective dataset publisher (the authoritative one who 'owns' it)
>>
>> OR
>>
>> 1. A machine encounters a broken link (should it then directly ping the
>> dataset publisher, or first 'ask' its master for permission?)
>>
>> 2. The dataset publisher acknowledges the broken link and creates
>> corresponding triples, as done in the case for documents (cf.
>> [2]).
>>
>> In case anyone wants to pick that up, I'm happy to contribute. The name?
>> Well, a straw-man proposal could be called *re*pairing *vi*ntage link
>> *val*ues (REVIVAL) - anyone? :)
>>
>> Cheers,
>> Michael
>>
>> [1] http://lists.w3.org/Archives/Public/www-tag/2009Jan/0118.html
>> [2] http://lists.w3.org/Archives/Public/www-tag/2009Feb/0068.html
>>
>> --
>> Dr. Michael Hausenblas
>> DERI - Digital Enterprise Research Institute
>> National University of Ireland, Lower Dangan,
>> Galway, Ireland, Europe
>> Tel. +353 91 495730
>> http://sw-app.org/about.html
>>
>>
>>> From: Bernhard Haslhofer <bernhard.haslhofer@univie.ac.at>
>>> Date: Thu, 5 Feb 2009 16:35:35 +0100
>>> To: Linked Data community <public-lod@w3.org>
>>> Subject: Broken Links in LOD Data Sets
>>> Resent-From: Linked Data community <public-lod@w3.org>
>>> Resent-Date: Thu, 05 Feb 2009 15:36:13 +0000
>>>
>>>
>>> Hi all,
>>>
>>> we are currently working on the question of how to deal with broken
>>> links/references between resources in (distinct) LOD data sets and
>>> would like to know your opinion on that issue. If there is some work
>>> going on in this direction, please let me know.
>>>
>>> I think I do not really need to explain the problem. Everybody knows
>>> it from the "human" Web: you follow a link and get an annoying
>>> 404 response.
>>>
>>> If we assume that the consumers of LOD data are not humans but
>>> applications, broken links/references are not only "annoying" but
>>> could lead to severe processing errors if an application relies on
>>> some kind of "referential integrity".
>>>
>>> Assume we have an LOD data source X exposing resources that describe
>>> images, and these images are linked with resources in DBPedia (e.g.,
>>> http://dbpedia.org/resource/Berlin).
>>> An application built on top of X follows links to retrieve the
>>> geo-coordinates in order to display the images on a virtual map.
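[Editor's note: Bernhard's step (1) earlier in the thread — discover dangling links and publish the count via voiD — might be sketched as below. The `status_of` callable stands in for a real HTTP probe so the logic stays testable, and `void:numberOfDanglingLinks` is only the counter floated in this thread, NOT a property of the published voiD vocabulary.]

```python
def find_dangling(outbound_links, status_of):
    """Partition a dataset's outbound links into live and dangling.

    `status_of` maps a URI to an HTTP status code, or None when the
    host is unreachable -- in production a real HTTP HEAD/GET probe,
    in tests a simple stub.
    """
    live, dangling = [], []
    for uri in outbound_links:
        status = status_of(uri)
        (dangling if status is None or status >= 400 else live).append(uri)
    return live, dangling


def void_snippet(dataset_uri, dangling_count):
    """Render the dangling-link count as a Turtle snippet, using the
    hypothetical property proposed in this thread."""
    return ("@prefix void: <http://rdfs.org/ns/void#> .\n"
            f"<{dataset_uri}> a void:Dataset ;\n"
            f"    void:numberOfDanglingLinks {dangling_count} .")
```

A source publishing such a statistic would let clients judge link quality before ingesting the dataset, without probing every link themselves.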
>>> If now,
>>> for some reason, the URL of the linked DBPedia resource changes,
>>> either because DBPedia is moved or re-organised (which I guess could
>>> happen to any LOD source in a long-term perspective), the application
>>> might crash if it doesn't consider that referenced resources might
>>> move or become unavailable.
>>>
>>> I know that "cool URIs don't change", but I am not sure whether this
>>> assumption holds in practice, especially in a long-term perspective.
>>>
>>> For the "human" Web several solutions have been proposed, e.g.,
>>>
>>> 1.) PURL and DOI services for translating URNs into resolvable URLs
>>> 2.) forward references
>>> 3.) robust link implementations, i.e., with each link you keep a set
>>> of related search terms to retrieve moved / changed resources
>>> 4.) observer / notification mechanisms
>>> X.) ?
>>>
>>> I guess (1) is not really applicable for LOD resources because of
>>> scalability and single-point-of-failure issues. (2) would require
>>> that LOD providers take care of setting up HTTP redirects for their
>>> moved resources - no idea whether anybody will do that in reality,
>>> and how this can scale. (3) could help to re-locate moved resources
>>> via search engines like Sindice, but not fully automatically. (4)
>>> could at least inform a data source that certain references are
>>> broken, so it could remove them.
>>>
>>> Another alternative is of course to leave the problem completely to
>>> the application developers, which means that they must consider
>>> that a referenced resource might or might not exist. I am not sure
>>> about the practical consequences of that approach, especially if
>>> several data sources are involved, but I have the feeling that it
>>> gets really complicated if one cannot rely on any kind of
>>> referential integrity.
>>>
>>> Are there any existing mechanisms that can give us at least some
>>> basic feedback about the "quality" of an LOD data source?
>>> I think
>>> referential integrity could be such a quality property...
>>>
>>> Thanks for your input on this issue,
>>>
>>> Bernhard
>>>
>>> ______________________________________________________
>>> Research Group Multimedia Information Systems
>>> Department of Distributed and Multimedia Systems
>>> Faculty of Computer Science
>>> University of Vienna
>>>
>>> Postal Address: Liebiggasse 4/3-4, 1010 Vienna, Austria
>>> Phone: +43 1 42 77 39635 Fax: +43 1 4277 39649
>>> E-Mail: bernhard.haslhofer@univie.ac.at
>>> WWW: http://www.cs.univie.ac.at/bernhard.haslhofer
>>>
>>
>
> ______________________________________________________
> Research Group Multimedia Information Systems
> Department of Distributed and Multimedia Systems
> Faculty of Computer Science
> University of Vienna
>
> Postal Address: Liebiggasse 4/3-4, 1010 Vienna, Austria
> Phone: +43 1 42 77 39635 Fax: +43 1 4277 39649
> E-Mail: bernhard.haslhofer@univie.ac.at
> WWW: http://www.cs.univie.ac.at/bernhard.haslhofer
>
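[Editor's note: of the solutions Bernhard lists, (2) — forward references — is the easiest to sketch. Assuming a publisher keeps a table mapping old URIs to their new homes (the table itself is hypothetical here; in practice it would be the publisher's 301-redirect configuration), a client-side resolver might look like this, guarding against loops and overly long chains rather than following them blindly.]

```python
def resolve(uri, forwards, max_hops=5):
    """Follow a chain of forward references to a URI's current home.

    `forwards` maps an old URI to its new location, the way a
    publisher's 301-redirect table would.  Raises ValueError on a
    redirect loop or a chain longer than `max_hops`.
    """
    hops = 0
    seen = {uri}
    while uri in forwards:
        uri = forwards[uri]
        hops += 1
        if uri in seen or hops > max_hops:
            raise ValueError(f"redirect loop or chain too long at {uri}")
        seen.add(uri)
    return uri
```

The loop guard matters: Bernhard's scalability worry about (2) includes exactly this kind of unbounded chain once several publishers start forwarding to each other.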
Received on Thursday, 12 February 2009 08:44:00 UTC