- From: Michael Hausenblas <michael.hausenblas@deri.org>
- Date: Thu, 12 Feb 2009 08:43:18 +0000
- To: Bernhard Haslhofer <bernhard.haslhofer@univie.ac.at>
- CC: Linked Data community <public-lod@w3.org>
Bernhard, All,
I agree that the process should be automated, or at least automatable;
however, I guess we need humans in the loop (as is so often the case ;).
Now, thinking more about this, I'm actually unsure whether this issue should
be addressed on the 'descriptive' level (that is, via voiD ;). To broaden the
discussion, I've put my thoughts together at [1]; the two main 'design'
criteria for the solution, IMHO, are:
1. someone (machine or human) who *uses* the data *reports* it, and
2. the dataset publisher (rather than a centralised service) *fixes* it.
Cheers,
Michael
[1]
http://webofdata.wordpress.com/2009/02/12/how-to-deal-with-broken-data-links/
--
Dr. Michael Hausenblas
DERI - Digital Enterprise Research Institute
National University of Ireland, Lower Dangan,
Galway, Ireland, Europe
Tel. +353 91 495730
http://sw-app.org/about.html
> From: Bernhard Haslhofer <bernhard.haslhofer@univie.ac.at>
> Date: Thu, 12 Feb 2009 09:25:07 +0100
> To: Michael Hausenblas <michael.hausenblas@deri.org>
> Cc: Linked Data community <public-lod@w3.org>
> Subject: Re: Broken Links in LOD Data Sets
>
> Morning,
>
> first of all, thanks for your input on that issue. I've started this
> thread because it is always one of the first questions I get from
> potential content providers, especially from institutions that must
> guarantee a certain level of quality in their data, such as libraries.
> If, for instance, they link to a LOD-published concept in a thesaurus
> or a DBPedia resource, and these resources change or disappear over
> time, it is difficult to provide that kind of quality.
>
> I partly agree with Kingsley's answer "You have to test for Null
> Pointers (URIs) when programming for the Linked Data Web too" - this
> is of course true. But imagine programming against a DB which does not
> provide referential integrity - a nightmare. Besides that, I am not
> sure that answer is what those institutions expect. Of course,
> in an open world we cannot provide the "quality" features a DBMS
> provides, but we can at least provide some mechanism that helps solve
> broken-link issues.
>
> Michael, in my opinion this should be an automated process which can
>
> 1.) discover broken links - this could be done by the LOD source
> itself, SINDICE, or any other client. If an LOD source "knows" that all
> its links/references are OK, it could publish that info using voiD -
> @Michael: do you think that makes sense? Maybe introduce a
> "void:numberOfDanglingLinks" property in the dataset statistics?
>
> 2.) notify other clients / data sources about broken links - here I
> thought of a kind of inotify [1] service for LOD sources.
>
> 3.) fix the problem, if possible, by directing to alternative link
> targets if there are any
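> Step (1) could be sketched roughly as follows - a minimal, hypothetical
> link checker in Python. The HEAD-based probe and the dangling-count
> helper are illustrative only, not part of any existing LOD service:

```python
# Minimal sketch of step (1): probing a dataset's outbound links for
# dangling targets. All names here are illustrative.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def status_indicates_dangling(status):
    """A link counts as dangling if we got no answer or a 4xx/5xx code."""
    return status is None or status >= 400

def probe(uri, timeout=10):
    """Dereference a URI with HEAD; return the HTTP status, or None on failure."""
    try:
        with urlopen(Request(uri, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as e:
        return e.code
    except URLError:
        return None

def count_dangling(links, probe=probe):
    """Number of broken outbound links, e.g. to feed a voiD statistic."""
    return sum(1 for uri in links if status_indicates_dangling(probe(uri)))
```

> A crawler over the dataset's RDF would supply the `links` list; the
> resulting count is exactly the kind of number such a voiD statistic
> could carry.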
>
> Since we are already working on a service which should provide (2) and
> (3), I would be happy to contribute to a kind of "REVIVAL" thing, or
> whatever you call it ;-)
>
> Best,
> Bernhard
>
>
>
> [1] http://en.wikipedia.org/wiki/Inotify
>
>
> On Feb 12, 2009, at 8:08 AM, Michael Hausenblas wrote:
>
>>
>> Bernhard, All,
>>
>> So, another take on how to deal with broken links: a couple of days
>> ago I
>> reported two broken links in a TAG finding [1], which was (quickly and
>> pragmatically, bravo, TAG!) addressed [2] recently.
>>
>> Let's abstract this away and apply it to data rather than documents.
>> The mechanism could work as follows:
>>
>> 1. A *human* (e.g., through a built-in feature in a Web of Data
>> browser such
>> as Tabulator) encounters a broken link and reports it to the respective
>> dataset publisher (the authoritative one who 'owns' it)
>>
>> OR
>>
>> 1. A machine encounters a broken link (should it then directly ping
>> the
>> dataset publisher or first 'ask' its master for permission?)
>>
>> 2. The dataset publisher acknowledges the broken link and creates the
>> corresponding
>> triples, as done in the case for documents (cf. [2])
>>
>> In case anyone wants to pick that up, I'm happy to contribute. The
>> name?
>> Well, a straw-man proposal could be called *re*pairing *vi*ntage link
>> *val*ues (REVIVAL) - anyone? :)
>>
>> Cheers,
>> Michael
>>
>> [1] http://lists.w3.org/Archives/Public/www-tag/2009Jan/0118.html
>> [2] http://lists.w3.org/Archives/Public/www-tag/2009Feb/0068.html
>>
>>> From: Bernhard Haslhofer <bernhard.haslhofer@univie.ac.at>
>>> Date: Thu, 5 Feb 2009 16:35:35 +0100
>>> To: Linked Data community <public-lod@w3.org>
>>> Subject: Broken Links in LOD Data Sets
>>> Resent-From: Linked Data community <public-lod@w3.org>
>>> Resent-Date: Thu, 05 Feb 2009 15:36:13 +0000
>>>
>>>
>>> Hi all,
>>>
>>> we are currently working on the question of how to deal with broken
>>> links/
>>> references between resources in (distinct) LOD data sets and would
>>> like to know your opinion on that issue. If there is some work going
>>> on in this direction, please let me know.
>>>
>>> I think I do not really need to explain the problem. Everybody knows
>>> it from the "human" Web, when you follow a link and get an annoying
>>> 404 response.
>>>
>>> If we assume that the consumers of LOD data are not humans but
>>> applications, broken links/references are not only "annoying" but
>>> could lead to severe processing errors if an application relies on a
>>> kind of "referential integrity".
>>>
>>> Assume we have an LOD data source X exposing resources that describe
>>> images, and these images are linked with resources in DBPedia (e.g.,
>>> http://dbpedia.org/resource/Berlin).
>>> An application built on top of X follows links to retrieve the geo-
>>> coordinates in order to display the images on a virtual map. If now,
>>> for some reason, the URL of the linked DBPedia resource changes,
>>> either because DBPedia is moved or reorganized - which I guess could
>>> happen to any LOD source in the long term - the application
>>> might crash if it doesn't consider that referenced resources might
>>> move or become unavailable.
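>>> A defensive client for this scenario might look like the following
>>> sketch. The `fetch` callable and the property names are made up for
>>> illustration; the point is only that a broken link yields a skipped
>>> image rather than a crash:

```python
# Sketch of the defensive style the scenario calls for: the client
# treats a missing DBPedia resource as an expected case. `fetch` is a
# stand-in for whatever dereferencing code the application uses; all
# names are illustrative, not a real API.
def safe_geo_lookup(uri, fetch):
    """Return (lat, long) for the linked resource, or None if the
    link is broken or the resource lacks coordinates."""
    data = fetch(uri)           # expected to return a dict, or None on 404
    if data is None:
        return None             # broken link: skip this image
    lat, lon = data.get("geo:lat"), data.get("geo:long")
    if lat is None or lon is None:
        return None             # resource exists but has no coordinates
    return (float(lat), float(lon))
```

>>> The map view then simply omits images whose lookup returned None,
>>> instead of relying on referential integrity that the open Web cannot
>>> guarantee.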
>>>
>>> I know that "cool URIs don't change" but I am not sure if this
>>> assumption holds in practice, especially in a long-term perspective.
>>>
>>> For the "human" Web several solutions have been proposed, e.g.,
>>> 1.) PURL and DOI services for translating URNs into resolvable URLs
>>> 2.) forward references
>>> 3.) robust link implementations, i.e., with each link you keep a set
>>> of related search terms to retrieve moved / changed resources
>>> 4.) observer / notification mechanisms
>>> X.) ?
>>>
>>> I guess (1) is not really applicable for LOD resources because of
>>> scalability and single-point-of-failure issues. (2) would require
>>> that
>>> LOD providers take care of setting up HTTP redirects for their moved
>>> resources - no idea whether anybody will do that in reality and how
>>> this would scale. (3) could help re-locate moved resources via search
>>> engines like Sindice, but not fully automatically. (4) could at
>>> least inform a data source that certain references are broken, so it
>>> could remove them.
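>>> Option (3) could be sketched like this - a robust link stores a few
>>> search terms next to the URI, and the resolver falls back to a term
>>> search when dereferencing fails. Both callables are injected and
>>> hypothetical; no real Sindice API is assumed here:

```python
# Sketch of "robust links": each link carries search terms so a moved
# resource can be re-located when the stored URI no longer resolves.
def resolve_robust_link(link, deref, search):
    """Return a usable URI for the link, or None if nothing is found.

    deref(uri)    -> some representation, or None if the URI is broken
    search(terms) -> list of candidate URIs (e.g. from a search engine)
    """
    if deref(link["uri"]) is not None:
        return link["uri"]                  # stored URI still works
    candidates = search(link["terms"])      # fall back to the search terms
    return candidates[0] if candidates else None
```

>>> As noted above, this re-location is only a heuristic: the search may
>>> return nothing, or the wrong resource, so it cannot be fully
>>> automatic.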
>>>
>>> Another alternative is of course to leave the problem entirely to
>>> the application developers, which means they must consider that a
>>> referenced resource might or might not exist. I am not sure about the
>>> practical consequences of that approach, especially if several data
>>> sources are involved, but I have the feeling that things get really
>>> complicated if one cannot rely on any kind of referential integrity.
>>>
>>> Are there any existing mechanisms that can give us at least some
>>> basic feedback about the "quality" of an LOD data source? I think
>>> referential integrity could be such a quality property...
>>>
>>> Thanks for your input on that issue,
>>>
>>> Bernhard
>>>
>>> ______________________________________________________
>>> Research Group Multimedia Information Systems
>>> Department of Distributed and Multimedia Systems
>>> Faculty of Computer Science
>>> University of Vienna
>>>
>>> Postal Address: Liebiggasse 4/3-4, 1010 Vienna, Austria
>>> Phone: +43 1 42 77 39635 Fax: +43 1 4277 39649
>>> E-Mail: bernhard.haslhofer@univie.ac.at
>>> WWW: http://www.cs.univie.ac.at/bernhard.haslhofer
>>>
>>>
>>
>>
>
>
Received on Thursday, 12 February 2009 08:44:00 UTC