- From: Bernhard Haslhofer <bernhard.haslhofer@univie.ac.at>
- Date: Thu, 5 Feb 2009 16:35:35 +0100
- To: public-lod@w3.org
Hi all,

we are currently working on the question of how to deal with broken links/references between resources in (distinct) LOD data sets and would like to know your opinion on that issue. If there is any work going on in this direction, please let me know.

I think I do not really need to explain the problem. Everybody knows it from the "human" Web: you follow a link and get an annoying 404 response. If we assume that the consumers of LOD data are not humans but applications, broken links/references are not only "annoying" but could lead to severe processing errors if an application relies on some kind of "referential integrity".

Assume we have an LOD data source X exposing resources that describe images, and these images are linked with resources in DBpedia (e.g., http://dbpedia.org/resource/Berlin). An application built on top of X follows these links to retrieve the geo-coordinates in order to display the images on a virtual map. If, for some reason, the URL of the linked DBpedia resource changes, either because DBpedia is moved or re-organized (which I guess could happen to any LOD source in the long term), the application might crash if it doesn't consider that referenced resources might move or become unavailable.

I know that "cool URIs don't change", but I am not sure this assumption holds in practice, especially in the long term. For the "human" Web several solutions have been proposed, e.g.:

1.) PURL and DOI services for translating URNs into resolvable URLs
2.) forward references
3.) robust link implementations, i.e., with each link you keep a set of related search terms to retrieve moved/changed resources
4.) observer/notification mechanisms
X.) ?

I guess (1) is not really applicable to LOD resources because of scalability and single-point-of-failure issues. (2) would require that LOD providers take care of setting up HTTP redirects for their moved resources; I have no idea whether anybody will do that in reality and how this can scale.
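Option (2) only helps if consumers can tell a redirected link from a broken one. As a minimal sketch of the consumer side, assuming Python 3 and only the standard library, a link checker could send a HEAD request for each referenced URI without following redirects and classify the raw status code. In Linked Data, 303 See Other is the usual answer for a non-information resource (such as a DBpedia resource URI), so the sketch treats it as healthy rather than moved, while 301/308 mark a stale but still resolvable link. The names `LinkState`, `classify`, `http_status` and `check_links`, the `Accept: application/rdf+xml` header, and the "share of resolvable links" score are all hypothetical illustrations, not an existing tool or an established quality metric.

```python
import urllib.error
import urllib.request
from enum import Enum
from typing import Callable, Dict, Iterable, Optional, Tuple


class LinkState(Enum):
    OK = "ok"                        # 2xx: resource answered directly
    SEE_OTHER = "see_other"          # 303: usual Linked Data indirection, not a move
    MOVED = "moved"                  # 301/308: permanent redirect, link is stale
    TEMP_REDIRECT = "temp_redirect"  # 302/307: temporary redirect
    BROKEN = "broken"                # other 4xx (e.g. 404 Not Found, 410 Gone)
    UNAVAILABLE = "unavailable"      # 5xx or no response at all (possibly transient)


def classify(status: Optional[int]) -> LinkState:
    """Map an HTTP status code (None = no response) to a link state."""
    if status is None or status >= 500:
        return LinkState.UNAVAILABLE
    if 200 <= status < 300:
        return LinkState.OK
    if status == 303:
        return LinkState.SEE_OTHER
    if status in (301, 308):
        return LinkState.MOVED
    if status in (302, 307):
        return LinkState.TEMP_REDIRECT
    return LinkState.BROKEN


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None stops urllib from following the redirect; the 3xx
    # response then surfaces as an HTTPError carrying the original code.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def http_status(uri: str, timeout: float = 10.0) -> Optional[int]:
    """Send a HEAD request asking for RDF and return the raw status code."""
    opener = urllib.request.build_opener(_NoRedirect)
    req = urllib.request.Request(
        uri, method="HEAD", headers={"Accept": "application/rdf+xml"}
    )
    try:
        with opener.open(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 3xx (not followed), 4xx, 5xx
        return err.code
    except (urllib.error.URLError, OSError):  # DNS failure, refused, timeout
        return None


def check_links(
    uris: Iterable[str],
    status_of: Callable[[str], Optional[int]] = http_status,
) -> Tuple[Dict[str, LinkState], float]:
    """Classify every referenced URI and return the share that still resolves.

    Here a link "resolves" unless it is BROKEN or UNAVAILABLE, i.e. a
    redirected link counts as stale but still usable by a following client.
    """
    states = {uri: classify(status_of(uri)) for uri in uris}
    dead = {LinkState.BROKEN, LinkState.UNAVAILABLE}
    usable = sum(1 for state in states.values() if state not in dead)
    ratio = usable / len(states) if states else 1.0
    return states, ratio


if __name__ == "__main__":
    # Stubbed status codes (hypothetical values, no network access needed).
    stub = {
        "http://dbpedia.org/resource/Berlin": 303,
        "http://example.org/old-image-location": 301,
        "http://example.org/deleted-image": 404,
    }
    states, ratio = check_links(stub, status_of=stub.get)
    for uri, state in states.items():
        print(f"{state.value:13} {uri}")
    print(f"resolvable: {ratio:.2f}")  # 2 of 3 links -> 0.67
```

Injecting `status_of` keeps the classification logic testable without network access. A real checker would also need to retry with GET when a server rejects HEAD (405), which this sketch would misreport as broken, and to re-check UNAVAILABLE links later before treating them as dead.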
(3) could help to re-locate moved resources via search engines like Sindice, but not fully automatically. (4) could at least inform a data source that certain references are broken so that it could remove them.

Another alternative is, of course, to leave the problem completely to the application developers, which means that they must take into account that a referenced resource may or may not exist. I am not sure about the practical consequences of that approach, especially if several data sources are involved, but I have the feeling that it gets really complicated if one cannot rely on any kind of referential integrity.

Are there any existing mechanisms that can give us at least some basic feedback about the "quality" of an LOD data source? I think referential integrity could be one such quality property...

Thanks for your input on that issue,
Bernhard

______________________________________________________
Research Group Multimedia Information Systems
Department of Distributed and Multimedia Systems
Faculty of Computer Science
University of Vienna

Postal Address: Liebiggasse 4/3-4, 1010 Vienna, Austria
Phone: +43 1 42 77 39635
Fax: +43 1 4277 39649
E-Mail: bernhard.haslhofer@univie.ac.at
WWW: http://www.cs.univie.ac.at/bernhard.haslhofer
Received on Thursday, 5 February 2009 15:36:11 UTC