Re: Broken Links in LOD Data Sets from Kingsley Idehen on 2009-02-05 (public-lod@w3.org from February 2009)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Thu, 05 Feb 2009 16:21:31 -0500
To: Hugh Glaser <hg@ecs.soton.ac.uk>
CC: Bernhard Haslhofer <bernhard.haslhofer@univie.ac.at>, "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <498B585B.50909@openlinksw.com>
On 2/5/09 2:49 PM, Hugh Glaser wrote:
> (Note to self: Ooh - a lot of interesting stuff here - must try very hard to be brief.)
>
> I think this is actually part of the "I can't host your Linked Data for you" issue.
> Once a resolvable URI has broken, for any reason, it is dead.
> The same knowledge might be out there, but the link is dead, even though all you wanted was the knowledge, and any number of people may have it and be delighted to give you it.
>    
Yes, the "hanger" is dead; the things hanging off it, well, they're out
there somewhere :-) Object IDs live forever, but the vitality of the values
clustered around an object ID/URI (using today's social media as an
anecdote) varies over time. Thus, URI preferences will be volatile and
intrinsically linked to the value they expose.

URIs are digital brand imprints.

Information is the "new black" and Small is the "new big", because 
"Shrinkage Happens!" :-)
> So you need a service to give you the information.
> For example http://service.example.com/?uri=http://deaduri.com/
> Which "resolves" a "foreign URI" in the service.example.com KB, in our terminology.
> And service.example.com might be an Internet Archive style thing.
> Actually, this is a desirable service even for non-dead URIs.
>    
Yep, because at the end of the day the referent/pointees are what 
matter, not the hanger/pointer.
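
A minimal sketch of how a client might build the lookup URL from Hugh's
example. The host service.example.com and the "uri" parameter come from
the thread; the function names are hypothetical, and percent-encoding the
foreign URI is an assumption about how such a service would want it passed.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical resolver endpoint, taken from the example in the thread.
RESOLVER = "http://service.example.com/"


def lookup_url(foreign_uri: str, resolver: str = RESOLVER) -> str:
    """Return the URL that asks `resolver` what it knows about `foreign_uri`."""
    # Percent-encode the foreign URI so its own ':', '/' and '?' characters
    # are not confused with the resolver's query string.
    return resolver + "?" + urlencode({"uri": foreign_uri})


def foreign_uri_of(url: str) -> str:
    """Recover the foreign URI from a lookup URL, as the service side might."""
    return parse_qs(urlsplit(url).query)["uri"][0]
```

The round trip matters mostly when the dead URI carries its own query
string, which would otherwise be split up by the resolver.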
> So we need to know where are the KBs that will enable you to get it.
> A specialised URI search service may have helped you find this (I call this a gurgle), and this may indeed be informed by voiD (which is why I have been so encouraging about voiD). In fact, we have an internal gurgle service at rkbexplorer, and Ian simply processed the same data to generate the voiD descriptions, which is why he was able to do them so quickly.
>    
> Some of these KB services might be quite specialised, such as services that simply give you all the sameAs they know about for the URI.
> (We call such KBs a CRS [Consistent Reference Service]).
> And then if the URI is really dead, the KB/CRS should avoid telling people about it, so we are able to "deprecate" URIs in the CRS, which I have mentioned before.
>    
Yep! This is the game, and it is about intrinsic value that a platform / 
service offers in the Linked Data realm. All of this value is unveiled 
via granular interaction with a data space via a URI (the conduit to the 
space).
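
To make the CRS idea concrete, here is a toy in-memory sketch. It is not
the rkbexplorer implementation; the class name, method names and example
URIs (other than the DBpedia one) are hypothetical. It only illustrates
the two behaviours Hugh describes: returning the known sameAs equivalents
for a URI, and no longer handing out URIs that have been deprecated.

```python
class ConsistentReferenceService:
    """Toy CRS: groups URIs that have been asserted to be sameAs each other."""

    def __init__(self):
        self._bundle_of = {}      # URI -> shared set of equivalent URIs
        self._deprecated = set()  # URIs known to be dead

    def add_same_as(self, a, b):
        """Record that a and b identify the same thing, merging their bundles."""
        merged = self._bundle_of.get(a, {a}) | self._bundle_of.get(b, {b})
        for uri in merged:
            self._bundle_of[uri] = merged

    def deprecate(self, uri):
        """Mark a URI as dead so it is no longer returned to callers."""
        self._deprecated.add(uri)

    def equivalents(self, uri):
        """All live URIs equivalent to `uri`, excluding `uri` itself."""
        bundle = self._bundle_of.get(uri, {uri})
        return sorted(bundle - self._deprecated - {uri})
```

A deprecated URI can still be used as a lookup key, so a client holding a
dead URI can ask for its live equivalents, but it never appears in answers.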

> Finally:
> The fact that the URI is dead should actually be transparent at the application level.
>    
Ideally, yes. Worst case, a quiet rebuild (in the DBMS realm, indexes and
rowids can go bad, but a good DBMS should be able to rebuild, even if the
worst case is temporary downtime).

> Our system has an intermediate API that simply returns the aggregated RDF found for a URI - the application level never (unless it wants to) knows where that RDF came from. Of course, nothing may come back, but that could be true of any URI. And for anything interesting it will change every time you ask. (Similar to the Semantic Web Client Library.)
>
>    
Great stuff!
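
A rough sketch of what such an intermediate layer could look like,
assuming each source is just a callable that takes a URI and yields
triples. The names below are hypothetical and this is not Hugh's actual
API; it only shows how a dead source can stay invisible to the application.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Source = Callable[[str], Iterable[Triple]]  # given a URI, yield triples about it


def aggregate(uri: str, sources: Iterable[Source]) -> Set[Triple]:
    """Union of whatever each source returns for `uri`; failures are skipped."""
    triples: Set[Triple] = set()
    for source in sources:
        try:
            triples.update(source(uri))
        except Exception:
            # A dead or misbehaving source contributes nothing; the caller
            # never needs to know which source failed.
            continue
    return triples
```

An empty result is a legitimate answer here, which matches the point that
nothing may come back for any URI, dead or alive.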

Kingsley
> Not too long I hope, given that your question encompasses almost every aspect of Linked Data.
> Hugh
>
> As the man said:
> «Je n'ai fait celle-ci plus longue que parce que je n'ai pas eu le loisir de la faire plus courte. »
> But not this time, I hope.
>
> On 05/02/2009 15:35, "Bernhard Haslhofer"<bernhard.haslhofer@univie.ac.at>  wrote:
>
>
>
> Hi all,
>
> we are currently working on the question of how to deal with broken
> links/references between resources in (distinct) LOD data sets and
> would like to know your opinion on that issue. If there is some work
> going on in this direction, please let me know.
>
> I think I do not really need to explain the problem. Everybody knows
> it from the "human" Web when you follow a link and you get an annoying
> 404 response.
>
> If we assume that the consumers of LOD data are not humans but
> applications, broken links/references are not only "annoying" but
> could lead to severe processing errors if an application relies on a
> kind of "referential integrity".
>
> Assume we have an LOD data source X exposing resources that describe
> images and these images are linked with resources in DBPedia (e.g.,
> http://dbpedia.org/resource/Berlin). An application built on top of X
> follows links to retrieve the geo-coordinates in order to display the
> images on a virtual map. If now, for some reason, the URL of the
> linked DBPedia resource changes, either because DBPedia is moved or
> re-organized, which I guess could happen to any LOD source in a
> long-term perspective, the application might crash if it doesn't
> consider that referenced resources might move or become unavailable.
>
> I know that "cool URIs don't change" but I am not sure if this
> assumption holds in practice, especially in a long-term perspective.
>
> For the "human" Web several solutions have been proposed, e.g.,
> 1.) PURL and DOI services for translating URNs into resolvable URLs
> 2.) forward references
> 3.) robust link implementations, i.e., with each link you keep a set
> of related search terms to retrieve moved / changed resources
> 4.) observer / notification mechanisms
> X.) ?
>
> I guess (1) is not really applicable for LOD resources because of
> scalability and single-point of failure issues. (2) would require that
> LOD providers take care of setting up HTTP redirects for their moved
> resources - no idea if anybody will do that in reality and how this
> can scale. (3) could help to re-locate moved resources via search
> engines like Sindice but not really fully automatically. (4) could at
> least inform a data source that certain references are broken and it
> could remove them.
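
For options (2) and (4), a link checker mostly needs to tell "moved" apart
from "gone". Below is a minimal sketch of that classification, assuming the
checker has already made the HTTP request and has the status code and
Location header in hand. The function name and the category labels are
hypothetical.

```python
from typing import Optional


def classify_link(status: int, location: Optional[str] = None) -> str:
    """Map an HTTP response to a coarse link state a data source can act on."""
    if 200 <= status < 300:
        return "ok"
    if status in (301, 308) and location:
        # Permanent redirect: the source could rewrite its link to `location`.
        return "moved"
    if status in (302, 303, 307) and location:
        # Temporary or see-other redirect: follow it, but keep the original
        # link. 303 is routine for Linked Data URIs that name real-world things.
        return "redirected"
    if status in (404, 410):
        # Not found or gone: a candidate for notification or removal.
        return "broken"
    return "unknown"
```

Only "moved" and "broken" would normally trigger a change on the linking
side; everything else can be left alone or retried later.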
>
> Another alternative is of course to completely leave the problem to
> the application developers, which means that they must consider that a
> referenced resource might exist or not. I am not sure about the
> practical consequences of that approach, especially if several data
> sources are involved, but I have the feeling that it is getting really
> complicated if one cannot rely on any kind of referential integrity.
>
> Are there any existing mechanisms that can give us at least some basic
> feedback about the "quality" of an LOD data source? I think
> referential integrity could be such a quality property...
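
One simple way to turn referential integrity into a number is the fraction
of a data source's outgoing links that still resolve. This is only a
sketch of that idea, not an established metric; `resolves` is a
hypothetical callable supplied by the caller (for example, one wrapping an
HTTP check).

```python
from typing import Callable, Iterable


def referential_integrity(links: Iterable[str],
                          resolves: Callable[[str], bool]) -> float:
    """Fraction of `links` for which `resolves(link)` is true (1.0 if none)."""
    links = list(links)
    if not links:
        # A source with no outgoing links has nothing that can be broken.
        return 1.0
    return sum(1 for link in links if resolves(link)) / len(links)
```

Such a score says nothing about whether the resolved data is still the
same data, only that something answers at the other end of the link.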
>
> Thanks for your input on that issue,
>
> Bernhard
>
> ______________________________________________________
> Research Group Multimedia Information Systems
> Department of Distributed and Multimedia Systems
> Faculty of Computer Science
> University of Vienna
>
> Postal Address: Liebiggasse 4/3-4, 1010 Vienna, Austria
> Phone: +43 1 42 77 39635 Fax: +43 1 4277 39649
> E-Mail: bernhard.haslhofer@univie.ac.at
> WWW: http://www.cs.univie.ac.at/bernhard.haslhofer
>
>
>
>
>
>    


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software     Web: http://www.openlinksw.com
Received on Thursday, 5 February 2009 21:22:18 UTC