- From: Hausenblas, Michael <michael.hausenblas@deri.org>
- Date: Sat, 14 Feb 2009 16:32:32 +0000
- To: "Hugh Glaser" <hg@ecs.soton.ac.uk>
- Cc: "Kingsley Idehen" <kidehen@openlinksw.com>, "Bernhard Haslhofer" <bernhard.haslhofer@univie.ac.at>, "Linked Data community" <public-lod@w3.org>
Hugh,

As often, you are right (with my sloppy usage of the term publisher) and I
think your analysis below is indeed close to what I was thinking as well.

Let's move over to ESW Wiki and write up stuff. A paste from your email
might be a good start! Mind minting a URI for it and starting to fill in
the Wiki page? I'm on travel and limited re my capabilities currently ;)

Cheers,
Michael

Sent from my iPhone

On 14 Feb 2009, at 16:00, "Hugh Glaser" <hg@ecs.soton.ac.uk> wrote:

> Hi Michael.
> I got thoroughly confused, I think, by your use of the "dataset publisher
> (the authoritative one who 'owns' it)".
> That made me think you were talking about the owner of the broken URI (ie,
> where it should have resolved to), rather than the place that gave you the
> URI. (Which was it? :-) )
>
> So the next bit is the first of those:
> ======================================
> I think in a lot of the LOD world, a 404 means "I don't know anything about
> that URI", rather than a broken link.
> Certainly for us, that is all we can do.
> In fact, what we are actually doing is manually generating the 404 when we
> find there is nothing in the KB; we could instead return a blankish RDF
> document, but that didn't seem sensible.
> Now I think about it, I have checked what dbpedia does to
> http://dbpedia.org/resource/Esperanta - it does the blank doc thing.
> (I guess we need to work out what is best practice for this and then add it
> to the How to Publish? I think my view is that something like
> http://dbpedia.org/data/Esperanta.rdf should 404.)
> So either way, in LOD sites of the sort that have DBs or KBs behind them,
> either it is not possible to get a 404 (dbpedia), or you can't distinguish
> between a rubbish URI that might have been generated and one you want to
> know about.
> I find the idea that I might give people the expectation that I will create
> triples (as your point 2) rather strange - if I knew triples I would have
> served them in the first place. Of course if we consider a URI I don't know
> as a request for me to go and find knowledge about it, fair enough, but I
> would expect a more explicit service for that. In that sense it would not be
> a "broken link".
> Maybe the world is different for the other RDFa etc ways of publishing LD,
> but in the DB/KB world, I don't see broken incoming links as something that
> can be usefully dealt with, other than the maintainer checking what is
> happening, as you do with a normal site.
> ======================================
>
> Now turning to the second possible meaning.
> We are concerned with the place that gave you the URI, which is possibly
> more interesting. And I think this is actually the case for your TAG
> example.
> If I gave you (by which I mean an agent) such a link and you discovered it
> was broken, it would be helpful to me and the LOD world if you could tell me
> about it, so I could fix it. In fact it would also be helpful if you had a
> suggestion as to the fix (ie a better URI), which is not out of the
> question. And if I trust you (when we understand what that means), I might
> even do a replacement or some equivalent triples without further
> intervention.
>
> ======================================
> In the case of our RKB system, we actually do something like this already.
> If we find that there is nothing about a URI in the KB that should have it,
> we don't immediately return 404, but look it up in the associated CRS
> (coreference service), and possibly others, to see if there is an equivalent
> URI in the same KB that could be used (we do not return RDF from other KB,
> although we could).
> So if you try to resolve
> http://southampton.rkbexplorer.com/description/person-07113
> You actually get the data for
> http://southampton.rkbexplorer.com/id/person-0a36cf76d1a3e99f9267ce3d0b95e42e-06999d58799cb8a3a55d3c69efcc9ba6
> and a message telling you to use the new one next time.
> (I'm not sure we have got the RDF perfectly right, but that is the idea.)
> In effect, if we are asked for a broken link, we have a quick look around to
> see if there is anything we do know, and give that back.
> Of course, the CRS also gives the requestor the chance to do the same fixing
> up.
> The reason that there might be a URI in the KB that has no triples, but we
> know about, is because we "deprecate" URIs to reduce the number, and then
> use the CRS to resolve from deprecated to non-deprecated.
> So a deprecated URI is one we used to know about, and may still be being
> used "out there", but don't want to continue to use - sort of a broken link.
> Hence our dynamic broken link fixing.
>
> Best
> Hugh
>
> PS.
> My choice of http://dbpedia.org/data/Esperanta.rdf as a misspelling of
> http://dbpedia.org/data/Esperanto.rdf turned out to be fascinating.
> It turns out that wikipedia tells me that there used to be a page
> http://en.wikipedia.org/wiki/Esperanta, but it has been deleted.
> So what is returned is different from
> http://en.wikipedia.org/wiki/Esperanti.
> Although http://dbpedia.org/data/Esperanta.rdf and
> http://dbpedia.org/data/Esperanti.rdf both return empty RDF documents, I
> think.
> It looks to me that this is trying to solve a similar problem to that which
> our deprecated URIs are doing in our CRS.
>
>
> On 14/02/2009 14:06, "Hausenblas, Michael" <michael.hausenblas@deri.org>
> wrote:
>
>> Kingsley,
>>
>> Grounding in 404 and 30x makes sense to me.
>> However I am still in the conception phase ;)
>>
>> Sent from my iPhone
>>
>> On 12 Feb 2009, at 14:02, "Kingsley Idehen" <kidehen@openlinksw.com> wrote:
>>
>>> Michael Hausenblas wrote:
>>>> Bernhard, All,
>>>>
>>>> So, another take on how to deal with broken links: a couple of days ago I
>>>> reported two broken links in a TAG finding [1] which was (quickly and
>>>> pragmatically, bravo, TG!) addressed [2], recently.
>>>>
>>>> Let's abstract this away and apply to data rather than documents. The
>>>> mechanism could work as follows:
>>>>
>>>> 1. A *human* (e.g. through a built-in feature in a Web of Data browser
>>>> such as Tabulator) encounters a broken link and reports it to the
>>>> respective dataset publisher (the authoritative one who 'owns' it)
>>>>
>>>> OR
>>>>
>>>> 1. A machine encounters a broken link (should it then directly ping the
>>>> dataset publisher or first 'ask' its master for permission?)
>>>>
>>>> 2. The dataset publisher acknowledges the broken link and creates
>>>> according triples as done in the case for documents (cf. [2])
>>>>
>>>> In case anyone wants to pick that up, I'm happy to contribute. The name?
>>>> Well, a straw-man proposal could be called *re*pairing *vi*ntage link
>>>> *val*ues (REVIVAL) - anyone? :)
>>>>
>>>> Cheers,
>>>> Michael
>>>>
>>>> [1] http://lists.w3.org/Archives/Public/www-tag/2009Jan/0118.html
>>>> [2] http://lists.w3.org/Archives/Public/www-tag/2009Feb/0068.html
>>>>
>>> Michael,
>>>
>>> If the publisher is truly dog-fooding and they know what data objects
>>> they are publishing, condition 404 should be the trigger for a self
>>> directed query to determine:
>>>
>>> 1. what's happened to the entity URI
>>> 2. lookup similar entities
>>> 3. then self fix if possible (e.g.
>>> a 302)
>>>
>>> Basically, Linked Data publishers should make 404s another Linked Data
>>> prowess exploitation point :-)
>>>
>>> --
>>>
>>> Regards,
>>>
>>> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
>>> President & CEO
>>> OpenLink Software     Web: http://www.openlinksw.com
Received on Saturday, 14 February 2009 16:33:19 UTC
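
The lookup discussed in this thread (serve what the KB knows, otherwise consult a coreference service for a non-deprecated equivalent, and only then answer 404) can be sketched roughly as follows. This is a minimal illustration and not the RKB or DBpedia implementation: the in-memory dictionaries standing in for the KB and the CRS, the names `Response` and `resolve`, and the example.org URIs are all assumptions made for the example. It answers with a 302 as Kingsley suggests, whereas Hugh describes RKB returning the data for the equivalent URI directly together with a hint to use the new one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response."""
    status: int                            # 200, 302 or 404
    location: Optional[str] = None         # redirect target for a 302
    triples: Optional[List[Triple]] = None  # payload for a 200


def resolve(uri: str,
            kb: Dict[str, List[Triple]],
            crs: Dict[str, str]) -> Response:
    """Resolve a URI against a KB, falling back to a coreference map.

    kb  maps a URI to the triples known about it (assumed structure).
    crs maps a deprecated URI to its preferred equivalent (assumed structure).
    """
    triples = kb.get(uri)
    if triples:
        return Response(200, triples=triples)

    # Nothing in the KB: follow deprecated -> preferred links, guarding
    # against cycles in the coreference data.
    seen = {uri}
    target = crs.get(uri)
    while target is not None and target not in seen:
        if kb.get(target):
            return Response(302, location=target)
        seen.add(target)
        target = crs.get(target)

    # No equivalent URI with data: "I don't know anything about that URI".
    return Response(404)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    new = "http://example.org/id/person-new"
    old = "http://example.org/id/person-old"
    kb = {new: [(new, "http://xmlns.com/foaf/0.1/name", "Example Person")]}
    crs = {old: new}
    for u in (new, old, "http://example.org/id/unknown"):
        print(u, resolve(u, kb, crs).status)
```

Whether the fallback should redirect or return the equivalent URI's data in place is exactly the kind of best-practice question the thread leaves open; the sketch only shows where that decision sits.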