- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Sat, 14 Feb 2009 19:05:30 +0000
- To: Hugh Glaser <hg@ecs.soton.ac.uk>
- Cc: "Hausenblas, Michael" <michael.hausenblas@deri.org>, Kingsley Idehen <kidehen@openlinksw.com>, Bernhard Haslhofer <bernhard.haslhofer@univie.ac.at>, Linked Data community <public-lod@w3.org>
On 14 Feb 2009, at 15:59, Hugh Glaser wrote:

> Now I think about it, I have checked what dbpedia does to
> http://dbpedia.org/resource/Esperanta it does the blank doc thing.
> (I guess we need to work out what is best practice for this and then add it
> to the How to Publish? I think my view is that something like
> http://dbpedia.org/data/Esperanta.rdf should 404.)

FWIW, DBpedia does a bit of 404ing:

http://dbpedia.org/page/Esperanta is an empty HTML document
http://dbpedia.org/data/Esperanta is 404
http://dbpedia.org/data/Esperanta.rdf is an empty RDF document

These should all 404, and at least the first one used to on the previous
incarnation of the DBpedia server software.

Richard

> So either way, in LOD sites of the sort that have DBs or KBs behind them,
> either it is not possible to get a 404 (dbpedia), or you can't distinguish
> between a rubbish URI that might have been generated and one you want to
> know about.
> I find the idea that I might give people the expectation that I will create
> triples (as your point 2) rather strange - if I knew triples I would have
> served them in the first place. Of course if we consider a URI I don't know
> as a request for me to go and find knowledge about it, fair enough, but I
> would expect a more explicit service for that. In that sense it would not be
> a "broken link".
> Maybe the world is different for the other RDFa etc ways of publishing LD,
> but in the DB/KB world, I don't see broken incoming links as something that
> can be usefully dealt with, other than the maintainer checking what is
> happening, as you do with a normal site.
> ======================================
>
> Now turning to the second possible meaning.
> We are concerned with the place that gave you the URI, which is possibly
> more interesting. And I think this is actually the case for your TAG example.
> If I gave you (by which I mean an agent) such a link and you discovered it
> was broken, it would be helpful to me and the LOD world if you could tell me
> about it, so I could fix it. In fact it would also be helpful if you had a
> suggestion as to the fix (ie a better URI), which is not out of the question.
> And if I trust you (when we understand what that means), I might even do a
> replacement or some equivalent triples without further intervention.
>
> ======================================
> In the case of our RKB system, we actually do something like this already.
> If we find that there is nothing about a URI in the KB that should have it,
> we don't immediately return 404, but look it up in the associated CRS
> (coreference service), and possibly others, to see if there is an equivalent
> URI in the same KB that could be used (we do not return RDF from other KB,
> although we could). So if you try to resolve
> http://southampton.rkbexplorer.com/description/person-07113
> You actually get the data for
> http://southampton.rkbexplorer.com/id/person-0a36cf76d1a3e99f9267ce3d0b95e42e-06999d58799cb8a3a55d3c69efcc9ba6
> and a message telling you to use the new one next time.
> (I'm not sure we have got the RDF perfectly right, but that is the idea.)
> In effect, if we are asked for a broken link, we have a quick look around to
> see if there is anything we do know, and give that back.
> Of course, the CRS also gives the requestor the chance to do the same fixing
> up.
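
For illustration only, here is a minimal sketch in Python of the lookup Hugh
describes above (his message continues below). This is not the actual RKB
explorer code: the example URIs, the in-memory KB and CRS tables, and the
"useInstead" hint predicate are all invented stand-ins for the real services.

# Dereference handler: if the requested URI has no triples in the KB,
# consult the coreference service (CRS) for an equivalent URI before
# giving up with a 404.

KB = {  # triples indexed by subject URI (toy stand-in for the knowledge base)
    "http://example.org/id/person-CANONICAL": [
        ("http://example.org/id/person-CANONICAL",
         "http://xmlns.com/foaf/0.1/name", "Jane Example"),
    ],
}

CRS = {  # deprecated/duplicate URI -> equivalent URI, as the CRS would answer
    "http://example.org/id/person-DEPRECATED":
        "http://example.org/id/person-CANONICAL",
}

def resolve(uri):
    """Return (http_status, triples) for a dereference request."""
    if uri in KB:
        return 200, KB[uri]
    equivalent = CRS.get(uri)
    if equivalent and equivalent in KB:
        # Serve the data of the equivalent URI, plus a hint telling the
        # client which URI to use next time (hypothetical predicate).
        hint = (uri, "http://example.org/vocab/useInstead", equivalent)
        return 200, KB[equivalent] + [hint]
    return 404, []

print(resolve("http://example.org/id/person-DEPRECATED"))

In a real deployment the same CRS answer could of course also be handed to
the client, so it can do the fixing up itself, as Hugh notes.
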
> The reason that there might be a URI in the KB that has no triples, but we
> know about, is because we "deprecate" URIs to reduce the number, and then
> use the CRS to resolve from deprecated to non-deprecated.
> So a deprecated URI is one we used to know about, and may still be being
> used "out there", but don't want to continue to use - sort of a broken link.
> Hence our dynamic broken link fixing.
>
> Best
> Hugh
>
> PS.
> My choice of http://dbpedia.org/data/Esperanta.rdf as a misspelling of
> http://dbpedia.org/data/Esperanto.rdf turned out to be fascinating.
> It turns out that wikipedia tells me that there used to be a page
> http://en.wikipedia.org/wiki/Esperanta, but it has been deleted.
> So what is returned is different from http://en.wikipedia.org/wiki/Esperanti.
> Although http://dbpedia.org/data/Esperanta.rdf and
> http://dbpedia.org/data/Esperanti.rdf both return empty RDF documents,
> I think.
> It looks to me that this is trying to solve a similar problem to that which
> our deprecated URIs are doing in our CRS.
>
>
> On 14/02/2009 14:06, "Hausenblas, Michael" <michael.hausenblas@deri.org> wrote:
>
>> Kingsley,
>>
>> Grounding in 404 and 30x makes sense to me. However I am still in the
>> conception phase ;)
>>
>> Sent from my iPhone
>>
>> On 12 Feb 2009, at 14:02, "Kingsley Idehen" <kidehen@openlinksw.com> wrote:
>>
>>> Michael Hausenblas wrote:
>>>> Bernhard, All,
>>>>
>>>> So, another take on how to deal with broken links: a couple of days ago
>>>> I reported two broken links in a TAG finding [1] which was (quickly and
>>>> pragmatically, bravo, TAG!) addressed [2], recently.
>>>>
>>>> Let's abstract this away and apply it to data rather than documents.
>>>> The mechanism could work as follows:
>>>>
>>>> 1. A *human* (e.g. through a built-in feature in a Web of Data browser
>>>> such as Tabulator) encounters a broken link and reports it to the
>>>> respective dataset publisher (the authoritative one who 'owns' it)
>>>>
>>>> OR
>>>>
>>>> 1. A machine encounters a broken link (should it then directly ping the
>>>> dataset publisher or first 'ask' its master for permission?)
>>>>
>>>> 2. The dataset publisher acknowledges the broken link and creates
>>>> according triples as done in the case for documents (cf. [2])
>>>>
>>>> In case anyone wants to pick that up, I'm happy to contribute. The name?
>>>> Well, a straw-man proposal could be called *re*pairing *vi*ntage link
>>>> *val*ues (REVIVAL) - anyone? :)
>>>>
>>>> Cheers,
>>>> Michael
>>>>
>>>> [1] http://lists.w3.org/Archives/Public/www-tag/2009Jan/0118.html
>>>> [2] http://lists.w3.org/Archives/Public/www-tag/2009Feb/0068.html
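
Before Kingsley's reply below, a rough sketch of step 1 of the REVIVAL idea
just quoted: a machine agent dereferences the links it holds and collects the
broken ones for reporting. This uses only the Python standard library; the
example URI comes from earlier in this thread, the Accept header choice and
the crude "empty document" test are assumptions, and how (or whether) the
report is actually delivered to the publisher is left open, as in the thread.

import urllib.error
import urllib.request

def check_link(uri, timeout=10):
    """Return a short status string for one dereferenceable URI."""
    req = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read()
            # Crude test for the DBpedia-style failure mode discussed above:
            # 200 OK but a (near-)empty document instead of a proper 404.
            # A real agent would parse the RDF and count the triples.
            return "ok" if body.strip() else "empty-document"
    except urllib.error.HTTPError as err:
        return f"http-{err.code}"
    except OSError:
        return "unreachable"

def broken_links(uris):
    """Yield (uri, status) for every link that did not come back healthy."""
    for uri in uris:
        status = check_link(uri)
        if status != "ok":
            yield uri, status  # hand these to the publisher, however agreed

for uri, status in broken_links(["http://dbpedia.org/data/Esperanta.rdf"]):
    print(f"would report broken link: {uri} ({status})")
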
>>> Michael,
>>>
>>> If the publisher is truly dog-fooding and they know what data objects
>>> they are publishing, condition 404 should be the trigger for a
>>> self-directed query to determine:
>>>
>>> 1. what's happened to the entity URI
>>> 2. lookup similar entities
>>> 3. then self fix if possible (e.g. a 302)
>>>
>>> Basically, Linked Data publishers should make 404s another Linked Data
>>> prowess exploitation point :-)
>>>
>>> --
>>>
>>> Regards,
>>>
>>> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
>>> President & CEO
>>> OpenLink Software     Web: http://www.openlinksw.com
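
To make Kingsley's three steps concrete, here is a rough sketch of the
behaviour he suggests, not any particular server's implementation: on a
request for an unknown URI the publisher first asks its own store what
happened to it, then tries a similarity lookup, and only answers 404 if both
come up empty. The store contents and the find_similar() heuristic below are
invented for illustration.

# 404 as a trigger for self-repair: check the publisher's own records
# before admitting the entity is unknown.

KNOWN_ENTITIES = {"http://example.org/resource/Esperanto"}

SUPERSEDED_BY = {  # URIs the publisher itself retired or merged (step 1)
    "http://example.org/resource/Esperanto_language":
        "http://example.org/resource/Esperanto",
}

def find_similar(uri):
    """Hypothetical 'similar entity' lookup (step 2), e.g. a label search."""
    # Toy heuristic: treat URIs differing only in a trailing vowel as the
    # same entity ("Esperanta" vs "Esperanto"). A real site might use a
    # text index over labels instead.
    for candidate in KNOWN_ENTITIES:
        if candidate.rstrip("aeiou") == uri.rstrip("aeiou"):
            return candidate
    return None

def handle(uri):
    """Return (http_status, headers) for a dereference request."""
    if uri in KNOWN_ENTITIES:
        return 200, {}                    # serve the data as usual
    target = SUPERSEDED_BY.get(uri) or find_similar(uri)
    if target:
        return 302, {"Location": target}  # step 3: self-fix via redirect
    return 404, {}                        # honest failure

print(handle("http://example.org/resource/Esperanta"))

Whether a 302 or some other 30x is the right status code here is its own
discussion; the point is only that the 404 path can consult the publisher's
own knowledge before giving up.
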
Received on Saturday, 14 February 2009 19:06:12 UTC