HTTP 4XX and 5XX error documents in RDF? (was: Re: Broken Links in LOD Data Sets)

On 14 Feb 2009, at 19:27, Kingsley Idehen wrote:
> We'll deal with it.
>
> It can 404 or smartly do something like:
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&should-sponge=&query=select+distinct+*+where+{%3Fs+%3Fp+%3Fo.+%3Fo+bif%3Acontains+%22Esperanta%22}&format=text%2Fhtml&debug=on
>
> Make a suggestion doc on the fly.

Neat.

I think sending the 404 code is the right way to go; 200 just seems
wrong. But HTTP lets you serve an error document along with the error
status code, and it could make sense to include something based on the
query above in that error document.

(In general, this is an interesting question that hasn't been discussed
much yet: what should error documents look like when RDF is requested?
This came up recently in relation to Ivan's RDFa Distiller web service
and how it should treat documents that cannot be parsed as RDFa. My
intuition is that we should somehow express the error in RDF, and
include information that would potentially allow the RDF client to
recover from the error, e.g. in the case above by fetching other
resources that mention "Esperanta", or by showing the list to the user.
At the very least, the RDF client could give the end user feedback that
is more informative than simply “Something went wrong while fetching”.
Many of the apps built by this community make HTTP calls to remote
sites, which in turn call other remote sites, and if something breaks
down anywhere in this cascade, the user experience is usually pretty
bad. Things just fail silently, or something generic like “Couldn't
fetch any data from http://...” is reported. In the medium term we
should do better than that. This is not an urgent issue yet, and we
have tougher issues to fix first, but it's something to consider for
the future.)
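
On the client side, that recovery could look roughly like this (again
only a sketch, reusing the hypothetical err: vocabulary from the
example above):

    import requests
    from rdflib import Graph, Namespace

    ERR = Namespace("http://example.org/error#")  # hypothetical

    def fetch_rdf(uri):
        resp = requests.get(uri, headers={"Accept": "text/turtle"})
        if resp.ok:
            return Graph().parse(data=resp.text, format="turtle")
        # Error path: the 4xx/5xx body may itself be RDF we can act on.
        hints = []
        try:
            g = Graph().parse(data=resp.text, format="turtle")
            hints = [str(o) for o in g.objects(None, ERR.suggestion)]
        except Exception:
            pass  # body was not parseable RDF; report plainly below
        detail = ("; server suggests: " + ", ".join(hints)) if hints else ""
        raise IOError("%s returned %d%s" % (uri, resp.status_code, detail))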

Richard



>
>
>>
>> Richard
>>
>>
>>>
>>> So either way, in LOD sites of the sort that have DBs or KBs behind
>>> them, either it is not possible to get a 404 (dbpedia), or you
>>> can't distinguish between a rubbish URI that might have been
>>> generated and one you want to know about.
>>> I find the idea that I might give people the expectation that I
>>> will create triples (as your point 2) rather strange - if I knew
>>> the triples I would have served them in the first place. Of course,
>>> if we consider a URI I don't know as a request for me to go and
>>> find knowledge about it, fair enough, but I would expect a more
>>> explicit service for that. In that sense it would not be a "broken
>>> link".
>>> Maybe the world is different for the other ways of publishing LD
>>> (RDFa etc.), but in the DB/KB world, I don't see broken incoming
>>> links as something that can be usefully dealt with, other than the
>>> maintainer checking what is happening, as you do with a normal
>>> site.
>>> ======================================
>>>
>>> Now turning to the second possible meaning.
>>> We are concerned with the place that gave you the URI, which is
>>> possibly more interesting. And I think this is actually the case
>>> for your TAG example.
>>> If I gave you (by which I mean an agent) such a link and you
>>> discovered it was broken, it would be helpful to me and the LOD
>>> world if you could tell me about it, so I could fix it. In fact, it
>>> would also be helpful if you had a suggestion for the fix (i.e. a
>>> better URI), which is not out of the question. And if I trust you
>>> (when we understand what that means), I might even do a replacement
>>> or add some equivalent triples without further intervention.
>>>
>>> ======================================
>>> In the case of our RKB system, we actually do something like this
>>> already. If we find that there is nothing about a URI in the KB
>>> that should have it, we don't immediately return 404, but look it
>>> up in the associated CRS (coreference service), and possibly
>>> others, to see if there is an equivalent URI in the same KB that
>>> could be used (we do not return RDF from other KBs, although we
>>> could). So if you try to resolve
>>> http://southampton.rkbexplorer.com/description/person-07113
>>> you actually get the data for
>>> http://southampton.rkbexplorer.com/id/person-0a36cf76d1a3e99f9267ce3d0b95e42e-06999d58799cb8a3a55d3c69efcc9ba6
>>> and a message telling you to use the new one next time.
>>> (I'm not sure we have got the RDF perfectly right, but that is the
>>> idea.) In effect, if we are asked for a broken link, we have a
>>> quick look around to see if there is anything we do know, and give
>>> that back. Of course, the CRS also gives the requestor the chance
>>> to do the same fixing up.
>>> The reason there might be a URI in the KB that has no triples, but
>>> that we know about, is that we "deprecate" URIs to reduce their
>>> number, and then use the CRS to resolve from deprecated to
>>> non-deprecated. So a deprecated URI is one we used to know about,
>>> and which may still be in use "out there", but that we don't want
>>> to continue using - sort of a broken link. Hence our dynamic broken
>>> link fixing.
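
A rough sketch of that lookup-then-serve pattern, with resolve_in_kb,
crs_equivalents and the toy data as stand-ins rather than RKB's actual
backends:

    # On a miss, consult the coreference service (CRS) for an
    # equivalent URI before falling back to a 404.
    OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

    # Toy stand-ins for the real KB and CRS backends:
    KB = {"http://example.org/id/new": [("http://example.org/id/new",
                                         "http://example.org/p", "o")]}
    CRS = {"http://example.org/description/old":
               ["http://example.org/id/new"]}

    def resolve_in_kb(uri):
        return KB.get(uri, [])

    def crs_equivalents(uri):
        return CRS.get(uri, [])

    def describe(uri):
        triples = resolve_in_kb(uri)        # normal case: a direct hit
        if triples:
            return 200, triples
        for alt in crs_equivalents(uri):    # miss: ask the CRS
            triples = resolve_in_kb(alt)
            if triples:
                # Serve the equivalent's data plus a hint telling the
                # client which URI to use next time.
                return 200, triples + [(uri, OWL_SAMEAS, alt)]
        return 404, []

    # describe("http://example.org/description/old") returns the data
    # for .../id/new together with the "use this one next time" triple.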
>>>
>>> Best
>>> Hugh
>>>
>>> PS.
>>> My choice of http://dbpedia.org/data/Esperanta.rdf as a misspelling
>>> of http://dbpedia.org/data/Esperanto.rdf turned out to be
>>> fascinating. It turns out that Wikipedia tells me there used to be
>>> a page http://en.wikipedia.org/wiki/Esperanta, but it has been
>>> deleted. So what is returned is different from
>>> http://en.wikipedia.org/wiki/Esperanti, although
>>> http://dbpedia.org/data/Esperanta.rdf and
>>> http://dbpedia.org/data/Esperanti.rdf both return empty RDF
>>> documents, I think.
>>> It looks to me as if this is trying to solve a problem similar to
>>> the one our deprecated URIs address via our CRS.
>>>
>>>
>>> On 14/02/2009 14:06, "Hausenblas, Michael" <michael.hausenblas@deri.org> wrote:
>>>
>>>> Kingsley,
>>>>
>>>> Grounding in 404 and 30x makes sense to me. However, I am still in
>>>> the conception phase ;)
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 12 Feb 2009, at 14:02, "Kingsley Idehen" <kidehen@openlinksw.com> wrote:
>>>>
>>>>> Michael Hausenblas wrote:
>>>>>> Bernhard, All,
>>>>>>
>>>>>> So, another take on how to deal with broken links: a couple of
>>>>>> days ago I reported two broken links in a TAG finding [1], which
>>>>>> was recently (quickly and pragmatically, bravo, TAG!) addressed
>>>>>> [2].
>>>>>>
>>>>>> Let's abstract this away and apply it to data rather than
>>>>>> documents. The mechanism could work as follows:
>>>>>>
>>>>>> 1. A *human* (e.g. through a built-in feature in a Web of Data
>>>>>> browser such as Tabulator) encounters a broken link and reports
>>>>>> it to the respective dataset publisher (the authoritative one
>>>>>> who 'owns' it)
>>>>>>
>>>>>> OR
>>>>>>
>>>>>> 1. A machine encounters a broken link (should it then directly
>>>>>> ping the dataset publisher, or first 'ask' its master for
>>>>>> permission? See the sketch of such a ping below.)
>>>>>>
>>>>>> 2. The dataset publisher acknowledges the broken link and
>>>>>> creates the corresponding triples, as was done in the case of
>>>>>> documents (cf. [2])
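
One possible shape of the machine-to-machine ping in step 1 (a sketch
only; the endpoint name and payload are hypothetical, since no such
reporting protocol exists yet):

    import requests

    # Hypothetical broken-link report: an agent pings the publisher
    # with the URI that failed and where the reference was found.
    def report_broken_link(publisher_base, broken_uri, referrer):
        return requests.post(publisher_base + "/report-broken-link",
                             json={"broken": broken_uri,
                                   "found-in": referrer})

    # e.g.:
    # report_broken_link("http://dbpedia.org",
    #                    "http://dbpedia.org/resource/Esperanta",
    #                    "http://example.org/my-dataset.rdf")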
>>>>>>
>>>>>> In case anyone wants to pick that up, I'm happy to contribute.
>>>>>> The name? Well, a straw-man proposal could be called *re*pairing
>>>>>> *vi*ntage link *val*ues (REVIVAL) - anyone? :)
>>>>>>
>>>>>> Cheers,
>>>>>>     Michael
>>>>>>
>>>>>> [1] http://lists.w3.org/Archives/Public/www-tag/2009Jan/0118.html
>>>>>> [2] http://lists.w3.org/Archives/Public/www-tag/2009Feb/0068.html
>>>>>>
>>>>>>
>>>>> Michael,
>>>>>
>>>>> If the publisher is truly dog-fooding and knows what data objects
>>>>> they are publishing, a 404 condition should be the trigger for a
>>>>> self-directed query to determine:
>>>>>
>>>>> 1. what's happened to the entity URI
>>>>> 2. look up similar entities
>>>>> 3. then self-fix if possible (e.g. a 302), as sketched below
>>>>>
>>>>> Basically, Linked Data publishers should make 404s another point
>>>>> at which to exploit their Linked Data prowess :-)
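
A sketch of those three steps as a request flow; history_of,
similar_entities, rdf_error_document and the toy data are placeholders
invented for illustration:

    # 1. check the URI's history; 3. self-fix with a redirect when
    # possible; otherwise 2. fall back to similar entities, reported
    # in an RDF error document (reusing the hypothetical err: terms).
    MOVED = {"/resource/OldName": "/resource/NewName"}  # toy history

    def history_of(uri):
        return MOVED.get(uri)

    def similar_entities(uri):
        return []  # placeholder: e.g. the bif:contains query above

    def rdf_error_document(uri, candidates):
        body = ("@prefix err: <http://example.org/error#> .\n"
                "[] a err:NotFound ; err:requestedResource <%s> " % uri)
        for c in candidates:
            body += ";\n   err:suggestion <%s> " % c
        return body + "."

    def handle_miss(uri):
        new_uri = history_of(uri)               # step 1
        if new_uri:
            return 302, {"Location": new_uri}   # step 3: self-fix
        return 404, rdf_error_document(uri, similar_entities(uri))  # step 2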
>>>>>
>>>>>
>>>>> -- 
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
>>>>> President & CEO
>>>>> OpenLink Software     Web: http://www.openlinksw.com

Received on Sunday, 15 February 2009 00:55:58 UTC