Inferring data from network interactions (was Re: Is 303 really necessary?)

Hi,

On 5 November 2010 09:54, William Waites <ww@styx.org> wrote:
> On Fri, Nov 05, 2010 at 09:34:43AM +0000, Leigh Dodds wrote:
>>
>> Are you suggesting that Linked Data crawlers could/should look at the
>> status code and use that to infer new statements about the resources
>> returned? If so, I think that's the first time I've seen that
>> mentioned, and am curious as to why someone would do it. Surely all of
>> the useful information is in the data itself.
>
> Provenance and debugging. It would be quite possible to
> record the fact that this set of triples, G, were obtained
> by dereferencing this uri N, at a certain time, from a
> certain place, with a request that looked like this and a
> response that had these headers and response code. The
> class of information that is kept for [0]. If N appeared
> in G, that could lead directly to inferences involving the
> provenance information. If later reasoning is concerned at
> all with the trustworthiness or up-to-dateness of the
> data it could look at this as well.

Yes, I've done something similar to that in the past when I added
support for the ScutterVocab [1] to my crawler.
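
To make the kind of record being discussed concrete, here is a minimal
sketch of per-fetch provenance: the URI dereferenced, when, and the
response code and headers. It is an illustration only, not the
crawler's code and not the ScutterVocab model; FetchRecord, describe
and the example.org URIs are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FetchRecord:
    """Provenance for one dereference (hypothetical structure)."""
    uri: str              # the URI N that was dereferenced
    status: int           # HTTP response code, e.g. 200, 301, 303
    headers: dict         # response headers as received
    fetched_at: datetime  # when the request was made
    triple_count: int = 0 # size of the graph G that came back


def describe(record: FetchRecord) -> str:
    """Render one record as a single log line for debugging."""
    content_type = record.headers.get("Content-Type", "unknown")
    return (f"{record.fetched_at.isoformat()} {record.status} "
            f"{record.uri} ({content_type}, {record.triple_count} triples)")


# Example values are made up for illustration.
record = FetchRecord(
    uri="http://example.org/id/thing",
    status=303,
    headers={"Location": "http://example.org/doc/thing"},
    fetched_at=datetime(2010, 11, 5, 10, 0, tzinfo=timezone.utc),
)
print(describe(record))
```

Keeping one such record per request is what makes later questions about
trust or freshness answerable, at the cost of storing a record for
every fetch.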

It was the suggestion of inferring information directly from 200/303
that I was most curious about. I've argued for inferring data from 301
in the past [2], but wasn't sure of the merit of introducing data based
on the other interactions.

> Keeping this quantity of information around might quickly
> turn out to be too data-intensive to be practical, but
> that's more of an engineering question. I think it does
> make some sense to do this in principle at least.

That's what I found when crawling the BBC pages: huge amounts of data,
and a lot of overhead in storing it. Capturing just enough to gather
statistics on the crawl was sufficient.
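
As a rough sketch of the "just enough for statistics" approach, the
snippet below keeps only running counts per status code and per host
and discards everything else about each response. It is an assumption
about what such statistics might look like, not a description of the
BBC crawl; CrawlStats and the example.org URIs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


class CrawlStats:
    """Aggregate counters only; no per-request records are retained."""

    def __init__(self):
        self.by_status = Counter()  # e.g. {200: 2, 303: 1}
        self.by_host = Counter()    # requests per hostname

    def record(self, uri: str, status: int) -> None:
        """Count one completed request, then forget its details."""
        self.by_status[status] += 1
        self.by_host[urlsplit(uri).hostname] += 1


# Example values are made up for illustration.
stats = CrawlStats()
stats.record("http://example.org/a", 200)
stats.record("http://example.org/b", 303)
stats.record("http://example.org/c", 200)
print(stats.by_status.most_common())
```

Storage here grows with the number of distinct status codes and hosts
rather than with the number of requests, which is the trade-off being
described: cheap to keep, but no use for per-triple provenance.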

Cheers,

L.

[1]. http://wiki.foaf-project.org/w/ScutterVocab
[2]. http://www.ldodds.com/blog/2007/03/the-semantics-of-301-moved-permanently/

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.dodds@talis.com
http://www.talis.com

Received on Friday, 5 November 2010 10:04:19 UTC