- From: Nathan <nathan@webr3.org>
- Date: Fri, 05 Nov 2010 10:05:36 +0000
- To: Leigh Dodds <leigh.dodds@talis.com>
- CC: Harry Halpin <hhalpin@ibiblio.org>, Ian Davis <me@iandavis.com>, public-lod@w3.org, Doug Schepers <schepers@w3.org>
Leigh Dodds wrote:
> Hi Nathan,
>
> On 4 November 2010 18:08, Nathan <nathan@webr3.org> wrote:
>> You see it's not about what we say, it's about what others say, and
>> if 10 huge corps analyse the web and spit out billions of triples
>> saying that anything 200 OK'd is a document, then at the end, when
>> we consider the RDF graph of triples, all we're going to see is one
>> statement saying something is a "nonInformationResource" and a
>> hundred others saying it's a document and describing what it's
>> about, together with its format and so on.
>
> Are you suggesting that Linked Data crawlers could/should look at the
> status code and use that to infer new statements about the resources
> returned? If so, I think that's the first time I've seen that
> mentioned, and am curious as to why someone would do it. Surely all of
> the useful information is in the data itself.

Not at all. I'm saying that if big-corp makes a /web crawler/ that
describes what documents are about and publishes RDF triples, then if
you use 200 OK, throughout the web you'll get (statements similar to)
the following asserted:

  </toucan> :primaryTopic dbpedia:Toucan ;
            a :Document .

Now move down the line a couple of years, reason over a triple dump of
the web-of-data, and you'll find the problem. The way to solve it is to
first strip out everything that's a :Document, so all the slash URIs
will be stripped, including </toucan>.

I'm also saying that 303 doesn't solve this half the time either,
because most HTTP clients black-box the process, so their process is:

  uri = "/toucan";
  doc = get( uri );
  makeStatements( uri , doc );

Again, same problem.

Best,

Nathan
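To make the black-box point concrete, here is a minimal sketch (Python,
standard library only) of the two client behaviours described above. The
makeStatements-style helper and the :Document typing are illustrative
placeholders, not any particular crawler's API:

  import urllib.request

  def naive_describe(uri):
      # urllib follows a 303 redirect transparently, so a client that
      # never looks at the final URL ends up describing the request
      # URI rather than the document it was redirected to.
      response = urllib.request.urlopen(uri)
      body = response.read()
      # Statements are (wrongly) made about the original URI:
      return make_statements(uri, body)

  def redirect_aware_describe(uri):
      response = urllib.request.urlopen(uri)
      body = response.read()
      final_url = response.geturl()
      if final_url != uri:
          # A redirect (e.g. 303 See Other) happened: the document
          # lives at final_url; uri names the thing it describes.
          return make_statements(final_url, body)
      return make_statements(uri, body)

  def make_statements(subject_uri, body):
      # Placeholder for whatever RDF extraction the crawler performs.
      return [(subject_uri, "rdf:type", ":Document")]

The naive version is the black-box process above: it collapses the
redirect and asserts the :Document triples against /toucan itself, so
the 303 buys nothing unless the client explicitly checks the final URL.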
Received on Friday, 5 November 2010 10:06:48 UTC