- From: <ahogan@dcc.uchile.cl>
- Date: Fri, 25 Jul 2014 15:12:48 -0400
- To: public-lod@w3.org
On 25/07/2014 14:44, Hugh Glaser wrote:
> The idea that having a robots.txt that Disallows spiders
> is a “problem” for a dataset is rather bizarre.
> It is of course a problem for the spider, but is clearly not a problem for a
> typical consumer of the dataset.
> By that measure, serious numbers of the web sites we all use on a daily
> basis are problematic.

<snip>

I think the general interpretation of the robots in "robots.txt" is any software agent accessing the site "automatically" (versus a user manually entering a URL).

If we agree on that interpretation, a robots.txt blacklist prevents applications from following links to your site. In that case, my counter-question would be: what is the benefit of publishing your content as Linked Data (with dereferenceable URIs and rich links) if you subsequently prevent machines from discovering and accessing it automatically? Essentially you are requesting that humans (somehow) have to manually enter every URI/URL for every source, which is precisely the document-centric view we're trying to get away from.

Put simply, as far as I can see, a dereferenceable URI behind a robots.txt blacklist is no longer a dereferenceable URI ... at least for a respectful software agent. Linked Data behind a robots.txt blacklist is no longer Linked Data.

(This is quite clear in my mind but perhaps others might disagree.)

Best,
Aidan
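
P.S. To make "respectful software agent" a little more concrete, here is a rough sketch of what such an agent might do before dereferencing a URI, using Python's standard urllib.robotparser. The URI and agent name below are made up purely for illustration:

# Minimal sketch of a "respectful" agent: before dereferencing a URI,
# it checks the site's robots.txt and gives up if access is disallowed.
# The URI and agent name are hypothetical placeholders.
from urllib import robotparser, request
from urllib.parse import urlsplit

USER_AGENT = "my-ld-agent"                      # hypothetical agent name
uri = "http://example.org/resource/Thing42"     # hypothetical Linked Data URI

# Fetch and parse robots.txt for the host in question.
parts = urlsplit(uri)
rp = robotparser.RobotFileParser()
rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
rp.read()

if rp.can_fetch(USER_AGENT, uri):
    # Allowed: dereference the URI, asking for RDF.
    req = request.Request(uri, headers={"Accept": "text/turtle",
                                        "User-Agent": USER_AGENT})
    with request.urlopen(req) as resp:
        data = resp.read()
    print(f"Dereferenced {uri}: {len(data)} bytes")
else:
    # Disallowed: for this agent the URI is, in effect, not dereferenceable.
    print(f"robots.txt disallows {uri}; skipping")

If can_fetch() returns False, the agent never issues the request at all, which is exactly the sense in which the URI stops being dereferenceable for it.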
Received on Friday, 25 July 2014 19:13:12 UTC