Re: Just what *does* robots.txt mean for a LOD site?

Hi Hugh

I think an interpretation that would make sense would be similar to the
policy of several websites' APIs, e.g. LinkedIn's:

"go ahead and get my data .. as long as it's in reaction to an input your
user directly generates"
e.g. a user directly wants to know about "Hugh Glaser", and that generates a
call to the LinkedIn API: fine.   Scraping to get "all the users that ..."? No way!

So a Chrome plugin that, for example, automatically follows some linked data
in response to a user action would be OK; a spider just following links to
get a dump would not.
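A minimal sketch of that distinction, using Python's standard urllib.robotparser (the robots.txt content, the "ExampleBot" user-agent string, and the may_fetch helper are all hypothetical, for illustration only):

```python
from urllib import robotparser

# Hypothetical robots.txt for a Linked Data site, disallowing bulk access
ROBOTS_TXT = """\
User-agent: *
Disallow: /data/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def may_fetch(uri: str, user_triggered: bool) -> bool:
    # Under the interpretation above: a dereference made in direct
    # response to a user action is allowed; automated crawling
    # defers to robots.txt.
    if user_triggered:
        return True
    return rp.can_fetch("ExampleBot", uri)

# A user asks about a specific resource: allowed.
print(may_fetch("https://example.org/data/hugh-glaser", True))
# A spider harvesting the same URI for a dump: blocked.
print(may_fetch("https://example.org/data/hugh-glaser", False))
```

Whether robots.txt is actually meant to bind user-triggered dereferences is exactly the open question in this thread; the sketch just encodes one reading of it.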

That said, I'm afraid this whole thing is unlikely to have any actual
impact if there are no applications that would care about, or have any use
for, doing either of the two above.

Gio



On Sat, Jul 26, 2014 at 1:16 PM, Hugh Glaser <hugh@glasers.org> wrote:

> Hi.
>
> I’m pretty sure this discussion suggests that we (the LD community) should
> try to come to some consensus on policy on exactly what it means if an
> agent finds a robots.txt on a Linked Data site.
>
> So I have changed the subject line - sorry Chris, it should have been
> changed earlier.
>
> Not an easy thing to come to, I suspect, but it seems to have become
> significant.
> Is there a more official forum for this sort of thing?
>
> On 26 Jul 2014, at 00:55, Luca Matteis <lmatteis@gmail.com> wrote:
>
> > On Sat, Jul 26, 2014 at 1:34 AM, Hugh Glaser <hugh@glasers.org> wrote:
> >> That sort of sums up what I want.
> >
> > Indeed. So I agree that robots.txt should probably not establish
> > whether something is a linked dataset or not. To me your data is still
> > linked data even though robots.txt is blocking access of specific
> > types of agents, such as crawlers.
> >
> > Aidan,
> >
> >> *) a Linked Dataset behind a robots.txt blacklist is not a Linked
> Dataset.
> >
> > Isn't that a bit harsh? That would be the case if the only type of
> > agent is a crawler. But as Hugh mentioned, linked datasets can be
> > useful simply by treating URIs as dereferenceable identifiers without
> > following links.
> In Aidan’s view (I hope I am right here), it is perfectly sensible.
> If you start from the premise that robots.txt is intended to prohibit
> access by anything other than a browser with a human at it, then only
> humans could fetch the RDF documents.
> Which means that the RDF document is completely useless as a
> machine-interpretable semantics for the resource, since it would need a
> human to do some cut and paste or something to get it into a processor.
>
> It isn’t really a question of harsh - it is perfectly logical from that
> view of robots.txt (which isn’t our view, because we think that robots.txt
> is about "specific types of agents”, as you say).
>
> Cheers
> Hugh
>
> --
> Hugh Glaser
>    20 Portchester Rise
>    Eastleigh
>    SO50 4QS
> Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
>
>
>
>

Received on Saturday, 26 July 2014 13:18:50 UTC