Re: New response code

bearheart@bearnet.com writes:
 > At 09:43 am 2/19/96 +0100, Mirsad Todorovac spake:
 > >> It would be really nice if there were a response code (say, 405) for
 > >> "robot forbidden that URL."  Technically, "forbidden" is already covered
 > >> through 403, but it would still be nice to have something more
 > >> descriptive.
 > 
 >    There is already a method of dealing with this that takes much 
 > less traffic than responding on a URL-by-URL basis. 
 > 
 >    The "robots.txt" file is described at: 
 > 
 >       http://info.webcrawler.com/mak/projects/robots/norobots.html
 > 
 > 
 > +--------------------------------------------------------------------------+
 > | BearHeart / Bill Weinman | BearHeart@bearnet.com | http://www.bearnet.com/ 
 > | Author of The CGI Book -- http://www.bearnet.com/cgibook/ 
 > 


This gives robots a way of learning which pages a server wants them to
visit, but it doesn't give server scripts any indication that they are
being probed by a robot.  Right now the only way to detect that a
requester is a robot is by string matching on the User-Agent header.
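
To make that concrete, here is a rough sketch of what the string-matching
approach looks like from inside a CGI program (Python, for illustration;
the robot names below are made up, and any real list would need constant
upkeep):

    import os

    # CGI exposes the User-Agent header as the HTTP_USER_AGENT environment
    # variable, so "detection" is just substring matching against a list of
    # known robot names. Illustrative names only, not an authoritative list.
    KNOWN_ROBOTS = ("webcrawler", "lycos", "spider")

    def looks_like_robot():
        agent = os.environ.get("HTTP_USER_AGENT", "").lower()
        return any(name in agent for name in KNOWN_ROBOTS)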

Some applications would generate pages differently if they knew they
were being probed by a robot.  For instance, in applications that
encode session information in the URL (an approach that will be with us
until cookies take over completely), it might be preferable not to
generate session ids for robots, or at least not new ones.
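
A sketch of what I mean, assuming a hypothetical is_robot flag computed
along the lines above: robots get plain URLs with no session id appended,
while interactive users get a fresh one.

    import uuid

    # Hypothetical helper: suppress session ids for robots so a crawl
    # doesn't mint a throwaway session for every request it makes.
    def session_id_for(is_robot):
        return None if is_robot else uuid.uuid4().hex

    def encode_url(path, session_id):
        # URL-encoded session info, e.g. /catalog?session=ab12...
        if session_id is None:
            return path
        return "%s?session=%s" % (path, session_id)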

So, I'd like to propose that robots be allowed to identify themselves
as such by including a simple header line in their requests, which
ought to be passed along to CGI programs.  The header could be as
simple as "Robot: true".  Since this is a form of content negotiation,
some use of an Accept header would also be reasonable, but I don't know
which one to suggest.
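
If such a header existed, CGI's usual header-to-environment mapping
would surface it as HTTP_ROBOT, and the check becomes trivial.  This is
only a sketch of the proposal, not anything standardized:

    import os

    # Under the proposed "Robot: true" header (an assumption, not a
    # standard), the CGI gateway would pass it through as HTTP_ROBOT.
    def request_is_from_robot():
        return os.environ.get("HTTP_ROBOT", "").strip().lower() == "true"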

--Shel

Received on Monday, 19 February 1996 08:42:54 UTC