- From: Shel Kaphan <sjk@amazon.com>
- Date: Mon, 19 Feb 1996 08:36:46 -0800
- To: BearHeart / Bill Weinman <bearheart@bearnet.com>
- Cc: mirsad.todorovac@fer.hr, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
bearheart@bearnet.com writes:
> At 09:43 am 2/19/96 +0100, Mirsad Todorovac spake:
>
>> It would be really nice if there were a response code (say, 405) for
>> "robot forbidden that URL." Technically, "forbidden" is already covered
>> through 403, but it would still be nice to have something more
>> descriptive.
>
> There is already a method of dealing with this that takes much
> less traffic than responding on a url-by-url basis.
>
> The "robots.txt" file is described at:
>
> http://info.webcrawler.com/mak/projects/robots/norobots.html
>
> +--------------------------------------------------------------------------+
> | BearHeart / Bill Weinman | BearHeart@bearnet.com | http://www.bearnet.com/
> | Author of The CGI Book -- http://www.bearnet.com/cgibook/

This gives robots a way of detecting what pages a server would like to present to the robot, but it doesn't give server scripts an indication of when they are being probed by a robot. Right now the only way to detect that a requestor is a robot is by string matching on the user-agent header.

Some applications would generate pages differently if they are being probed by a robot. For instance, in applications that use URL encoding of session information (which will be with us until cookies take over completely), it might be preferable not to generate session ids, or at least not new ones, for robots.

So, I'd like to propose that robots be allowed to identify themselves as such by including a simple header line in requests, which ought to be passed along to CGI programs. The header could just be "robot: true" or something like that. Since this is a form of content negotiation, some use of an accept header would also be OK, but I don't know which one to suggest.

--Shel
Received on Monday, 19 February 1996 08:42:54 UTC
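
As a rough illustration of the proposal above (not part of the original message), a CGI program could check for such a "Robot: true" header, which a CGI gateway would expose as the HTTP_ROBOT environment variable under the usual header-to-environment convention, and fall back to the current practice of matching the User-Agent string. The sketch below is in Python; the crawler name list and the session-id handling are illustrative assumptions, not anything specified in the message.

    #!/usr/bin/env python3
    # Illustrative sketch: decide whether a request comes from a robot,
    # preferring the proposed explicit "Robot: true" header over
    # User-Agent string matching, and skip session-id generation for robots.
    import os

    # Illustrative substrings only; not an authoritative robot list.
    KNOWN_ROBOT_SUBSTRINGS = ("crawler", "spider", "robot")

    def request_is_from_robot():
        # Proposed explicit signal: a "Robot: true" request header,
        # passed to the CGI program as HTTP_ROBOT.
        if os.environ.get("HTTP_ROBOT", "").strip().lower() == "true":
            return True
        # Fallback: the only mechanism available today, matching
        # the User-Agent header against known robot names.
        user_agent = os.environ.get("HTTP_USER_AGENT", "").lower()
        return any(name in user_agent for name in KNOWN_ROBOT_SUBSTRINGS)

    def main():
        print("Content-Type: text/html")
        print()
        if request_is_from_robot():
            # No session id in the URL: robots should not be handed
            # (or cause the creation of) session state.
            print("<html><body><a href='/catalog'>Catalog</a></body></html>")
        else:
            session_id = "abc123"  # placeholder; a real app would generate one
            print("<html><body><a href='/catalog?sid=%s'>Catalog</a>"
                  "</body></html>" % session_id)

    if __name__ == "__main__":
        main()

The point of the explicit header is that the decision no longer depends on a maintained list of user-agent names; the fallback branch is only there for robots that do not send the proposed header.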