- From: Nick Arnett <narnett@verity.com>
- Date: Sun, 28 May 1995 10:29:15 -0700
- To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
- Cc: harvest-dvl@cs.colorado.edu, naic@nasa.gov, webmasters@nasa.gov
It seems to me that a solution might lie in clever use of, or extensions to, the robots.txt exclusion file that most spiders respect. See http://web.nexor.co.uk/mak/doc/robots/robots.html if you're not familiar with this.

Our search engine can hide the existence of inaccessible documents from the user; I would assume, though I'm not certain, that others can do so as well. For example, you could intercept our CGI data (between the Web daemon and our search daemon) to delete the security restriction for queries coming from NASA sites.

I'd be interested in hearing from others who are using Harvest.

Nick
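P.S. For anyone who hasn't looked at the exclusion file before, it's just a plain-text file at the server root; a minimal example (the paths are invented for illustration) looks like:

```
# /robots.txt -- illustrative only
User-agent: *          # applies to all robots
Disallow: /internal/   # keep spiders out of this subtree
Disallow: /cgi-bin/
```

And to make the interception idea concrete, here's a rough sketch of a CGI wrapper that strips the security restriction before handing the query to the search program. The field name ("restrict"), the program path, and the NASA domain check are all hypothetical; our actual CGI interface may well differ.

```python
#!/usr/bin/env python
# Hypothetical CGI wrapper: drop the security restriction for queries
# that originate from a NASA host, then hand everything else through
# to the real search program. Field and path names are invented.
import os
import sys
import subprocess
import urllib.parse

SEARCH_CGI = "/usr/local/verity/cgi-bin/search"   # hypothetical path


def from_nasa(remote_host):
    # Trusts the reverse-resolved hostname (REMOTE_HOST); a real
    # deployment would also want to verify the address itself.
    return remote_host.lower().endswith(".nasa.gov")


def main():
    fields = urllib.parse.parse_qsl(os.environ.get("QUERY_STRING", ""))
    if from_nasa(os.environ.get("REMOTE_HOST", "")):
        # Remove the hypothetical 'restrict' field so internal users
        # can see documents that stay hidden from the outside world.
        fields = [(k, v) for (k, v) in fields if k != "restrict"]
    os.environ["QUERY_STRING"] = urllib.parse.urlencode(fields)
    # Hand off to the real search program with the rewritten query.
    sys.exit(subprocess.call([SEARCH_CGI]))


if __name__ == "__main__":
    main()
```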
Received on Sunday, 28 May 1995 10:34:30 UTC