Recognizing search engine spiders from Jacob Palme on 2001-03-03 (ietf-discuss@w3.org from March 2001)

From: Jacob Palme <jpalme@dsv.su.se>
Date: Sat, 3 Mar 2001 10:31:02 +0100
To: discuss@apps.ietf.org
Message-Id: <p05010402b6c6698120db@[130.237.161.111]>

Is there any standard that search engines use when sending
HTTP requests during spidering, in order to tell the
recipient HTTP server that they are search engines?
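
One convention that may be relevant: many spiders name themselves in the
User-Agent request header. That is a habit rather than a dedicated
signalling standard, so a server can only guess. A minimal Python sketch
of such a guess follows. The substring list and the names SPIDER_HINTS
and looks_like_spider are illustrative assumptions, not taken from any
specification.

```python
# Illustrative substrings only; this is an assumed, incomplete list,
# not an official registry of spider names.
SPIDER_HINTS = ("bot", "crawler", "spider", "slurp")


def looks_like_spider(user_agent):
    """Guess whether a User-Agent header value belongs to a spider.

    This is a heuristic: a client can send any User-Agent it likes,
    so the result is a hint, never proof.
    """
    if not user_agent:
        # A missing or empty header tells us nothing; treat as "not a spider".
        return False
    ua = user_agent.lower()
    return any(hint in ua for hint in SPIDER_HINTS)
```

A server could call looks_like_spider() on the incoming header value and
fall back to normal handling whenever it returns False.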

I can see multiple uses for this. In our particular case,
we sometimes intentionally create slightly varying URLs
for the same document, in order to prevent an old version
in the cache from being used. (Yes, I know there are cache
control standards, but they do not seem to work in all
cases.) This might mean that a search engine would
store multiple copies of nearly the same document,
and would not recognize that a new version replaces an
old version of the same document.
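
To make the scenario concrete, here is a minimal Python sketch. It
assumes the URLs vary only in a query parameter, here called "v"; that
parameter name, the example.org URLs and both function names are
hypothetical, since the exact variation scheme is not specified above.
One function creates the varying URL for browsers, the other recovers
the single stable URL that could be given to a recognized spider.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

CACHE_BUSTER = "v"  # hypothetical parameter name, assumed for illustration


def _without_buster(query):
    """Return the query's (key, value) pairs minus the cache-busting one."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return [(key, value) for key, value in pairs if key != CACHE_BUSTER]


def add_cache_buster(url, version):
    """Return a varying URL for the same document, to defeat stale caches."""
    parts = urlsplit(url)
    pairs = _without_buster(parts.query)
    pairs.append((CACHE_BUSTER, str(version)))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


def canonical_url(url):
    """Strip the cache-busting parameter, giving one stable URL per document."""
    parts = urlsplit(url)
    pairs = _without_buster(parts.query)
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

With this scheme, every version of a document maps back to the same
canonical URL, so an index keyed on canonical_url() would hold one entry
per document instead of one per variant.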
-- 
Jacob Palme <jpalme@dsv.su.se> (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/jpalme/

Received on Saturday, 3 March 2001 06:54:58 UTC