
Re: Question about web spiders...

From: Peter Kupfer <peter.kupfer@sbcglobal.net>
Date: Sat, 25 Jun 2005 23:47:38 -0500
Message-ID: <42BE336A.6040007@sbcglobal.net>
To: Lachlan Hunt <lachlan.hunt@lachy.id.au>
CC: Jasper Bryant-Greene <jasper@bryant-greene.name>, www-html@w3.org

Lachlan Hunt wrote:
> Jasper Bryant-Greene wrote:

>> I'm not sure about your specific spider, but the commonly accepted way
>> to do what you describe is something like:
>> <a href="http://www.example.org/" rel="nofollow">Link</a>
> That actually does not do what its name suggests; the spider is free to 
> follow the link.  It was actually designed to indicate that the link 
> should not be counted in the page rank algorithm.
> The correct way to control the way a spider indexes your site is to use 
> robots.txt, assuming the spider in question implements it.

In a robots.txt file, can you control specifically which links a spider 
will follow on a certain page, or only that it won't visit a certain 
page? I want the spider to eventually hit each subdomain, just not from 
the home page; can I have it start at each subdomain's index?

Or, can each subdomain have its own robots.txt?
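For what it's worth, my understanding (which may be wrong) is that 
robots.txt is fetched per-host, so each subdomain would serve its own 
file at its root, e.g. something like:

    # http://sub.example.org/robots.txt -- applies only to this subdomain
    # (the subdomain name and path here are made up for illustration)
    User-agent: *
    Disallow: /private/

As far as I can tell, robots.txt only excludes URL paths from being 
fetched; it cannot say "follow this link but not that one" on a given 
page.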

>> That's perfectly standards compliant, and Googlebot obeys that, as well
>> as several other major spiders AFAIK.
> It is not standards compliant at all.  It's a proprietary extension that 
> just happens to pass DTD based validation.  nofollow was discussed quite 
> extensively on this list when Google introduced it and the vast majority 
> of this community rejected it.

I tried to search the archive but didn't see it there; why was 
nofollow rejected?

Thanks again; please cc to peschtra@yahoo.com, as I do not know how to 
subscribe to the list.

Peter Kupfer
Received on Sunday, 26 June 2005 04:47:45 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:06:11 UTC