
Re: Question about web spiders...

From: Peter Kupfer <peter.kupfer@sbcglobal.net>
Date: Sat, 25 Jun 2005 23:47:38 -0500
Message-ID: <42BE336A.6040007@sbcglobal.net>
To: Lachlan Hunt <lachlan.hunt@lachy.id.au>
CC: Jasper Bryant-Greene <jasper@bryant-greene.name>, www-html@w3.org

Lachlan Hunt wrote:
> Jasper Bryant-Greene wrote:

>> I'm not sure about your specific spider, but the commonly accepted way
>> to do what you describe is something like:
>> <a href="http://www.example.org/" rel="nofollow">Link</a>
> That actually does not do what its name suggests; the spider is free to 
> follow the link.  It was actually designed to indicate that the link 
> should not be counted in the page rank algorithm.
> The correct way to control the way a spider indexes your site is to use 
> robots.txt, assuming the spider in question implements it.

In a robots.txt file, can you control specifically which links a spider 
will follow on a certain page, or only that it won't visit a certain 
page? I want the spider to eventually hit each subdomain, just not from 
the home page; can I have it start at each subdomain's index?

Or, can each subdomain have its own robots.txt?
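For what it's worth, my understanding (which may be wrong) is that 
robots.txt is fetched per-host, so each subdomain would serve its own 
file at its root, e.g. something like:

    # http://sub.example.org/robots.txt -- applies only to this subdomain
    # (the subdomain name and path here are made up for illustration)
    User-agent: *
    Disallow: /private/

As far as I can tell, robots.txt only excludes URL paths from being 
fetched; it cannot say "follow this link but not that one" on a given 
page.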

>> That's perfectly standards compliant, and Googlebot obeys that, as well
>> as several other major spiders AFAIK.
> It is not standards compliant at all.  It's a proprietary extension that 
> just happens to pass DTD based validation.  nofollow was discussed quite 
> extensively on this list when Google introduced it and the vast majority 
> of this community rejected it.

I tried to search the archive but didn't see it there; why was 
nofollow rejected?

Thanks again; please cc to peschtra@yahoo.com, as I do not know how to 
subscribe to the list.

Peter Kupfer
Received on Sunday, 26 June 2005 04:47:45 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 15:06:11 UTC