- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Sun, 26 Jun 2005 18:08:27 +1000
- To: Peter Kupfer <peter.kupfer@sbcglobal.net>
- CC: www-html@w3.org
Peter Kupfer wrote:
> Lachlan Hunt wrote:
>> The correct way to control the way a spider indexes your site is to
>> use robots.txt, assuming the spider in question implements it.
>
> In a robots.txt file can you control specifically what links a spider
> will follow on a certain page,

No, it controls which pages on a server the spider can access.

> or just that it won't go to a certain page.

Essentially, yes.

> I want the spider to eventually hit each subdomain, just not from
> the home page, I have it start at each subdomain index?

Then HTML is the wrong place to specify such behaviour, and robots.txt
is probably not suitable for you either. HTML is designed to mark up
the semantics of the document's content by saying *what* the content
is, not to describe how the content should be processed by a
particular UA.

Having said that, processing instructions [1] are designed to supply
system-specific information, but I don't know how suitable they would
be for your particular needs.

I don't understand why it matters which path is followed to reach
subdomains, but I think you need to find a way to configure the robot
itself, rather than trying to give it instructions from within the
documents it reads.

> Or, can each subdomain have its own robots.txt.

Yes. AFAIK, spiders look for robots.txt in the root directory of every
host, regardless of whether it's the main domain or a subdomain, e.g.

http://example.com/robots.txt
http://subdomain.example.com/robots.txt

In any case, this is completely off topic for this HTML-related list.

>> nofollow was discussed quite extensively on this list when Google
>> introduced it and the vast majority of this community rejected it.
>
> I tried to search the archive, but didn't see it there, why was no
> follow rejected?

Then you didn't look very hard. A search for "nofollow" in the
archives reveals most of the thread, appearing just below the messages
from this thread. For your convenience, it actually started with a
message on www-html-editor [2][3], with most of the follow-up
discussion on www-html [4].

[1] http://www.is-thought.co.uk/book/sgml-8.htm#PI
[2] http://lists.w3.org/Archives/Public/www-html-editor/2005JanMar/0010
[3] http://lists.w3.org/Archives/Public/www-html-editor/2005JanMar/thread#10
[4] http://lists.w3.org/Archives/Public/www-html/2005Jan/thread#64

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Sunday, 26 June 2005 08:08:41 UTC