Re: checklink: htmlhelp is forbidden

On Sun, 2004-04-04 at 08:46, MichaelJennings wrote:
> http://www.htmlhelp.com/
> HTTP Code returned: 403
> HTTP Message: Forbidden
> Actually, I think if you try the URL you'll find it is
> not only permitted, but pretty good competition.

Really? 

david@cyberman david $ pavuk -identity "W3C-checklink/3.9.2 [3.17] libwww-perl/5.64" http://www.htmlhelp.com/

http://www.htmlhelp.com/ URL[ 1]:     1(0) of     1 
http://www.htmlhelp.com/
download: ERROR: forbidden HTTP request

Certainly seems to be forbidden to me.
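
For anyone without pavuk to hand, here is a rough Python sketch of the same
test: send the link checker's User-Agent string and see which status code
comes back. The helper names build_request and fetch_status are made up for
illustration, and what the server answers today may well differ from what
I saw above.

```python
import urllib.error
import urllib.request

# User-Agent string copied from the pavuk command above.
CHECKLINK_UA = "W3C-checklink/3.9.2 [3.17] libwww-perl/5.64"


def build_request(url, user_agent=CHECKLINK_UA):
    """Build a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent=CHECKLINK_UA):
    """Return the HTTP status code for url (hypothetical helper, needs network)."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is what we are interested in.
        return err.code


if __name__ == "__main__":
    print(fetch_status("http://www.htmlhelp.com/"))
```

Running it once with the checklink string and once with a browser's string
would show whether the block is keyed on the User-Agent header.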

I don't know why htmlhelp.com blocks the link checker, but I wouldn't be
surprised if it had something to do with the way the link checker
ignores the robots exclusion standard.

david@pils:~$ tail -f /hosts/dorward.me.uk/logs/access.log | grep robot
 
... nope, it doesn't request robots.txt, and it recurses into
http://dorward.me.uk/notes/ despite these rules:

User-agent: *
Disallow: /tmp/
Disallow: /images/
Disallow: /notes/
Disallow: /lib/
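
As a sketch of what honouring those rules would look like, Python's standard
urllib.robotparser can evaluate them directly. The is_allowed helper and the
"W3C-checklink" agent token are my own assumptions for illustration; this is
not how the link checker itself is written.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
Disallow: /images/
Disallow: /notes/
Disallow: /lib/
"""


def is_allowed(url, user_agent="W3C-checklink"):
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://dorward.me.uk/", "http://dorward.me.uk/notes/"):
        print(url, is_allowed(url))
```

A well-behaved recursive crawler would make this check before following each
link and skip anything under /notes/.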


-- 
David Dorward                                 <http://dorward.me.uk/>

Received on Sunday, 4 April 2004 05:39:23 UTC