Re: checklink: htmlhelp is forbidden

From: David Dorward <david@dorward.me.uk>
Date: Sun, 04 Apr 2004 10:35:20 +0100
To: MichaelJennings <mikejenn@fiam.net>
Cc: www-validator@w3.org
Message-Id: <1081071319.13292.5.camel@localhost>

On Sun, 2004-04-04 at 08:46, MichaelJennings wrote:
> http://www.htmlhelp.com/
> HTTP Code returned: 403
> HTTP Message: Forbidden
> Actually, I think if you try the URL you'll find it is
> not only permitted, but pretty good competition.

Really? 

david@cyberman david $ pavuk -identity "W3C-checklink/3.9.2 [3.17] libwww-perl/5.64" http://www.htmlhelp.com/

http://www.htmlhelp.com/ URL[ 1]:     1(0) of     1 
http://www.htmlhelp.com/
download: ERROR: forbidden HTTP request

Certainly seems to be forbidden to me.
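The same check can be repeated without pavuk. Below is a minimal Python sketch that sends checklink's User-Agent string (copied from the pavuk command) and reports the HTTP status code. It is not part of checklink or pavuk. The helper names build_request and status_for are hypothetical, and the site may well answer differently today than it did in 2004.

```python
import urllib.error
import urllib.request

# User-Agent string taken from the pavuk command (checklink 3.9.2 on libwww-perl 5.64).
CHECKLINK_UA = "W3C-checklink/3.9.2 [3.17] libwww-perl/5.64"


def build_request(url: str, agent: str = CHECKLINK_UA) -> urllib.request.Request:
    """Build a GET request that identifies itself as the link checker.

    A User-Agent set on the Request overrides urllib's default
    "Python-urllib/x.y" header.
    """
    return urllib.request.Request(url, headers={"User-Agent": agent})


def status_for(url: str, agent: str = CHECKLINK_UA, timeout: float = 10.0) -> int:
    """Fetch `url` with the given User-Agent and return the HTTP status code."""
    try:
        with urllib.request.urlopen(build_request(url, agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError; a User-Agent block shows up as 403.
        return err.code
```

Calling status_for("http://www.htmlhelp.com/") makes a live request. A 403 there, alongside a 200 for a browser-like agent string, would point to blocking by User-Agent.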

I don't know why htmlhelp.com blocks the link checker, but I wouldn't be
surprised if it was something to do with the way it (the link checker)
ignores the robots exclusion standard.

david@pils:~$ tail -f /hosts/dorward.me.uk/logs/access.log | grep robot
 
... nope, it doesn't request robots.txt, and it recursively goes into
http://dorward.me.uk/notes/ despite:

User-agent: *
Disallow: /tmp/
Disallow: /images/
Disallow: /notes/
Disallow: /lib/
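A client that honours the robots exclusion standard fetches /robots.txt before anything else and checks each URL against it. The sketch below runs that check with Python's standard urllib.robotparser against the rules quoted here. It shows the expected behaviour only and is not checklink's own (Perl) code. The helper name allowed and the page name some-page.html are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules for dorward.me.uk as quoted in the message.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
Disallow: /images/
Disallow: /notes/
Disallow: /lib/
"""

# User-Agent string that checklink 3.9.2 sends.
USER_AGENT = "W3C-checklink/3.9.2 [3.17] libwww-perl/5.64"


def allowed(url: str, robots_txt: str = ROBOTS_TXT, agent: str = USER_AGENT) -> bool:
    """Return True if a robots-compliant client may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines. A live crawler would instead call
    # set_url("http://host/robots.txt") and read() before its first request.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://dorward.me.uk/",
                "http://dorward.me.uk/notes/",
                "http://dorward.me.uk/notes/some-page.html"):
        print(url, "->", "allowed" if allowed(url) else "disallowed")
```

Under these rules only the first URL is allowed. Everything under /notes/ is off limits to every user agent, because the "User-agent: *" record applies to checklink as well.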


-- 
David Dorward                                 <http://dorward.me.uk/>
Received on Sunday, 4 April 2004 05:39:23 GMT
