Re: checklink: Link checker unable to read working urls

From: Ville Skyttä <ville.skytta@iki.fi>
Date: Mon, 27 Sep 2004 21:19:32 +0300
To: www-validator@w3.org
Message-Id: <1096309172.25052.63.camel@bobcat.mine.nu>

On Sun, 2004-09-19 at 02:52, J. Grant wrote:

> How bizarre.  Seems to be an MS-Windows 2000 box.  Can you tell which
> vendor the httpd server is from?  Perhaps we can give them a bug report.

The server does not currently tell, but Netcraft remembers something:

> >>http://developers.slashdot.org/article.pl?sid=03/02/23/1939225&mode=thread&tid=156
> >>http://slashdot.org/article.pl?sid=04/08/18/2257257&tid=126&tid=1
> >>http://en.wikipedia.org/wiki/Vorbis
> > 
> > Slashdot has a robots.txt that prohibits access (though I would expect
> > checklink to tell you that so there might be something else going on),

slashdot.org seems to have special rules for requests coming from
validator.w3.org.  This returns "200 OK" from everywhere else I've tried,
but "403 Forbidden" from validator.w3.org:

  $ telnet slashdot.org 80
  HEAD / HTTP/1.0
  Host: slashdot.org
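
For anyone who would rather script the same probe than type it into
telnet, here is a minimal Python sketch.  The function names are my own
invention, and it only sends the bare HEAD request shown above and
returns the first line of the reply; nothing here is specific to
checklink.

```python
import socket


def build_head_request(host: str, path: str = "/") -> bytes:
    """Build the same minimal HTTP/1.0 HEAD request as the telnet session."""
    # Request line, Host header, then an empty line to end the headers.
    lines = [f"HEAD {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def head_status_line(host: str, port: int = 80, timeout: float = 10.0) -> str:
    """Send the HEAD request and return the status line of the response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_head_request(host))
        # The status line is the first line of the response.
        reply = sock.makefile("rb").readline()
    return reply.decode("iso-8859-1").strip()


# Example use (needs network access):
#   print(head_status_line("slashdot.org"))
```

Running it from two different machines and comparing the status lines
is enough to see whether the answer depends on where the request comes
from.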

> > wikipedia.org blocks checklink,
> I wonder why they block checklink?

Actually, they seem to block every request whose User-Agent header
contains the string "libwww-perl".

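To illustrate what such a rule amounts to, here is a small Python sketch
of a substring-based block.  This is only my guess at the logic, not
their actual server configuration; the function name, the banned list
and the sample User-Agent values (including the version numbers) are
made up for the example.

```python
def is_blocked(user_agent: str, banned=("libwww-perl",)) -> bool:
    """Return True if the User-Agent contains any banned substring.

    Assumption: a plain, case-sensitive substring match, which is the
    behaviour described above; the real rule may differ.
    """
    return any(token in user_agent for token in banned)


# Hypothetical User-Agent values, for illustration only.
samples = [
    "W3C-checklink/0.0 libwww-perl/0.0",
    "Mozilla/5.0 (X11; Linux)",
]
for ua in samples:
    print(ua, "->", "blocked" if is_blocked(ua) else "allowed")
```

With a rule like this, any client built on libwww-perl is refused no
matter what else its User-Agent says, which would explain why checklink
is affected.
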
>   Is there a way to change the
> checklink user-agent from the web page so I can check as though I am
> using Mozilla?

