Re: How to kill the link checker

From: Ville Skyttä <ville.skytta@iki.fi>
Date: Thu, 08 Jul 2004 01:13:18 +0300
To: www-validator@w3.org
Cc: Jonathan Berry <jberry@islandnet.com>
Message-Id: <1089238398.31544.327.camel@bobcat.mine.nu>

On Tue, 2004-07-06 at 18:13, Jonathan Berry wrote:

> The following test file (containing one (1) link), see
> below, will kill the Link Checker.  After about 1 minute 
> and 42 seconds, the browser comes back with a message
> that the connection has been broken by the remote server.
> 
> I believe that the correct action would be for the 
> link to be marked as questionable.  The same link in the midst
> of a larger file will cause the same action and prevent
> further processing of the file.

Agreed, and reproduced here.

> Checking link https://www.tx.preschoicefinancial.com/english/servlet/SignOn
> HEAD https://www.tx.preschoicefinancial.com/english/servlet/SignOn

In my local copy, the reason for this behaviour is that Apache's
internal timeout kicks in because it does not receive anything from the
checklink CGI script and bluntly aborts the response.

Why this happens is another story.  We do set a timeout (of 60 seconds),
but for some reason it does not apply to this particular request made by
the link checker.  I traced it somewhat, and the "hanging" seems to
happen deep inside Perl's low-level connect() call.

This can be fixed by placing an alarm() call somewhere and handling the
resulting SIGALRM.  But this is somewhat outside my "expertise", so I'm
not sure where that "somewhere" should really be.  I'm pretty sure it
should not have to be in the link checker code, but perhaps in
libwww-perl, or even IO::Socket.
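
To make the alarm() idea concrete, here is a minimal sketch.  It is
written in Python rather than Perl (signal.alarm() is the same Unix
mechanism as Perl's alarm()), and it is not the checklink, libwww-perl
or IO::Socket code.  The names call_with_alarm, ConnectTimeout and
check_link are made up for this example, and it assumes a Unix system,
the main thread, and a blocking call that a signal can interrupt.

```python
import signal
import socket


class ConnectTimeout(Exception):
    """Raised when the alarm fires before the blocking call returns."""


def call_with_alarm(func, seconds):
    """Run func(), aborting it with ConnectTimeout after `seconds`.

    Relies on SIGALRM, so it only works on Unix and in the main thread.
    `seconds` must be a whole number, as signal.alarm() takes an int.
    """

    def on_alarm(signum, frame):
        # Raising here interrupts whatever blocking call is in progress.
        raise ConnectTimeout(f"no result after {seconds} seconds")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)  # ask the kernel to deliver SIGALRM later
    try:
        return func()
    finally:
        signal.alarm(0)  # cancel the alarm if it has not fired yet
        signal.signal(signal.SIGALRM, previous)  # restore old handler


def check_link(host, port, timeout=60):
    """Hypothetical helper: report a hanging connect as questionable."""

    def connect():
        # No socket-level timeout is set on purpose; the alarm is the
        # only thing guarding against a connect() that never returns.
        with socket.create_connection((host, port)):
            return "ok"

    try:
        return call_with_alarm(connect, timeout)
    except ConnectTimeout:
        return "questionable (timed out)"
    except OSError as exc:
        return f"broken ({exc})"
```

Whether such a guard belongs in the application or in the library
underneath it is exactly the open question; the sketch only shows that
the alarm has to be cancelled and the old handler restored on every
path, which is easy to get wrong when it is sprinkled around by hand.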

Opinions, clues, anyone?
Received on Wednesday, 7 July 2004 18:13:19 GMT
