
Re: checklink: Make Checklink Faster

From: Ville Skyttä <ville.skytta@iki.fi>
Date: Fri, 27 May 2005 23:36:57 +0300
To: www-validator@w3.org
Message-Id: <1117226217.25731.124.camel@bobcat.mine.nu>

On Fri, 2005-05-27 at 09:36 -0500, Tim Burkhart wrote:
> Is there any way to make the Link Checker faster? I set the sleep_time 
> variable to 0.

That's equivalent to setting it to 1; you should see a warning about 1
second being used instead of zero on the console if you use the command
line version, or in your web server's error log if you use the online
version.

> I am checking over 7000 links, and it takes 2.5 hours to 
> complete 4000 documents, but it shouldn't take so long. Does anybody 
> know off-hand of any speed improvements?

To some extent, the link checker is intentionally slow(ish), to make it
friendlier towards target servers.  That is currently implemented by
pausing one or more seconds between hits to a target server.
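
To illustrate the idea, here is a minimal sketch of such a pause.  It is
written in Python purely for illustration and is not the link checker's
actual code.  The names MIN_SLEEP, effective_sleep_time and HostThrottle
are made up for this example, and tracking the last hit per host name is
an assumption about what "a target server" means.  The clamping mirrors
the "1 second instead of zero" behaviour described above.

```python
import time
import warnings
from urllib.parse import urlsplit

# Assumed floor for the pause, mirroring "1 second used instead of zero".
MIN_SLEEP = 1.0


def effective_sleep_time(requested):
    """Clamp a requested pause to the assumed one-second minimum."""
    if requested < MIN_SLEEP:
        warnings.warn(
            f"sleep time {requested} is too small, using {MIN_SLEEP} instead"
        )
        return MIN_SLEEP
    return float(requested)


class HostThrottle:
    """Keep consecutive hits to the same host at least `delay` seconds apart."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = effective_sleep_time(delay)
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep      # without really sleeping
        self._last_hit = {}      # host name -> time of the previous hit

    def wait(self, url):
        """Pause if needed before hitting `url`; return the seconds slept."""
        host = urlsplit(url).netloc.lower()
        last = self._last_hit.get(host)
        waited = 0.0
        if last is not None:
            remaining = self.delay - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_hit[host] = self._clock()
        return waited
```

Calling throttle.wait(url) right before each request would then keep hits
to any one host at least a second apart, while the first hit to a host
that has not been seen yet goes out immediately.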

7000 * 1 second is roughly 117 minutes of sleep time, so a total of two
and a half hours is not actually _that_ bad a result.  I'm assuming you
completed checking 7000 links in 2.5 hours, but I'm not entirely sure
what you mean by 7000 links and 4000 documents.  If that assumption
holds, and without changing the code or how the link checker currently
works, there would be "only" a bit over half an hour left to gain from
local optimization in this scenario.
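
For the record, the arithmetic behind those figures, as a small Python
snippet.  It assumes the worst case where every one of the 7000 links
incurs one full one-second pause.  The function name time_budget is made
up for this example.

```python
def time_budget(links, sleep_per_link_s, total_hours):
    """Split a total run time into minutes slept and minutes of other work."""
    sleep_min = links * sleep_per_link_s / 60
    other_min = total_hours * 60 - sleep_min
    return sleep_min, other_min


sleep_min, other_min = time_budget(links=7000, sleep_per_link_s=1, total_hours=2.5)
print(f"sleeping: {sleep_min:.0f} min, everything else: {other_min:.0f} min")
# prints "sleeping: 117 min, everything else: 33 min"
```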

Some ideas for making the link checker somewhat faster in the future
include running multiple checking agents in parallel and ordering the
links to be checked for maximum HTTP Keep-Alive utilization.  Neither
has been implemented yet, and there is no real plan for when or if they
will be.  The intent not to cause significant load on the target servers
is there to stay, though, so these probably won't result in
order-of-magnitude speed improvements.  At the very least, the
improvements will vary quite a bit between sets of links to be checked.
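
To give an idea of what the Keep-Alive ordering could look like, here is
a rough Python sketch.  Nothing like this exists in the link checker, and
the function name order_for_keep_alive is made up for the example.  It
assumes that a persistent connection can be reused for links that share
a scheme and host (including port), so it simply groups links on that
key while keeping the order in which the hosts were first seen.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def order_for_keep_alive(links):
    """Reorder links so that those sharing scheme and host are adjacent."""
    groups = defaultdict(list)
    for url in links:
        parts = urlsplit(url)
        # Host names are case-insensitive, so normalize before grouping.
        groups[(parts.scheme, parts.netloc.lower())].append(url)

    ordered = []
    # Dicts keep insertion order, so hosts appear in first-seen order.
    for urls in groups.values():
        ordered.extend(urls)
    return ordered


links = [
    "http://a.example/1",
    "http://b.example/1",
    "http://a.example/2",
]
print(order_for_keep_alive(links))
# prints ['http://a.example/1', 'http://a.example/2', 'http://b.example/1']
```

Note that this only saves connection setup.  The pause between hits to
the same server would still apply, which is one reason the gain depends
so much on the set of links.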
Received on Friday, 27 May 2005 20:36:25 GMT
