Re: Making Linkchecker working multithreaded

On 22 Jun 2005, at 22:56, Dominique Hazaël-Massieux wrote:
> I've had a quick look at the linkchecker to see what would be needed to
> make it multithreaded; I see the linkchecker is using LWP::RobotUA. Has
> any thought been given to using LWP::Parallel::RobotUA [1] instead?

This is a really good idea, thanks! As Ville said, similar ideas have
been thrown around with the same goal, but our best bet so far was to
try running several RobotUA instances in parallel, which would have
been problematic in many ways.

This looks promising, as it would certainly remove some of the
implementation concerns. Instead of having to track everything
ourselves, it seems that LWP::Parallel::RobotUA can be given new
documents to process at any time (by 'registering' new requests); you
then wait for some time and fetch the results.

In particular I like these two options:

$ua->max_hosts ( $max )
    Changes the maximum number of locations accessed in parallel.
    The default value is 7.

$ua->max_req ( $max )
    Changes the maximum number of requests issued per host in parallel.
    The default value is 5.

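I think this means we could greatly improve the speed of the link
checker by setting the latter (max_req) to 1, and the former
(max_hosts) to... something reasonably high. That way we would still
send only one request at a time to any given server, while checking
many servers in parallel.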
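
To make the register / wait / fetch cycle and the two settings a bit
more concrete, here is a rough, untested sketch. It assumes the
LWP::Parallel distribution is installed. The agent name, contact
address, the value 20 for max_hosts, the 30-second timeout and the
helper names build_ua and check_links are all placeholders of mine,
not anything taken from the link checker's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::RobotUA;

# Build a robot user agent tuned as discussed above.
# Agent name and contact address are placeholders (assumptions).
sub build_ua {
    my $ua = LWP::Parallel::RobotUA->new('linkchecker-sketch/0.1',
                                         'webmaster@example.org');
    $ua->max_req(1);      # at most one request in flight per host
    $ua->max_hosts(20);   # "reasonably high"; 20 is an arbitrary guess
    return $ua;
}

# Register one HEAD request per URL, wait for the answers, and
# return a hash mapping each requested URI to its status code.
sub check_links {
    my ($ua, @urls) = @_;
    for my $url (@urls) {
        # register() hands back a response object only on failure
        if (my $err = $ua->register(HTTP::Request->new(HEAD => $url))) {
            warn "could not register $url: ", $err->message, "\n";
        }
    }
    my $entries = $ua->wait(30);   # timeout in seconds (arbitrary)
    my %status;
    for my $key (keys %$entries) {
        my $res = $entries->{$key}->response;
        $status{ $res->request->uri } = $res->code;
    }
    return %status;
}

# Only touches the network when URLs are given on the command line.
if (@ARGV) {
    my %status = check_links(build_ua(), @ARGV);
    print "$_ => $status{$_}\n" for sort keys %status;
}
```

Run with a few URLs as arguments, this should print one status line
per URL. Note that the robot delay between requests to the same host
is left at its default here, so a real test would probably want to
look at that setting too.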

Definitely worth playing with.
-- 
olivier

Received on Monday, 27 June 2005 04:29:43 UTC