Re: Making Linkchecker working multithreaded

Hi Dom, qa-dev,

On 22 Jun 2005, at 22:56, Dominique Hazaël-Massieux wrote:
> I've had a quick look at the linkchecker to see what would be needed
> to make it multithreaded; I see the linkchecker is using
> LWP::RobotUA. Has any thought been put into using
> LWP::Parallel::RobotUA [1] instead?

This is a really good idea, thanks! As Ville said, similar ideas have
been floated with the same goal, but our best option so far was to run
several RobotUA instances in parallel, which would have been
problematic in many ways.

This looks promising, as it would certainly remove some of the
implementation concerns. Instead of having to track everything
ourselves, it seems that LWP::Parallel::RobotUA can be given new
documents to process at any time (by 'registering' new requests); you
then wait for some time and fetch the results.
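
As a rough, untested sketch of that register-then-wait flow, going by
the module's documented synopsis (the robot name, contact address,
URLs and the 30-second timeout below are placeholders, not anything
taken from the link checker):

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::RobotUA;

# Placeholder robot name and contact address (assumptions).
my $ua = LWP::Parallel::RobotUA->new('example-checker/0.1',
                                     'webmaster@example.org');

# Placeholder URLs standing in for links found in a document.
my @urls = ('http://www.example.org/', 'http://www.example.com/');

for my $url (@urls) {
    my $req = HTTP::Request->new(HEAD => $url);
    # register() returns an error response only when the request
    # could not be queued; on success it returns nothing.
    if (my $err = $ua->register($req)) {
        warn "Could not register $url: ", $err->message, "\n";
    }
}

# Let the agent work through the queue; the argument is a timeout
# in seconds (placeholder value).
my $entries = $ua->wait(30);

# Each entry carries the response for one registered request.
for my $key (keys %$entries) {
    my $res = $entries->{$key}->response;
    print $res->request->uri, " => ",
          $res->code, " ", $res->message, "\n";
}
```

More requests can be registered later on the same $ua object, which
is what would save us from tracking the queue ourselves.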

In particular I like these two options:

$ua->max_hosts ( $max )
     Changes the maximum number of locations accessed in parallel.
     The default value is 7.

$ua->max_req ( $max )
     Changes the maximum number of requests issued per host in
     parallel. The default value is 5.

I think this means we could greatly improve the speed of the link
checker by setting the latter (max_req) to 1, and the former
(max_hosts) to... something reasonably high.
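
In code, that tuning might look roughly like the fragment below. The
value 20 is only a placeholder for "reasonably high", and the robot
name and address are made up; none of this has been tried against the
link checker.

```perl
use strict;
use warnings;
use LWP::Parallel::RobotUA;

# Placeholder robot name and contact address (assumptions).
my $ua = LWP::Parallel::RobotUA->new('example-checker/0.1',
                                     'webmaster@example.org');

$ua->max_req(1);      # at most one request in flight per host
$ua->max_hosts(20);   # many hosts in parallel; 20 is a guess to tune
```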

Definitely worth playing with.

-- 

olivier

Received on Monday, 27 June 2005 04:30:06 UTC