
Re: Making Linkchecker work multithreaded

From: olivier Thereaux <ot@w3.org>
Date: Mon, 27 Jun 2005 13:30:12 +0900
Message-Id: <D980248F-75DB-4D82-B8B0-9AFA81FCDFA8@w3.org>
Cc: public-qa-dev@w3.org
To: Dominique Hazaël-Massieux <dom@w3.org>

Hi Dom, qa-dev,

On 22 Jun 2005, at 22:56, Dominique Hazaël-Massieux wrote:
> I've had a quick look at the linkchecker to see what would be
> needed to make it multithreaded; I see the linkchecker is using
> LWP::RobotUA. Has any thought been put into using
> LWP::Parallel::RobotUA [1] instead?

This is a really good idea, thanks! As Ville said, similar ideas
have been thrown around with the same goal, but our best bet so far
had been to run several RobotUA instances in parallel, which would
have been problematic in many ways.

This looks promising, as it would certainly remove some of the
implementation concerns. Instead of having to track everything
ourselves, it seems that LWP::Parallel::RobotUA can be given new
documents to process at any time (by 'registering' new requests);
you then wait for a while and fetch the results.
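
To make sure I am reading the documentation right, here is roughly
how I imagine that register/wait loop would look. This is only an
untested sketch based on the LWP::Parallel::RobotUA documentation;
the agent name, contact address, delay and URIs below are
placeholders, not what checklink actually uses:

  use strict;
  use warnings;
  use LWP::Parallel::RobotUA;
  use HTTP::Request;

  # Placeholder agent name and contact address.
  my $ua = LWP::Parallel::RobotUA->new('W3C-checklink-test', 'ot@w3.org');
  $ua->delay(1/60);   # wait at least 1 second between requests to a host

  # 'Register' the documents to check; more requests can be registered
  # later, as new links are discovered.
  for my $uri ('http://www.w3.org/', 'http://www.w3.org/QA/') {
      $ua->register(HTTP::Request->new(HEAD => $uri));
  }

  # Wait up to 30 seconds, then fetch all the results at once.
  my $entries = $ua->wait(30);
  for my $entry (values %$entries) {
      my $res = $entry->response;
      printf "%s -> %s\n", $res->request->uri, $res->code;
  }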

In particular I like these two options:
$ua->max_hosts ( $max )
    Changes the maximum number of locations accessed in parallel.
    The default value is 7.

$ua->max_req ( $max )
    Changes the maximum number of requests issued per host in
    parallel. The default value is 5.

I think this means we could greatly improve the speed of the link
checker by setting the latter (max_req) to 1, and the former
(max_hosts) to... something reasonably high.
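
Something along these lines, perhaps (the actual numbers are just a
guess and would need experimenting with):

  $ua->max_req(1);      # stay polite: one request per host at a time
  $ua->max_hosts(30);   # but talk to many hosts in parallel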

Definitely worth playing with.

-- 

olivier
Received on Monday, 27 June 2005 04:30:06 GMT
