Re: Status of parallel link checker?

On Mon, 2006-04-03 at 14:55 +0900, olivier Thereaux wrote:
> On 3 Apr 2006, at 04:13, Ville Skyttä wrote:

> > but also because it does
> > not actually sleep between requests to a host but does something weird
> > instead
> 
> ouch, you mean ParallelUserAgent does that? or is it something that  
> the current linkchecker code does wrong in this regard?

The former, around line 286 in LWP::Parallel::RobotUA:

    if ($self->{'use_sleep'}) {
      # well, we don't really use sleep, but lets emulate
      # the standard LWP behavior as closely as possible...

It does manage to wait between requests some other way, though.  But
judging from a quick look at the CPU usage, there seems to be a busy
loop somewhere.
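
For comparison, a plain blocking wait per host could look roughly like
the sketch below.  The helper name wait_for_host and the %last_visit
hash are made up for illustration; this is not what
LWP::Parallel::RobotUA does internally, just the behavior I would have
expected from 'use_sleep'.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper, not part of LWP: remembers when each host was
# last contacted and blocks until the minimum delay has passed.
my %last_visit;    # host => epoch seconds of the previous request

sub wait_for_host {
    my ($host, $min_delay) = @_;

    my $remaining = 0;
    if (defined $last_visit{$host}) {
        $remaining = $last_visit{$host} + $min_delay - time;
    }

    # A real sleep hands the CPU back to the OS.  Polling time() in a
    # loop until the deadline would keep the CPU busy instead.
    sleep($remaining) if $remaining > 0;

    $last_visit{$host} = time;
    return $remaining > 0 ? $remaining : 0;    # seconds actually waited
}
```

In a parallel agent a blocking sleep like this would also stall
requests to every other host, so it only illustrates the single-host
case.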

> If a browser-based widget (either ajax or proprietary browser plugin)  
> were to do link checking today, I don't really expect that there  
> would be protests to get them to follow robots.txt. Avoid slamming  
> remote servers, probably, but respect Disallow: etc., probably not.
> The more I think of this, the more I look at your "ack" [1] with
> interest, and think it could/should be the replacement for the
> web-based link checker (while still distributing the older as perl
> module/command-line tool).

I've made some tiny local improvements to that hack in the meantime
and will post a new version to qa-dev soon.

Received on Monday, 3 April 2006 18:42:09 UTC