Re: Status of parallel link checker?

From: Ville Skyttä <ville.skytta@iki.fi>
Date: Mon, 03 Apr 2006 21:42:05 +0300
To: QA-dev <public-qa-dev@w3.org>
Message-Id: <1144089725.4635.26.camel@localhost.localdomain>

On Mon, 2006-04-03 at 14:55 +0900, olivier Thereaux wrote:
> On 3 Apr 2006, at 04:13, Ville Skyttä wrote:

> > but also because it does
> > not actually sleep between requests to a host but does something weird
> > instead
> 
> ouch, you mean ParallelUserAgent does that? or is it something that  
> the current linkchecker code does wrong in this regard?

The former, around line 286 in LWP::Parallel::RobotUA:

    if ($self->{'use_sleep'}) {
      # well, we don't really use sleep, but lets emulate
      # the standard LWP behavior as closely as possible...

It does manage to wait between requests some other way, though.  But
from a quick look at the CPU usage, there seems to be a busy loop somewhere.
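
For comparison, a per-host delay that really sleeps could look roughly
like the sketch below. This is only an illustration and not the
LWP::Parallel::RobotUA code: the wait_for_host helper, the %last_request
hash and the 0.2 second delay are all made up for the example.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper, not part of LWP::Parallel::RobotUA: remembers when
# each host was last contacted and blocks in sleep() until the minimum
# delay has passed, so the process stays idle while it waits.
my %last_request;    # host => epoch time of the previous request

sub wait_for_host {
    my ($host, $min_delay) = @_;
    my $now  = time;
    my $wait = 0;
    if (defined $last_request{$host}) {
        # Time still missing until the minimum delay has elapsed.
        $wait = $last_request{$host} + $min_delay - $now;
        $wait = 0 if $wait < 0;
    }
    sleep($wait) if $wait > 0;    # blocks without polling the clock
    $last_request{$host} = time;
    return $wait;                 # seconds we decided to wait
}

wait_for_host('example.org', 0.2);    # first request: no wait
wait_for_host('example.org', 0.2);    # second request: sleeps for the rest of the delay
```

A busy loop would instead keep re-checking the clock until the delay is
over, which is what the CPU usage seems to suggest is happening.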

> If a browser-based widget (either ajax or proprietary browser plugin)  
> were to do link checking today, I don't really expect that there  
> would be protests to get them to follow robots.txt. Avoid slamming  
> remote servers, probably, but respect Disallow: etc., probably not.
> The more I think of this, the more I look at your "ack" [1] with  
> interest, and think it could/should be the replacement for the web- 
> based link checker (while still distributing the older as perl module/ 
> command-line tool).

I've made some tiny local improvements to that hack in the meantime, and
will put a new version online for qa-dev soon.
Received on Monday, 3 April 2006 18:42:09 GMT