- From: olivier Thereaux <ot@w3.org>
- Date: Mon, 3 Apr 2006 14:55:41 +0900
- To: QA Dev <public-qa-dev@w3.org>
Ville,

On 3 Apr 2006, at 04:13, Ville Skyttä wrote:

> I took (finally) a brief look at CVS HEAD of link checker, and it is
> probably expectedly pretty broken at the moment.

I haven't made much progress since my comment on version 4.22 of the
checklink script, which said "WARNING: this code is rather broken...",
so I suspect that yes, things are still pretty broken. I recall it did
run faster, and I think it performed the basic tasks, but there were
issues. I didn't pursue the experiment further, for reasons detailed
below.

> I'm having some concerns about ParallelUserAgent not only because of
> the missing included request inside responses [0]

Yes, I ran into that a couple of times, hence v4.24 of
http://dev.w3.org/cvsweb/perl/modules/W3C/LinkChecker/bin/checklink

Dom submitted one patch to Marc, and he said he'd give it a look, but
he also kept his promise that "it wouldn't be quick" :/

> but also because it does not actually sleep between requests to a
> host but does something weird instead

Ouch. You mean ParallelUserAgent does that? Or is it something that
the current link checker code does wrong in this regard?

> So, what's the general status of the parallel link checking stuff,
> is someone subconsciously or otherwise working on it?

Not working on it at the moment...

Basically, I often find myself regretting having accepted the request
to have the link checker follow robots.txt rules. Not only did it make
the tool awfully slow (I only ever use it from the command line
anymore; using it from a browser pains me), in many cases it also
means that links have to be checked by hand. Hence my relative cold
feet about ParallelRobotUA, or any RobotUA solution in general.

If a browser-based widget (either Ajax or a proprietary browser
plugin) were to do link checking today, I don't really expect there
would be protests to get it to follow robots.txt. Avoid slamming
remote servers, probably; respect Disallow: etc., probably not.

The more I think of this, the more I look at your "ack" [1] with
interest, and think it could/should be the replacement for the
web-based link checker (while still distributing the old one as a Perl
module/command-line tool).

--
olivier
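
P.S. For concreteness, the missing-request symptom can usually be
worked around on the caller's side by taking the request from the
LWP::Parallel queue entry rather than from the response. A rough
sketch against LWP::Parallel's documented API -- this is not the
actual v4.24 checklink change, and the URLs are just placeholders:

    use strict;
    use warnings;
    use LWP::Parallel::UserAgent;
    use HTTP::Request;

    my $pua = LWP::Parallel::UserAgent->new;
    $pua->register(HTTP::Request->new(GET => 'http://www.w3.org/'));
    $pua->register(HTTP::Request->new(GET => 'http://www.w3.org/TR/'));

    # wait() returns a hash of LWP::Parallel::UserAgent::Entry
    # objects, each of which remembers the request it was made with.
    my $entries = $pua->wait(30);
    for my $entry (values %$entries) {
        my $res = $entry->response;
        # $res->request can come back undef (the bug above), so fall
        # back to the request stored in the entry itself.
        my $req = $res->request || $entry->request;
        printf "%s %s\n", $res->code, $req->uri;
    }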
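
And on the robots.txt slowness: any LWP::RobotUA-based solution not
only fetches and honours /robots.txt, it also sleeps between
successive requests to the same host, with a delay expressed in
minutes and defaulting to 1 -- which alone accounts for much of the
pain. A minimal illustration; the agent name, contact address and
1-second delay below are made up for the example:

    use strict;
    use warnings;
    use LWP::RobotUA;

    # RobotUA enforces robots.txt rules and a per-host delay between
    # requests; the delay is given in minutes (default: 1 minute).
    my $ua = LWP::RobotUA->new('Example-LinkChecker/0.1',
                               'webmaster@example.org');
    $ua->delay(1/60);    # about one second between same-host requests

    my $res = $ua->get('http://www.w3.org/');
    print $res->code, ' ', $res->message, "\n";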
Received on Monday, 3 April 2006 05:55:52 UTC