- From: Ville Skyttä <ville.skytta@iki.fi>
- Date: Fri, 24 Jun 2005 15:04:40 +0300
- To: QA-dev <public-qa-dev@w3.org>
On Wed, 2005-06-22 at 15:56 +0200, Dominique Hazaël-Massieux wrote:
> Hi QA-dev, Ville,

Hi,

> I've had a quick look at the linkchecker to see what would be needed to
> make it multithreaded; I see the linkchecker is using LWP::RobotUA. Has
> any thought been put into using LWP::Parallel::RobotUA [1] instead?

Some thoughts about parallelizing the fetches have been tossed around on
this list and in past meetings, but AFAIK nothing concrete has really
happened yet. My feeling is that it has generally been seen as a good
thing.

> It's a derivative of LWP::Parallel::UserAgent [2] which should allow
> doing per-host threads using register/callback functions. There are
> several examples documented [3], esp. one using the RobotUA class.

I'm vaguely aware of it, and should really get more familiar with it
sometime soon. Haven't found the round tuits to do that so far, though.

> I haven't investigated much how this would apply to the linkchecker; I
> guess my first question is whether this has already been evaluated but
> discarded or not. (I haven't found anything with the mailing list
> search engine, but it may have been discussed in other fora)

As said, I think it has been discussed here or in the meetings somewhat,
but I don't have any pointers to throw in right now. Anyway, the idea
has certainly not been discarded.

Personally, my only concern about parallelizing the link checker is that
it might cause some complications or restrictions on how the results are
presented to the user. I think we all agree that the results UI needs
some work anyway, and it isn't quite clear what The Way to implement it
would be; the most serious problem currently is the timeout issues (on
either the server or the client side).

Assuming we keep the results output relatively close to what it is now,
we would need to synchronize output to the client either at the callback
level (although callbacks shouldn't really print anything to the stream
IMO), or implement an event sink of some kind that takes care of the
output stream while the checking proceeds, or buffer the results more
than we do now and present them in bigger chunks (which could make the
timeout problems even worse than they are now).

These are more random than refined thoughts, and it might well turn out
that the concerns go away as soon as someone just starts to experiment.
But currently I tend to think that before starting serious work on the
parallelization, we should first decide how we would like to present the
results to the user in the future.
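For the record, here's roughly what I understand the documented
register/wait pattern in LWP::Parallel::RobotUA to look like. This is an
untested sketch written from the module's documentation; the agent name,
contact address and the limits are made up for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTTP::Request;
    use LWP::Parallel::RobotUA;

    # Hypothetical agent name and contact; tune the limits to taste.
    my $ua = LWP::Parallel::RobotUA->new('checklink-experiment/0.1',
                                         'webmaster@example.org');
    $ua->timeout(30);   # per-connection timeout, seconds
    $ua->delay(0.1);    # politeness delay per host (minutes, as in LWP::RobotUA)
    $ua->max_hosts(5);  # how many different hosts to talk to at once
    $ua->max_req(2);    # parallel requests per host

    # Queue everything first; nothing is fetched yet.
    $ua->register(HTTP::Request->new(HEAD => $_)) for @ARGV;

    # Fetch in parallel; wait() blocks until all registered requests
    # have finished and returns the collected entries.
    my $entries = $ua->wait();

    for my $entry (values %$entries) {
        my $res = $entry->response;
        printf "%s: %s\n", $res->request->uri, $res->code;
    }

The per-request callback form of register() would let us report results
as they arrive instead of only after wait() returns, which is exactly
where the output synchronization question above comes in.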
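To make the "event sink" idea a bit more concrete, something along these
lines is what I have in mind; the package name and interface here are
invented purely for illustration, not existing checklink code:

    package LinkResultSink;
    use strict;
    use warnings;

    # Hypothetical result collector: callbacks only push results here,
    # and the main loop decides when to write them to the client.
    sub new {
        my ($class) = @_;
        return bless { pending => [] }, $class;
    }

    sub add {
        my ($self, $url, $status) = @_;
        push @{ $self->{pending} }, [ $url, $status ];
    }

    # Flush buffered results to the output stream in one chunk, so
    # nothing gets printed from inside the (possibly parallel) callbacks.
    sub flush {
        my ($self, $fh) = @_;
        $fh ||= \*STDOUT;
        printf {$fh} "%s: %s\n", @$_ for @{ $self->{pending} };
        $self->{pending} = [];
    }

    1;

How often flush() gets called is then the knob between "stream results
as they come" and "present them in bigger chunks", which is where the
timeout trade-off shows up.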
Received on Friday, 24 June 2005 12:04:47 UTC