- From: olivier Thereaux <ot@w3.org>
- Date: Fri, 11 Mar 2005 16:06:02 +0900
- To: QA Dev <public-qa-dev@w3.org>
- Cc: Ville Skyttä <ville.skytta@iki.fi>
Re: http://lists.w3.org/Archives/Public/public-qa-dev/2005Mar/0009.html

Here is a list (re)visiting some of the options for improving the situation
with the link checker's RobotUA implementation, and the timeouts it seems to
cause with some UAs. I am including here a mixed bag of ideas, most of them
bad, some of them outrageous, but with a bit of luck we can find one in the
lot which isn't so terrible, or make a bad one better, or...

Cat 1: make the link checker faster

* RobotUA has a minimum latency of 1s between requests, so we can't make one
  W3C::UserAgent instance much faster, but we could use several. I assume
  this is what the following comment means:
  [[
  my $ua = W3C::UserAgent->new($AGENT); # @@@ TODO: admin address
  # @@@ make number of keep-alive connections customizable
  ]]
  Having a configurable number could make sense. We could also spawn one
  W3C::UserAgent per target host (which would require changes in how and when
  the parsing of the links is done, I suppose?) - rough sketch at the end of
  this message.

Cat 2: "cheat" and pretend we are sending some output when we're not
(these are variants of our current hack spitting out spaces in summary mode)

* I thought that this hack could be changed to output HTML comments now and
  then, instead of a space every time a link is processed (sketch at the end
  of this message). Doesn't feel like a very good solution in any case.
* "Summary only" could have the verbose output in a display:none. Would
  defeat the purpose in non-CSS-happy agents, though.
* We could also admit defeat and remove the summary option altogether...

Cat 3: "tell me when you're done"

- use JS to redirect to the results page when done
- use server push (i.e., as far as I can remember, serve as MIME multipart
  with each multipart boundary triggering a refresh in many - but not all... -
  UAs) - sketch at the end of this message

Two problems with the solutions above:
1- the basic mechanism won't work in all UAs, and
2- nothing tells us that the UA will not time out and give up before whatever
   mechanism we use eventually sends the "ready" signal.

Cat 4: Change the model

... and accept that a real-time CGI needing a few minutes to complete its
task is perhaps not appropriate. In this category come a bunch of
asynchronous solutions where the user gives checklink a point of contact for
sending results (by mail, or SOAP, or...), or where checklink gives the user
a URI where the checking results will eventually be published (we would have
to expire these with 410 Gone after a while, or be confronted with disk
space issues).

One more idea in this category: supposing that we can give the link checker
- a cache of currently processed queries
- a buffer where the result table is being built
then checklink's output could be:

  processing query, n links left to process,
  please reload in (estimated time) X seconds
  [ include current state of result table ]

and then, when the request is complete, include the full table (sketch at
the end of this message).

Needless to say I haven't found anything that satisfies me in the above
yet... I feel dirty. :)

--
olivier
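
Sketch for Cat 1 - a very rough, untested sketch of the one-UA-per-target-host
idea. It treats W3C::UserAgent as a plain LWP::RobotUA; the $MAX_CONNECTIONS
knob, the %ua_pool hash and the ua_for() helper are all made up for
illustration, not taken from checklink:

[[
#!/usr/bin/perl
# Rough sketch only: one robot UA per target host, so the per-host
# politeness delay does not serialize the whole check.
use strict;
use warnings;
use LWP::RobotUA;
use LWP::ConnCache;
use URI;

my $AGENT           = 'W3C-checklink-sketch/0.1';
my $FROM            = 'webmaster@example.org';  # @@@ TODO: admin address
my $MAX_CONNECTIONS = 2;                        # hypothetical config option

my %ua_pool;   # host:port => its own robot UA

sub ua_for {
    my ($url) = @_;
    my $host = URI->new($url)->host_port;
    $ua_pool{$host} ||= do {
        my $ua = LWP::RobotUA->new(agent => $AGENT, from => $FROM);
        $ua->delay(1 / 60);   # keep the 1-second politeness delay per host
        $ua->conn_cache(LWP::ConnCache->new(total_capacity => $MAX_CONNECTIONS));
        $ua;
    };
    return $ua_pool{$host};
}

# Usage: each link is fetched through the UA dedicated to its host.
for my $link (@ARGV) {
    my $response = ua_for($link)->head($link);
    printf "%s %s\n", $response->code, $link;
}
]]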
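
Sketch for Cat 2 - the "output HTML comments now and then" variant of the
current hack. @links and check_link() below are stand-ins for checklink's
real link list and per-link processing:

[[
#!/usr/bin/perl
# Very rough, untested sketch: unbuffer STDOUT and push an HTML comment
# to the client now and then while the check runs, instead of a space.
use strict;
use warnings;

$| = 1;   # autoflush, so each comment actually reaches the browser

my @links = @ARGV;            # pretend these came from the parsed document
sub check_link { sleep 1 }    # pretend each link takes a second to check

print "Content-Type: text/html; charset=utf-8\n\n";
print "<html><body><p>Checking links, summary will follow...</p>\n";

my $n = 0;
for my $link (@links) {
    check_link($link);
    $n++;
    # an HTML comment is invisible in the rendered page, but keeps the
    # connection from looking idle to the client
    print "<!-- checked $n of ", scalar(@links), " links -->\n";
}

print "<p>Summary: checked $n links.</p></body></html>\n";
]]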
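
Sketch for Cat 3 - the server-push variant, hand-rolling the
multipart/x-mixed-replace framing. It would have to be installed as an nph-
script so the server does not buffer the output, and run_check() is a
stand-in for the actual link checking:

[[
#!/usr/bin/perl
# Very rough sketch of server push: serve the response as
# multipart/x-mixed-replace so supporting UAs replace the "please wait"
# page with the results page once checking is done.
use strict;
use warnings;

$| = 1;
my $boundary = 'checklink-push';

print "HTTP/1.0 200 OK\r\n";
print "Content-Type: multipart/x-mixed-replace; boundary=$boundary\r\n\r\n";

# first part: the holding page
print "--$boundary\r\n";
print "Content-Type: text/html\r\n\r\n";
print "<html><body><p>Checking links, please wait...</p></body></html>\r\n";

my $results = run_check();

# second (and last) part: the results page replaces the holding page
print "--$boundary\r\n";
print "Content-Type: text/html\r\n\r\n";
print "$results\r\n";
print "--$boundary--\r\n";

sub run_check {
    sleep 5;   # pretend the check takes a while
    return '<html><body><p>All links checked, no problems found.</p></body></html>';
}
]]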
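
Sketch for Cat 4 - the "please reload in X seconds" output. The %job hash
stands in for the hypothetical per-query cache and result-table buffer
mentioned above; nothing here is actual checklink code:

[[
#!/usr/bin/perl
# Rough sketch: if the query is still being processed, send the current
# state of the result table plus a meta refresh with a rough ETA.
use strict;
use warnings;

my %job = (
    done             => 0,
    links_left       => 42,
    seconds_per_link => 1,
    table_so_far     => '<table><tr><td>http://example.org/</td><td>200 OK</td></tr></table>',
);

print "Content-Type: text/html; charset=utf-8\n\n";

if ($job{done}) {
    print "<html><body><h2>Results</h2>$job{table_so_far}</body></html>\n";
}
else {
    my $eta = $job{links_left} * $job{seconds_per_link};
    print "<html><head><meta http-equiv=\"refresh\" content=\"$eta\"></head><body>\n";
    print "<p>Processing query: $job{links_left} links left to process,\n";
    print "please reload in about $eta seconds.</p>\n";
    print $job{table_so_far}, "\n";   # current state of the result table
    print "</body></html>\n";
}
]]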
Received on Friday, 11 March 2005 07:06:07 UTC