- From: Dan Jacobson <jidanni@jidanni.org>
- Date: Mon, 22 Dec 2003 08:29:49 +0800
- To: www-validator@w3.org
[Note: at the bottom it appears parallel fetching will satisfy me.]

What I want to do with checklink:

    find . -name \*.html |
        xargs checklink --just-print-the-urls-that-need-to-be-checked > url-list
    pppd isp   # 56 kbps modem
    < url-list ssh my.account.on.networked.machine \
        checklink --just-check-that-these-urls-exist

or something like that. You see, with its much greater connectivity,
my.account.on.networked.machine could produce the results in moments, or I
could nohup it and get the results on my next call.

You see, running

    find . -name \*.html | xargs checklink

wastes costly modem time, and doing

    ssh my.account.on.networked.machine \
        nohup checklink --recursive-or-whatever http://jidanni.org/

would eat unnecessarily into my precious bandwidth allotment at the website
host company, when in fact all my pages are right here on my PC, offline.

All in all, I'm saying there should be a way to separate link gathering from
link checking.

Wait, all I need to do is perhaps:

    ssh my.account.on.networked.machine <<\!
    sed 's/.*/<a href="&">x<\/a>/' <<\EOF | checklink
    url1
    url2
    ...

Also, I must first extract all the URLs from my pages...

Hold on: if apt-get can have a --print-uris, why can't checklink have a
--just-print-urls-we-would-have-checked, a.k.a. --print-uris? Maybe such a
URI list could also contain, commented out, the pages in which the URLs were
found:

    #nurd.html
    http://turd.oo/
    http://turd.oo/blaa
    #verd.html
    http://hey.vern/ernest

Wait, let me try parallel fetching. Indeed, it finished faster:

    set -e
    unset ftp_proxy http_proxy
    p=cl$$-
    w=~/jidanni.org/    # local directory
    cd /var/tmp         # avoid nohup.out droppings
    find $w -name '*.html' | split -l 6 - $p
    for i in $p*[abc]?
    do
        # < $i xargs checklink -n >$i.out&
        nohup sh -c "xargs checklink -n <$i >$i.out&"  # nohup for emacs' compile mode :-(
    done
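(A rough sketch of the gather-locally, check-remotely split described above,
using only standard tools rather than the hypothetical checklink flags; the
host name and directory are the placeholders from earlier, the href regex
only catches absolute double-quoted http/https links, /tmp/urls.html is just
an illustrative scratch file, and it assumes checklink accepts a local file
path, as the parallel script above already relies on:)

    # Gather, offline on the PC: crude href extraction from the local pages.
    find ~/jidanni.org -name '*.html' -print0 |
        xargs -0 grep -hoE 'href="https?://[^"]*"' |
        sed 's/^href="//; s/"$//' | sort -u > url-list

    # Check, online from the well-connected machine: wrap the list in a
    # throwaway HTML page there and let checklink verify every link in it.
    < url-list ssh my.account.on.networked.machine '
        sed "s|.*|<a href=\"&\">x</a>|" > /tmp/urls.html &&
        checklink /tmp/urls.html
    '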
Received on Monday, 22 December 2003 00:41:07 UTC