- From: CLOSE Dave <Dave.Close@us.thalesgroup.com>
- Date: Mon, 23 Jul 2007 18:24:42 -0700
- To: <www-validator@w3.org>
I'm trying to run checklink against an internal web site with several thousand pages. Every page includes a pair of links to outside URLs that cannot be accessed except through the company proxy. However, if the proxy is enabled, the internal pages cannot be accessed. So, when I run checklink with the proxy disabled, I get two errors for every page (and the process takes much longer than it should).

I'm looking for a way to specify that some links should not be checked. Superficially, --exclude-docs would seem to do the job, but no: it would prevent checking the content of the subordinate page (if I could get to it), but it does nothing to prevent checking links TO the page. Since the W3C validator seems to be the gold standard, and no alternatives turn up in the top hundred or so results of a Google search, I hope I've just missed a technique. Any suggestions?

A second issue arises if I try to parse the validator's output with all these extraneous errors in it. The errors are reported on separate lines from the link and page that caused them, so using grep to find the errors doesn't reveal the source of the problems. Of course, I could write a program or script to handle this, but anything that preserves context is likely to slow down scanning - and my output ran to more than 500 MB! I'd sure like to see output similar to Apache's log files, with all related information on a single line; I can do pretty formatting myself.

--
Dave Close, Thales Avionics, Irvine California USA
Software engineering manager, Hardware engineering department
cell +1 949 394 2124, dave.close@us.thalesgroup.com
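P.S. In the meantime, a post-filter over checklink's plain-text output seems feasible for both problems. Below is a rough Python sketch that joins each error with the page and link that produced it onto a single Apache-log-style line, and drops errors for the proxy-only links. The format it assumes (each document announced by a "Processing" line, each problem link reported as an unindented URL with indented detail lines below it) and the outside.example.com pattern are guesses that would need adapting to the actual output.

    #!/usr/bin/env python
    """Rough post-filter for checklink's plain-text output.

    Assumed format (adjust to match your checklink version):
      * each document begins with a line starting with "Processing",
        followed by the document's URL;
      * each problem link is a line starting with "http", with its
        error details on indented lines below it.
    """
    import re
    import sys

    # Links reachable only through the proxy; errors for these are
    # dropped. (Hypothetical pattern -- substitute the real URLs.)
    SKIP = [re.compile(r"^http://outside\.example\.com/")]

    def emit(page, link, details):
        """Print one line: page, link, semicolon-joined details."""
        if link and not any(p.search(link) for p in SKIP):
            print("%s %s %s" % (page, link, "; ".join(details)))

    def main(stream):
        page, link, details = None, None, []
        for raw in stream:
            line = raw.rstrip("\n")
            if line.startswith("Processing"):
                # New document: flush the pending link, remember the page.
                emit(page, link, details)
                link, details = None, []
                parts = line.split(None, 1)
                page = parts[1] if len(parts) > 1 else page
            elif line.startswith("http"):
                # New problem link: flush the previous one.
                emit(page, link, details)
                link, details = line.strip(), []
            elif line[:1] in (" ", "\t") and link:
                details.append(line.strip())
        emit(page, link, details)

    if __name__ == "__main__":
        main(sys.stdin)

Piping checklink's output through something like this should at least make grep useful again, since every remaining line carries its own context.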
Received on Tuesday, 24 July 2007 09:29:52 UTC