- From: Michael Ernst <mernst@csail.mit.edu>
- Date: Sun, 8 Feb 2004 14:05:08 -0500
- To: www-validator@w3.org
checklink.pl lacks the ability to ignore parts of a web hierarchy.
Ignoring everything under a given URL is desirable when that part of the
site contains a very large or infinite number of pages. (As an example of
the latter, consider dynamically generated pages that link to other
dynamically generated pages.) Using the --depth argument is a partial
workaround, but sometimes I wish to check every link under a hierarchy
without seeing any reports for a certain portion of it.
The patch below adds this functionality via an --omit option to checklink.pl.
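As an illustration of the intended behaviour, here is a minimal standalone sketch of the test the patch adds to the recursion-scope check. The helper name is_omitted and the example.org URLs are assumptions made for this sketch and do not appear in checklink.pl; only the unanchored m/.../ match against the option value mirrors the patch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of checklink.pl): returns 1 when $url
# should be skipped because it matches the user-supplied --omit pattern.
sub is_omitted {
    my ($url, $omit) = @_;
    return 0 unless defined $omit;      # no --omit given: nothing is skipped
    return $url =~ m/$omit/ ? 1 : 0;    # unanchored match, as in the patch
}

# Assumed example pattern: skip a dynamically generated calendar subtree.
my $omit = '^http://example\.org/calendar/';

for my $url ('http://example.org/docs/index.html',
             'http://example.org/calendar/2004/02/') {
    printf "%-45s %s\n", $url,
        is_omitted($url, $omit) ? 'omitted' : 'checked';
}
```

Because the match is unanchored, a pattern such as calendar would match that string anywhere in the URL; anchoring with ^ restricts it to a specific subtree.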
-Michael Ernst
mernst@csail.mit.edu
cd ~/bin/share/
diff -u -b -r /g2/users/mernst/bin/share/checklink.pl-orig /g2/users/mernst/bin/share/checklink.pl
--- /g2/users/mernst/bin/share/checklink.pl-orig Fri Feb 6 11:54:10 2004
+++ /g2/users/mernst/bin/share/checklink.pl Sun Feb 8 08:57:59 2004
@@ -165,6 +165,7 @@
User => undef,
Password => undef,
Base_Location => '.',
+ Omit_Location => undef,
Masquerade => 0,
Masquerade_From => '',
Masquerade_To => '',
@@ -356,6 +357,7 @@
'r|recursive' => sub { $Opts{Depth} = -1
if $Opts{Depth} == 0; },
'l|location=s' => \$Opts{Base_Location},
+ 'o|omit=s' => \$Opts{Omit_Location},
'u|user=s' => \$Opts{User},
'p|password=s' => \$Opts{Password},
't|timeout=i' => \$Opts{Timeout},
@@ -414,6 +416,8 @@
By default, for example for
http://www.w3.org/TR/html4/Overview.html
it would be http://www.w3.org/TR/html4/
+ -o/--omit regexp Do not check pages whose URL matches the Perl
+ regexp.
-n/--noacclanguage Do not send an Accept-Language header.
-L/--languages Languages accepted$langs.
-q/--quiet No output if no errors are found. Implies -s.
@@ -792,6 +796,8 @@
return undef if ($current eq $rel); # Relative path not possible?
return undef if ($rel =~ m|^(\.\.)?/|); # Relative path starts with ../ or /?
+ return undef if (defined($Opts{Omit_Location})
+ && ($current =~ m/$Opts{Omit_Location}/));
return 1;
}
@@ -2165,6 +2171,11 @@
L<http://www.w3.org/TR/html4/Overview.html> for example, it would be
L<http://www.w3.org/TR/html4/>.
+=item B<-o, --omit regexp>
+
+Perl regexp for URLs of documents that should not be checked, even
+if they would otherwise be within scope.
+
=item B<-n, --noacclanguage>
Do not send an Accept-Language header.
Diff finished at Sun Feb 8 09:10:12
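For readers unfamiliar with the option table touched by the second hunk, the following is a reduced sketch of how a Getopt::Long specification such as 'o|omit=s' stores its argument in the options hash. The parse_args wrapper, the trimmed-down %Opts hash, and the sample arguments are assumptions for illustration; the real script defines many more options and parses @ARGV directly.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Trimmed-down stand-in for the script's option hash.
my %Opts = (
    Base_Location => '.',
    Omit_Location => undef,   # stays undef unless -o/--omit is given
);

# Hypothetical wrapper: parses a list of arguments and returns whatever
# is left over (the URLs to check).
sub parse_args {
    my @args = @_;
    GetOptionsFromArray(\@args,
        'l|location=s' => \$Opts{Base_Location},
        'o|omit=s'     => \$Opts{Omit_Location},
    ) or die "Could not parse options\n";
    return @args;
}

my @urls = parse_args('--omit', '/cgi-bin/', 'http://example.org/');
print "omit pattern: $Opts{Omit_Location}\n";
print "to check:     @urls\n";
```

The =s suffix means the option takes a mandatory string value, which is later interpolated into the match as a regular expression.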
Received on Sunday, 8 February 2004 15:13:52 UTC