[checklink] Error in recursive checking

During recursive checking, the current checklink.pl fails to check the
links of pages that were previously parsed for anchors only.

To reproduce the problem, create the following two files:

test.html
  <html><head><title>Test</title></head>
  <body><p>Referring to <a href="test1.html#A">item A</a>
  on <a href="test1.html">test1</a> page.</body></html>

test1.html
  <html><head><title>Test1</title></head>
  <body><p><a name="A">Item A</a>:
  <a href="NonExisting.html">invalid link</a>.</body></html>

Then run "checklink -r http://.../test.html" and observe that the broken link
to the NonExisting.html document is not reported.
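
For convenience, the two test files can also be generated with a short script.
This is only a sketch: the sub name write_test_files and the optional
target-directory argument are made up for illustration, and the directory
still has to be served by whatever web server stands behind the URL given to
checklink.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# File contents copied from the report; the names match the hrefs.
my %files = (
    'test.html' => <<'HTML',
<html><head><title>Test</title></head>
<body><p>Referring to <a href="test1.html#A">item A</a>
on <a href="test1.html">test1</a> page.</body></html>
HTML
    'test1.html' => <<'HTML',
<html><head><title>Test1</title></head>
<body><p><a name="A">Item A</a>:
<a href="NonExisting.html">invalid link</a>.</body></html>
HTML
);

# Hypothetical helper: write both files into the given directory.
sub write_test_files {
    my ($dir) = @_;
    for my $name (sort keys %files) {
        open(my $fh, '>', "$dir/$name")
            or die "Cannot write $dir/$name: $!";
        print {$fh} $files{$name};
        close($fh) or die "Cannot close $dir/$name: $!";
    }
    return;
}

# Target directory: first argument, or the current directory by default.
write_test_files(@ARGV ? $ARGV[0] : '.');
```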

By the way, I am using checklink.pl,v 2.89.2.1 2002/07/07 21:54:55.

I can work around the problem with the hack given below, but I do not think
it qualifies as a proper patch.

Best regards,

Luc Van Eycken

--- checklink.pl.orig	2002-08-14 10:47:54.000000000 +0200
+++ checklink.pl	2002-08-19 17:38:55.000000000 +0200
@@ -846,7 +846,9 @@
 
     my $p;
 
-    if (defined($results{$uri}{parsing})) {
+    if (defined($results{$uri}{parsing})
+	&& (defined($results{$uri}{parsing}{Links})
+	    || !($links || $rec_needs_links))) {
         # We have already done the job. Woohoo!
         $p->{base} = $results{$uri}{parsing}{base};
         $p->{Anchors} = $results{$uri}{parsing}{Anchors};
@@ -879,6 +881,13 @@
 
     $p->parse($document);
 
+    # Make sure we know that links are searched for:
+    # create an empty Links hash
+    if (!$p->{only_anchors} && !defined($p->{Links})) {
+	$p->{Links} = {'1' => 2};
+	delete $p->{Links}{'1'};
+    }
+
     if (! $_summary) {
         my $stop = &get_timestamp();
         if ($_progress) {
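
The condition added in the first hunk can be read in isolation as a small
decision function. The sketch below is not part of checklink.pl: the sub
cache_is_usable and its two arguments are hypothetical stand-ins for
$results{$uri}{parsing} and for ($links || $rec_needs_links). It assumes, as
the patch does, that a stored parse result carries a defined Links entry
exactly when links were collected.

```perl
use strict;
use warnings;

# Decide whether a cached parse result may be reused.
#   $cached     - hash ref of an earlier parse, or undef if never parsed
#   $need_links - true when the caller needs links, not only anchors
sub cache_is_usable {
    my ($cached, $need_links) = @_;
    return 0 unless defined $cached;        # nothing cached yet
    return 1 if defined $cached->{Links};   # full parse serves any caller
    return $need_links ? 0 : 1;             # anchors-only result
}

# An anchors-only result has no Links entry; a full parse has one,
# even if the page contained no links at all (empty hash ref).
my $anchors_only = { Anchors => { A => 1 } };
my $full_parse   = { Anchors => { A => 1 }, Links => {} };

for my $case (
    [ 'never parsed, links needed',    undef,         1 ],
    [ 'anchors only, links needed',    $anchors_only, 1 ],
    [ 'anchors only, anchors needed',  $anchors_only, 0 ],
    [ 'full parse, links needed',      $full_parse,   1 ],
) {
    my ($label, $cached, $need_links) = @$case;
    printf "%-30s %s\n", $label,
        cache_is_usable($cached, $need_links) ? 'reuse' : 'parse again';
}
```

The second case is the one from the report: test1.html was first parsed for
anchors only, so its cached result must not be reused once its links are
needed. Note that an empty hash reference is still defined, which is what the
second hunk relies on; assigning {} directly should have the same effect as
inserting and then deleting a dummy key.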

Received on Monday, 19 August 2002 11:55:50 UTC