[checklink] Error in recursive checking from Luc Van Eycken on 2002-08-19 (www-validator@w3.org from August 2002)

From: Luc Van Eycken <Luc.VanEycken@esat.kuleuven.ac.be>
Date: Mon, 19 Aug 2002 11:52:15 -0400 (EDT)
To: www-validator@w3.org
Message-ID: <15713.5127.772098.785825@esat.kuleuven.ac.be>

During recursive checking, the current checklink.pl fails to check the
links of pages that were previously parsed for anchors only.

To reproduce the problem, create the following two files:

test.html
  <html><head><title>Test</title></head>
  <body><p>Referring to <a href="test1.html#A">item A</a>
  on <a href="test1.html">test1</a> page.</body></html>

test1.html
  <html><head><title>Test1</title></head>
  <body><p><a name="A">Item A</a>:
  <a href="NonExisting.html">invalid link</a>.</body></html>

Then run "checklink -r http://.../test.html" and note that the broken link
to the NonExisting.html document is not reported.
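
For convenience, the two test pages can also be generated with a short
Perl script. This is only a sketch: it writes the files into the current
directory (an assumption), and they still have to be placed wherever the
web server behind the URL above can serve them.

```perl
use strict;
use warnings;

# Page contents copied from the report; the file names match it too.
my %pages = (
    'test.html' => <<'HTML',
<html><head><title>Test</title></head>
<body><p>Referring to <a href="test1.html#A">item A</a>
on <a href="test1.html">test1</a> page.</body></html>
HTML
    'test1.html' => <<'HTML',
<html><head><title>Test1</title></head>
<body><p><a name="A">Item A</a>:
<a href="NonExisting.html">invalid link</a>.</body></html>
HTML
);

# Write each page to a file of the same name in the current directory.
for my $name (sort keys %pages) {
    open(my $fh, '>', $name) or die "Cannot write $name: $!";
    print {$fh} $pages{$name};
    close($fh) or die "Cannot close $name: $!";
}
```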

By the way, I am using checklink.pl,v 2.89.2.1 2002/07/07 21:54:55.

I can work around the problem with the hack given below, but I do not think
it qualifies as a proper patch.

Best regards,

Luc Van Eycken

--- checklink.pl.orig	2002-08-14 10:47:54.000000000 +0200
+++ checklink.pl	2002-08-19 17:38:55.000000000 +0200
@@ -846,7 +846,9 @@
 
     my $p;
 
-    if (defined($results{$uri}{parsing})) {
+    if (defined($results{$uri}{parsing})
+	&& (defined($results{$uri}{parsing}{Links})
+	    || !($links || $rec_needs_links))) {
         # We have already done the job. Woohoo!
         $p->{base} = $results{$uri}{parsing}{base};
         $p->{Anchors} = $results{$uri}{parsing}{Anchors};
@@ -879,6 +881,12 @@
 
     $p->parse($document);
 
+    # Make sure we know that links are searched for:
+    # create an empty Links hash
+    if (!$p->{only_anchors} && !defined($p->{Links})) {
+	$p->{Links} = {};
+    }
+
     if (! $_summary) {
         my $stop = &get_timestamp();
         if ($_progress) {
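
The condition introduced by the first hunk can be read as a small
predicate. The following is only a sketch of that logic outside
checklink.pl: the function name can_reuse_parse and the $want_links
argument (standing in for "$links || $rec_needs_links") are hypothetical
and do not exist in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a cached parse result may be reused.
# $cached     - the stored parsing hash for a URI, or undef if never parsed
# $want_links - true when the caller needs the page's links, not just anchors
sub can_reuse_parse {
    my ($cached, $want_links) = @_;
    return 0 unless defined $cached;       # page was never parsed
    return 1 if defined $cached->{Links};  # links were collected earlier
    return $want_links ? 0 : 1;            # anchors-only data suffices only
                                           # when no links are needed
}

# test1.html was first parsed for anchors only (to verify test1.html#A).
my %results = ('test1.html' => { parsing => { Anchors => { A => 1 } } });
my $cached  = $results{'test1.html'}{parsing};

print "anchors only: ", (can_reuse_parse($cached, 0) ? "reuse" : "re-parse"), "\n";
print "links needed: ", (can_reuse_parse($cached, 1) ? "reuse" : "re-parse"), "\n";

# After a full parse the Links hash exists, even when it is empty,
# which is what the second hunk guarantees.
$cached->{Links} = {};
print "after full parse: ", (can_reuse_parse($cached, 1) ? "reuse" : "re-parse"), "\n";
```

The second hunk matters for this check because a fully parsed page without
any links would otherwise leave Links undefined and be parsed again on
every request.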

Received on Monday, 19 August 2002 11:55:50 UTC