checklink: possible bug

I am using checklink.pl from the command line, and I've found that 
whether it reports bad links can depend on the order in which the HTML 
files are listed as arguments.  (This may indicate a more significant 
shortcoming.)  Although I haven't looked at the code, I'm guessing the 
behavior has to do with the way checklink seems to avoid re-parsing 
files unnecessarily.

I've made a toy example to demonstrate the behavior.  Notice below that 
commands [3] and [4] give identical results, as I would expect.  However, 
commands [5] and [6] give different results, although they differ only in 
the order in which the HTML files are passed to checklink: [5] reports no 
broken link, while [6] reports one.
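
The same comparison can be scripted.  What follows is only a sketch and 
not part of checklink: order_check is a hypothetical helper name, and it 
assumes the checker takes file names as arguments and prints a line 
containing "Fix" for each broken link, as in the session below.

```shell
# Hypothetical helper (not part of checklink): run a checker on two files
# in both argument orders and say whether the "Fix" lines differ.
# Usage: order_check ./checklink.pl link_name.htm anchor.htm
order_check() {
    checker=$1
    first=$2
    second=$3

    # Keep only the "Fix" lines, as the grep in the session does, and
    # sort them so that only their content matters, not their sequence.
    out_ab=$("$checker" "$first" "$second" | grep "Fix" | sort) || true
    out_ba=$("$checker" "$second" "$first" | grep "Fix" | sort) || true

    if [ "$out_ab" = "$out_ba" ]; then
        echo "order-independent"
    else
        echo "order-dependent"
    fi
}
```

Given the results in the session, I would expect this to print 
"order-dependent" for link_name.htm and anchor.htm, and 
"order-independent" for link_no_name.htm and anchor.htm.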

I hope this is worth your time,
Joel Schroeder

============BEGIN EXAMPLE============

[0]$ ls
anchor.htm  checklink.pl  link_name.htm  link_no_name.htm

[1]$ diff link_no_name.htm link_name.htm
5c5
<     <A href="anchor.htm"></A>
---
>     <A href="anchor.htm#AAA"></A>

[2]$ cat anchor.htm
<HTML>
   <HEAD></HEAD>
   <BODY>

   <A name="AAA"></A>
   <IMG src="a.png">

   </BODY>
</HTML>

[3]$ ./checklink.pl link_no_name.htm anchor.htm | grep "Fix"
To do: The link is broken. Fix it NOW!

[4]$ ./checklink.pl anchor.htm link_no_name.htm | grep "Fix"
To do: The link is broken. Fix it NOW!

[5]$ ./checklink.pl link_name.htm anchor.htm | grep "Fix"

[6]$ ./checklink.pl anchor.htm link_name.htm | grep "Fix"
To do: The link is broken. Fix it NOW!

[7]$ cat link_no_name.htm
<HTML>
   <HEAD></HEAD>
   <BODY>

     <A href="anchor.htm"></A>

   </BODY>
</HTML>

[8]$ cat link_name.htm
<HTML>
   <HEAD></HEAD>
   <BODY>

     <A href="anchor.htm#AAA"></A>

   </BODY>
</HTML>

[9]$

============END EXAMPLE============

Received on Thursday, 20 February 2003 16:14:01 UTC