Re: checklink: infinite loop

From: Michael Ernst <mernst@alum.mit.edu>
Date: Tue, 30 Sep 2008 09:24:18 +0200
Message-ID: <18657.54306.428306.727234@swsmde.ds.mpi-sws.mpg.de>
To: www-validator@w3.org
> > Checklink suffers an infinite loop when run with the -r switch.

> I could not 
> find anything in any URI specifications that would say multiple successive 
> slashes in the path part could be treated as one (or put another way, empty 
> path segments could be discarded), so doing that could break some (strange 
> though) cases where they actually mean something else than the same URL with 
> only one slash there as usual.

You're right that the URI spec doesn't require that, so it could be bad in
strange situations.  However, as you also pointed out, every browser that
you tested behaves that way -- at least for the special case of relative
links.

So, your proposal:

> only the slashes 
> immediately following the . or .. in relative links, not 2+ slashes further 
> down the relative URL, should be truncated to one

is OK:  less potential for good, less potential for harm, but it fixes the
two specific cases I have encountered in practice.

I've revised the patch, below.

                    -Mike

PS:  In the future, could you copy me on any response?  I don't subscribe
to this mailing list, and I only occasionally browse the archives
to look for a response.  Thanks!


Index: checklink
===================================================================
RCS file: /sources/public/perl/modules/W3C/LinkChecker/bin/checklink,v
retrieving revision 4.116
diff -u -u -b -r4.116 checklink
--- checklink	22 Sep 2008 19:33:31 -0000	4.116
+++ checklink	30 Sep 2008 07:16:38 -0000
@@ -1552,6 +1595,9 @@
 {
   my ($self, $uri, $base, $line) = @_;
   if (defined($uri)) {
+    # Remove repeated slashes after the . or .. in relative links, to avoid
+    # duplicated checking or infinite recursion.
+    $uri =~ s|^(\.\.?/)/+|$1|o;
     $uri = URI->new_abs($uri, $base) if defined($base);
     $self->{Links}{$uri}{$line}++;
   }
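
For illustration, here is a minimal standalone sketch of what the added
substitution does to a few relative links.  The helper name
collapse_leading_slashes and the sample links are hypothetical and are not
part of checklink; only the regular expression is taken from the patch
(without the /o flag, which has no effect on a pattern with no
interpolated variables).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of checklink: applies the same
# substitution as the patch to a single link string.
sub collapse_leading_slashes {
    my ($uri) = @_;
    # Keep the leading "./" or "../" and drop any extra slashes that
    # immediately follow it.  The pattern is anchored at the start, so
    # repeated slashes further down the path are left alone.
    $uri =~ s|^(\.\.?/)/+|$1|;
    return $uri;
}

# Sample links (assumed for the example).
for my $link ('.//foo.html', '..///bar/', '..//x//y', 'a//b.html', '../..//c') {
    printf "%-12s -> %s\n", $link, collapse_leading_slashes($link);
}
```

Only the first three samples change (to ./foo.html, ../bar/ and ../x//y).
The last two are left as they are, because the repeated slashes there do
not directly follow a leading . or .. segment.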
Received on Tuesday, 30 September 2008 07:25:01 GMT
