- From: Michael Ernst <mernst@alum.mit.edu>
- Date: Tue, 30 Sep 2008 09:24:18 +0200
- To: www-validator@w3.org
- Message-ID: <18657.54306.428306.727234@swsmde.ds.mpi-sws.mpg.de>
> > Checklink suffers an infinite loop when run with the -r switch.
>
> I could not find anything in any URI specifications that would say
> multiple successive slashes in the path part could be treated as one
> (or put another way, empty path segments could be discarded), so doing
> that could break some (strange though) cases where they actually mean
> something else than the same URL with only one slash there as usual.

You're right that the URI spec doesn't require that, so it could be bad
in strange situations. However, as you also pointed out, every browser
that you tested behaves that way -- at least for the special case of
relative links. So, your proposal:

> only the slashes immediately following the . or .. in relative links,
> not 2+ slashes further down the relative URL, should be truncated to one

is OK: less potential for good, less potential for harm, but it fixes the
two specific cases I have encountered in practice.

I've revised the patch, below.

-Mike

PS: In the future, could you copy me on any response? I don't subscribe
to this mailing list and it's only so often that I browse to the archives
to look for a response. Thanks!

Index: checklink
===================================================================
RCS file: /sources/public/perl/modules/W3C/LinkChecker/bin/checklink,v
retrieving revision 4.116
diff -u -u -b -r4.116 checklink
--- checklink	22 Sep 2008 19:33:31 -0000	4.116
+++ checklink	30 Sep 2008 07:16:38 -0000
@@ -1552,6 +1595,9 @@
 {
   my ($self, $uri, $base, $line) = @_;
   if (defined($uri)) {
+    # Remove repeated slashes after the . or .. in relative links, to avoid
+    # duplicated checking or infinite recursion.
+    $uri =~ s|^(\.\.?/)/+|$1|o;
     $uri = URI->new_abs($uri, $base) if defined($base);
     $self->{Links}{$uri}{$line}++;
   }

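
To make the scope of the substitution concrete, here is a minimal standalone sketch that applies the same regular expression to a few sample links. The helper name collapse_leading_slashes and the sample links are made up for illustration and are not part of checklink; the /o flag from the patch is dropped because the pattern contains no interpolated variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of checklink): applies the same
# substitution as the patch to a single link string.
sub collapse_leading_slashes {
    my ($uri) = @_;
    # Only the run of slashes directly after a leading "./" or "../"
    # is collapsed; the pattern is anchored at the start of the string.
    $uri =~ s|^(\.\.?/)/+|$1|;
    return $uri;
}

# Illustrative inputs, invented for this sketch.
for my $link ('..//foo.html', './/foo.html', '../foo//bar.html') {
    printf "%-18s -> %s\n", $link, collapse_leading_slashes($link);
}
```

Because the pattern is anchored with ^ and has no /g modifier, a link such as ..//..//foo.html would have only its first run of slashes collapsed, and repeated slashes further down the path (as in ../foo//bar.html) are left untouched, which matches the narrower proposal quoted above.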
Received on Tuesday, 30 September 2008 07:25:01 UTC