Re: checklink: infinite loop

From: Ville Skyttä <ville.skytta@iki.fi>
Date: Fri, 26 Sep 2008 21:42:51 +0300
To: www-validator@w3.org
Message-Id: <200809262142.51475.ville.skytta@iki.fi>

On Thursday 25 September 2008, Michael Ernst wrote:
> Checklink suffers an infinite loop when run with the -r switch.

Ouch, thanks for the report.

>   <a href="..//package-tree.html"><b>PREV</b></a>
>
> Such a link works fine in my browser (Firefox), and checklink shouldn't
> infinite loop.

Hm, but that's somewhat different from your testcase.  The above link starts
with "..//" (two dots) while the subdir link in your testcase starts
with ".//" (one dot).  All the browsers I've tested (Firefox, Opera, and
Konqueror on Linux) behave the same as checklink: every click of the "This
very file" link in subdir/index.html in your testcase adds one more slash
between subdir and index.html.
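
To make that growth concrete, here is a rough Python sketch of reference
resolution for the path part only, hand-written after the "merge" and
"remove_dot_segments" steps of RFC 3986 (sections 5.2.3 and 5.2.4).  The
function names are mine for illustration, this is not checklink's code, and
it ignores scheme, host, query and fragment.

```python
def remove_dot_segments(path: str) -> str:
    """Follow RFC 3986 section 5.2.4; empty segments are kept, not dropped."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]              # "/./" becomes "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]              # "/../" becomes "/"
            if output:
                output.pop()             # and the previous segment goes away
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to output.
            start = 1 if path.startswith("/") else 0
            idx = path.find("/", start)
            if idx == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:idx])
                path = path[idx:]
    return "".join(output)


def resolve_path(base_path: str, ref: str) -> str:
    """Resolve a relative path reference against an absolute base path."""
    if not ref:
        return base_path
    if ref.startswith("/"):
        return remove_dot_segments(ref)
    # Merge: keep the base up to and including its last "/", then append ref.
    merged = base_path[: base_path.rfind("/") + 1] + ref
    return remove_dot_segments(merged)


path = "/subdir/index.html"
for _ in range(3):
    path = resolve_path(path, ".//index.html")
    print(path)
# /subdir//index.html
# /subdir///index.html
# /subdir////index.html
```

Each round resolves the same ".//index.html" link against the previous
result, so a recursive checker never sees the same URL twice.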

> I have attached a patch that corrects the problem.

While the problem is very real, I don't think the fix is correct.  I could not
find anything in any URI specification that says multiple successive
slashes in the path part can be treated as one (or, put another way, that
empty path segments can be discarded).  Doing that could break some
(admittedly strange) cases where such a URL actually means something
different from the same URL with only one slash there.

I think a better fix would be to watch only for relative links that start
with . or .. followed by two or more slashes.  For each such link found,
collapse those successive slashes to one before handling the link as usual,
and flag it with a descriptive warning saying what was done and why (to
prevent possible infinite loops in recursive retrieval tools such as the link
checker; maybe someone can suggest other reasons too).
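
Roughly what I have in mind, as a Python sketch rather than an actual
checklink patch (the function name and the warning text are made up for
illustration):

```python
import re
from typing import Optional, Tuple

# A leading "." or ".." that is immediately followed by two or more slashes.
_LEADING_DOT_SLASHES = re.compile(r"^(\.{1,2})/{2,}")


def collapse_leading_dot_slashes(href: str) -> Tuple[str, Optional[str]]:
    """Return (href to use, warning text or None if nothing was changed)."""
    fixed = _LEADING_DOT_SLASHES.sub(r"\1/", href, count=1)
    if fixed == href:
        return href, None
    warning = (
        f"collapsed repeated slashes after leading dot segment: "
        f"{href!r} -> {fixed!r} (avoids unbounded recursion)"
    )
    return fixed, warning


for link in (".//index.html", "..//package-tree.html", "a//b.html", ".//a//b.html"):
    print(link, "=>", collapse_leading_dot_slashes(link)[0])
```

Only the run of slashes right after the leading . or .. is touched; links
such as "a//b.html" or "http://example.org//x" come back unchanged and
without a warning.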

I don't think runs of two or more slashes can cause this problem anywhere
other than immediately after a leading . or .. in a relative link, so I think
it'd be better to leave them alone elsewhere.  For the same reason, only the
slashes immediately following the . or .. should be collapsed to one, not runs
of two or more slashes further down the relative URL.  Did I miss something?
Received on Friday, 26 September 2008 18:43:28 GMT
