Re: checklink: recursively only on own site

On Thu, 2003-11-20 at 22:15, Dr. Georg Czedik-Eysenberg wrote:
> >> But the option "Check linked documents recursively"
> >> should provide the possibility, only to check links in
> >> documents on the same site as the originally entered document.
> 
> > This is how the link checker should behave in CGI mode.  When run from
> > the command line, one can use the --location option to configure the
> > recursion scope.  Both ways behave as expected here.
> 
> I am not sure, what you mean with "CGI mode" or "from the command line".
> I mean the "online version" at http://validator.w3.org/checklink.

Sorry for being unclear.  "CGI mode" corresponds to the "online
version"; checklink can also be downloaded and run locally from the
command line.

> > Do you have an URL where unexpected results can be witnessed?
> 
> Yes:
> 
> http://validator.w3.org/checklink?uri=http%3A%2F%2Fgeorg.czedik.net%2Fungarn.htm&summary=on&hide_redirects=on&hide_type=all&recursive=on&depth=1&cookie=nochanges&check=Check
> 
> does not only check the links in my documents http://georg.czedik.net/...
> but also in http://www.info-serve.de/...

Oh, indeed.  Thanks for the sample URI!  I believe I found out what was
causing this: links like this one:

  http://georg.czedik.net/cgi-bin/link-to.sh?http://www.info-serve2.de/[...]

Now, our recursion scope was http://georg.czedik.net/, and the above
link is in that scope.  However, checklink did not check the scope again
when the URI was redirected to http://www.info-serve2.de/[...], so it
assumed we were still safely within the scope and continued the
recursion "off-site".

This should now be taken care of in CVS revision 3.6.2.23 of
checklink [1]: the scope is now checked both before and after a
redirect ("after" only if "before" was in scope), because I believe
that's what most people expect.
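
The before-and-after check can be sketched roughly as follows.  This is
only an illustration in Python, not the actual Perl code in checklink;
the function names in_scope and should_recurse are made up for this
sketch, and the scope test is assumed to be a simple same-host,
path-prefix comparison.

```python
from urllib.parse import urlsplit


def in_scope(uri: str, scope: str) -> bool:
    """Return True if uri has the scope's scheme and host and lies under its path.

    Assumption: scope membership is a plain prefix comparison; the real
    checklink may decide this differently.
    """
    u, s = urlsplit(uri), urlsplit(scope)
    return (
        u.scheme == s.scheme
        and u.hostname == s.hostname
        and u.path.startswith(s.path)
    )


def should_recurse(requested_uri: str, final_uri: str, scope: str) -> bool:
    """Decide whether to recurse into a document.

    requested_uri is the link as written in the page; final_uri is where
    the request ended up after any redirects.  The "after" check only
    runs if the "before" check passed.
    """
    if not in_scope(requested_uri, scope):
        return False
    return in_scope(final_uri, scope)


scope = "http://georg.czedik.net/"
# The redirecting link from this thread, cut off at the target host root.
link = "http://georg.czedik.net/cgi-bin/link-to.sh?http://www.info-serve2.de/"
print(should_recurse(link, "http://www.info-serve2.de/", scope))
```

With only the "before" check, the link above would pass, because the
link-to.sh URI itself is on georg.czedik.net; the "after" check is what
stops the recursion at the redirect target.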

Until there is a public online version of the "fixed" checklink,
removing the link-to.sh CGI script and linking directly to the target
resources is the only "workaround" I can come up with.

[1] http://dev.w3.org/cvsweb/validator/httpd/cgi-bin/checklink.pl

Received on Saturday, 22 November 2003 10:59:26 UTC