checklink: error (or opportunity for improvement?) in masquerade option and/or checklink.pod

It appears that either the --masquerade option is not working, or
the documentation could usefully be revised to make clearer how
to use it.

The summary of options at
http://search.cpan.org/dist/W3C-LinkChecker/bin/checklink.pod
says:

--masquerade "local remote"

    Masquerade local dir as a remote URI. For example, the
    following results in /my/local/dir/ being "mapped" to
    http://some/remote/uri/

      --masquerade "/my/local/dir http://some/remote/uri/"

I understand this to mean that if the document being checked
contained a link to (for example)

   http://some/remote/uri/foo.html

then checklink would not attempt to communicate with the remote
server, but would check the local filesystem for a file called

   /my/local/dir/foo.html

This would make it convenient to prepare a set of interlinked
documents locally, link check them, and correct the errors before
uploading them to a public server.  So far so good.  That is what
I am trying to do.  (If this is not what masquerade is intended
to do, it suggests an opportunity for improving the man page --
I'll happily suggest wording, if I can ever understand what
masquerade does and how it works.)
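
To make that expectation concrete, here is a small bash sketch of the
mapping as I understand it from the man page.  It is not taken from
the checklink source, and the function name map_uri is made up for
illustration; it only shows the prefix substitution I expect, not how
checklink actually behaves.

```shell
#!/usr/bin/env bash
# Sketch of the mapping the documentation seems to describe.
# NOT checklink's code; map_uri is a hypothetical helper.
# Usage: map_uri LOCAL_DIR REMOTE_PREFIX URI
map_uri() {
    local local_dir=${1%/}    # local side of the pair, one trailing slash dropped
    local remote=${2%/}/      # remote side, normalized to end in a slash
    local uri=$3
    local rest
    case $uri in
        "$remote"*)
            # URI lies under the masqueraded prefix: swap in the local dir
            rest=${uri#"$remote"}
            printf '%s/%s\n' "$local_dir" "$rest"
            ;;
        *)
            # anything else is left alone and would be fetched remotely
            printf '%s\n' "$uri"
            ;;
    esac
}

map_uri /my/local/dir http://some/remote/uri/ http://some/remote/uri/foo.html
# prints /my/local/dir/foo.html
```

Under this reading, a link outside the remote prefix is printed
unchanged, i.e. it would still be checked over the network.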

But using (the equivalent of)

  checklink --masquerade ". http://example.org/x/y/" \
     --masquerade "../z http://example.org/x/z/" \
     doc.html

did not produce the expected results: checklink complained about
things being missing from example.org/x/y even though they were
present in the current directory. It complained, for example,
about a link to http://example.org/x/y/doc.html being a bad link,
though doc.html is definitely present in the local directory
masquerading as http://example.org/x/y/ -- it's the document
being checked.

I concluded that I had misread the documentation, or that there
were unexpected constraints on the syntax of the paired
arguments.  I tried the arguments in various forms; I tried them
local-first and remote-first.

I made a test file (attached) named testdoc.html, which has links
to http://www.w3.org/XML/Activity.html and to
http://www.w3.org/XML/testdoc.html, which does not exist.  In the
directory containing testdoc.html, there is no Activity.html.

When I run

   checklink --quiet testdoc.html

I am told, as expected, that http://www.w3.org/XML/testdoc.html
produces a 404.

When I run

   checklink --quiet --masquerade ". http://www.w3.org/XML/" testdoc.html

I get the same result.  I have run this test case with the local
argument in the forms

   "."
   "./"
   "/Users/cmsmcq/2009/misc"
   "/Users/cmsmcq/2009/misc/"
   "file:///Users/cmsmcq/2009/misc"
   "file:///Users/cmsmcq/2009/misc/"

and the remote argument in the forms

   "http://www.w3.org/XML"
   "http://www.w3.org/XML/"

with the arguments in both orders, remote - local and local -
remote.  All 24 combinations (6 local forms x 2 remote forms x 2
orders) produce the same result, which suggests that in no case
am I succeeding in making masquerading do anything at all.
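
For anyone who wants to repeat the test, the combinations can be
enumerated with a short bash loop.  This is only a sketch: the
function name list_variants is illustrative, the paths are the ones
from my machine, and the loop prints the command lines rather than
running them.

```shell
#!/usr/bin/env bash
# Print (do not run) one checklink command line per combination of
# local form, remote form, and argument order.
# list_variants is a hypothetical helper name.
list_variants() {
    local l r
    for l in . ./ \
             /Users/cmsmcq/2009/misc /Users/cmsmcq/2009/misc/ \
             file:///Users/cmsmcq/2009/misc file:///Users/cmsmcq/2009/misc/; do
        for r in http://www.w3.org/XML http://www.w3.org/XML/; do
            # local first, then remote first
            printf 'checklink --quiet --masquerade "%s %s" testdoc.html\n' "$l" "$r"
            printf 'checklink --quiet --masquerade "%s %s" testdoc.html\n' "$r" "$l"
        done
    done
}

list_variants
```

Piping the output to a shell (list_variants | sh) would run each
variant in turn, assuming checklink is on the PATH and testdoc.html
is in the current directory.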

Are my expectations inconsistent with the intent?  Or is the code
broken?

One further note: when my bash command was insufficiently
escaped, some variants did elicit a complaint about

Use of uninitialized value in pattern match (m//) at /usr/local/bin/checklink line 201.
Use of uninitialized value in string eq at /System/Library/Perl/Extras/5.8.8/WWW/RobotRules.pm line 152.
Use of uninitialized value in string eq at /System/Library/Perl/Extras/5.8.8/WWW/RobotRules.pm line 152.
Use of uninitialized value in pattern match (m//) at /usr/local/bin/checklink line 201.
Use of uninitialized value in string eq at /System/Library/Perl/Extras/5.8.8/WWW/RobotRules.pm line 152.
Use of uninitialized value in string eq at /System/Library/Perl/Extras/5.8.8/WWW/RobotRules.pm line 152.

which suggests a problem on some other path through the code.



-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************

Received on Thursday, 23 April 2009 23:11:43 UTC