- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 23 Apr 2009 17:10:58 -0600
- To: www-validator@w3.org
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
It appears that either the --masquerade option is not working, or
the documentation could usefully be revised to make clearer how
to use it.
The summary of options at
http://search.cpan.org/dist/W3C-LinkChecker/bin/checklink.pod
says:
    --masquerade "local remote"
        Masquerade local dir as a remote URI. For example, the
        following results in /my/local/dir/ being "mapped" to
        http://some/remote/uri/

          --masquerade "/my/local/dir http://some/remote/uri/"
I understand this to mean that if the document being checked
contained a link to (for example)

    http://some/remote/uri/foo.html

then checklink would not attempt to communicate with the remote
server, but would check the local filesystem for a file called

    /my/local/dir/foo.html
This would make it convenient to prepare a set of interlinked
documents locally, link-check them, and correct the errors before
uploading them to a public server. So far so good. That is what
I am trying to do. (If this is not what --masquerade is intended
to do, it suggests an opportunity for improving the man page;
I'll happily suggest wording, if I can ever understand what
--masquerade does and how it works.)
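To make my reading concrete, here is a minimal Python sketch of the
mapping I expected, assuming it is a plain prefix substitution. It is
only an illustration of that assumption, not checklink's actual (Perl)
implementation, and `masquerade_lookup` is a hypothetical name:

```python
from typing import Optional


def masquerade_lookup(url: str, local_dir: str, remote_uri: str) -> Optional[str]:
    """Map a remote URL to a local path by prefix substitution.

    Hypothetical illustration of the behaviour I expected from
    --masquerade; returns None if the URL lies outside the remote prefix.
    """
    # Normalise both prefixes to end in "/" so that a remote prefix of
    # "http://some/remote/uri" does not also match ".../uriX/...".
    if not remote_uri.endswith("/"):
        remote_uri += "/"
    if not local_dir.endswith("/"):
        local_dir += "/"
    if not url.startswith(remote_uri):
        return None  # not masqueraded: would be fetched from the network
    # Swap the remote prefix for the local directory.
    return local_dir + url[len(remote_uri):]


print(masquerade_lookup("http://some/remote/uri/foo.html",
                        "/my/local/dir/", "http://some/remote/uri/"))
# prints /my/local/dir/foo.html
```

Under this reading, links outside the remote prefix are left alone and
checked over the network as usual.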
But using (the equivalent of)

    checklink --masquerade ". http://example.org/x/y/" \
              --masquerade "../z http://example.org/x/z/" \
              doc.html
did not produce the expected results: checklink complained about
things being missing from example.org/x/y even though they were
present in the current directory. It complained, for example,
about a link to http://example.org/x/y/doc.html being a bad link,
though doc.html is definitely present in the local directory
masquerading as http://example.org/x/y/ -- it's the document
being checked.
I concluded that I had misread the documentation, or that there
were unexpected constraints on the syntax of the paired
arguments. I tried the arguments in various forms; I tried them
local-first and remote-first.
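I made a test file (attached) named testdoc.html, which has links
to http://www.w3.org/XML/Activity.html and to
http://www.w3.org/XML/testdoc.html, which does not exist on the
W3C server. In the directory containing testdoc.html, there is no
Activity.html. So if masquerading worked as I expect, the link to
testdoc.html should be reported as fine and the link to
Activity.html as broken.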
When I run

    checklink --quiet testdoc.html

I am told, as expected, that http://www.w3.org/XML/testdoc.html
produces a 404.
When I run

    checklink --quiet --masquerade ". http://www.w3.org/XML/" \
              testdoc.html

I get the same result. I have run this test case with the local
argument in the forms

    "."
    "./"
    "/Users/cmsmcq/2009/misc"
    "/Users/cmsmcq/2009/misc/"
    "file:///Users/cmsmcq/2009/misc"
    "file:///Users/cmsmcq/2009/misc/"

and the remote argument in the forms

    "http://www.w3.org/XML"
    "http://www.w3.org/XML/"

with the arguments in the order remote-local and local-remote.
All 24 combinations produce the same result, which suggests that
in no case am I succeeding in making masquerading do anything at
all.
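For reference, the variants can be enumerated mechanically (6 local
forms x 2 remote forms x 2 orders = 24). A minimal Python sketch of my
own, not part of checklink: it only builds and prints the command
lines and does not run checklink; the helper names are hypothetical.

```python
import itertools
import shlex

LOCAL_FORMS = [".", "./",
               "/Users/cmsmcq/2009/misc", "/Users/cmsmcq/2009/misc/",
               "file:///Users/cmsmcq/2009/misc", "file:///Users/cmsmcq/2009/misc/"]
REMOTE_FORMS = ["http://www.w3.org/XML", "http://www.w3.org/XML/"]


def variants():
    """Yield each --masquerade value tried, in both argument orders."""
    for local, remote in itertools.product(LOCAL_FORMS, REMOTE_FORMS):
        yield f"{local} {remote}"  # local first, as documented
        yield f"{remote} {local}"  # remote first


def command_lines():
    # Each value is kept as a single argv element, i.e. one shell word.
    return [["checklink", "--quiet", "--masquerade", v, "testdoc.html"]
            for v in variants()]


if __name__ == "__main__":
    cmds = command_lines()
    print(len(cmds))  # 24
    for cmd in cmds:
        print(shlex.join(cmd))  # shell-quoted, ready to paste
```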
Are my expectations inconsistent with the intent? Or is the code
broken?
One further note: when my bash command was insufficiently
escaped, some variants did elicit a complaint about

    Use of uninitialized value in pattern match (m//) at /usr/local/bin/checklink line 201.
    Use of uninitialized value in string eq at /System/Library/Perl/Extras/5.8.8/WWW/RobotRules.pm line 152.
    Use of uninitialized value in string eq at /System/Library/Perl/Extras/5.8.8/WWW/RobotRules.pm line 152.
    Use of uninitialized value in pattern match (m//) at /usr/local/bin/checklink line 201.
    Use of uninitialized value in string eq at /System/Library/Perl/Extras/5.8.8/WWW/RobotRules.pm line 152.
    Use of uninitialized value in string eq at /System/Library/Perl/Extras/5.8.8/WWW/RobotRules.pm line 152.

which suggests a problem on some other path through the code.
--
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Thursday, 23 April 2009 23:11:43 UTC