
Re: checklink: error (or opportunity for improvement?) in masquerade option and/or checklink.pod

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Fri, 1 May 2009 13:49:04 -0600
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, www-validator@w3.org
Message-Id: <497A67BB-5596-4291-9738-3A95EA1096BC@blackmesatech.com>
To: Ville Skyttä <ville.skytta@iki.fi>
On 28 Apr 2009, at 14:39 , Ville Skyttä wrote:

>> ...  (RFE: perhaps the handling of the --masquerade option should
>> check for a first argument which begins with a literal quotation
>> mark or a second argument which ends with one?)
>
> Hmm... that's a pretty specific tweak, perhaps we should try checking
> that both are well-formed URIs instead?  And maybe the URIs should be
> canonicalized and transformed using real URI resolution instead of
> simple "starts with" string matching?

Both of those would be fine by me; the quote checking is really very
ad hoc.
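
To make the two kinds of check concrete, here is a minimal sketch in
Python (not checklink's Perl, and not its actual code).  The function
name parse_masquerade is hypothetical, and treating "well formed" as
"has a scheme such as http or file" is an assumption made only for
illustration.  It covers the sanity checks, not canonicalization or
URI resolution.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_masquerade(value: str) -> Tuple[str, str]:
    """Split a --masquerade value into two prefixes and sanity-check them.

    Hypothetical helper: these checks are assumptions about what
    "well formed" could mean here, not checklink's behavior.
    """
    parts = value.split()
    if len(parts) != 2:
        raise ValueError(f"expected two URIs, got {len(parts)}")
    for part in parts:
        # A stray quotation mark usually points to a shell quoting mistake.
        if part.startswith(('"', "'")) or part.endswith(('"', "'")):
            raise ValueError(f"stray quotation mark in {part!r}")
        # Assumption: require an absolute URI, i.e. one with a scheme.
        if not urlsplit(part).scheme:
            raise ValueError(f"not an absolute URI: {part!r}")
    return parts[0], parts[1]


print(parse_masquerade("http://www.w3.org/XML/ file:///tmp/checklink/"))
# prints ('http://www.w3.org/XML/', 'file:///tmp/checklink/')
```

The scheme test subsumes the ad hoc quote test for a leading quote,
but keeping both gives a more helpful error message.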

> Also, if you happen to have the previous buggy script or another
> reproducer for the perl warnings you got, please send them my way
> (or this list or the W3C Bugzilla) and I'll see what can be done
> about them.

I'll see if I can reconstruct them.

>> Now that I seem to have gotten --masquerade to work, I can suggest
>> wording for the man page which I think may be clearer for some
>> readers.  The current man page says this about masquerade:
>> --masquerade "local remote"
> [...]
>> I propose two alternative versions: (A), which retains the local /
>> remote example, and (B), which doesn't.
>
> First, both of your alternatives reverse the order of the given URIs
> (previously kind of "to from", your suggestions have "from to").  Is
> that intentional?  While perhaps more intuitive that way, it would be
> a backwards compatibility issue.

They reverse the order of the names in the documentation, yes.
But that is because I believe the existing documentation has it
backwards.  That is, the text I proposed is an attempt to describe
what the software currently does.

There may be some linguistic issues involved here.  The man page
from which I was working, for example, uses 'masquerade' as a
transitive verb ("Masquerade local dir as a remote URI") -- but in
my idiolect of English, 'masquerade' is an intransitive verb.
X can masquerade as Y (pretend to be Y, impersonate Y) but I
cannot masquerade X as Y.

The code in the version of checklink I installed replaces the
first string in a pair with the second string in a pair, when
the URI in the document instance begins with the first string.
It does NOT replace the second string with the first string.

So if you wish (for example) to create locally (at /temp/dirx)
a directory which will eventually be installed at
http://www.example.org/dirx, and link check the documents before
actually uploading them, you will need to specify

   --masquerade "http://www.example.org/dirx /temp/dirx"

and NOT (as the other order would suggest)

   --masquerade "/temp/dirx http://www.example.org/dirx"

So yes, my saying "remote local" instead of "local remote" was
definitely intentional, but a documentation/implementation
compatibility fix, not a backwards compatibility issue.
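
The replacement described above can be sketched as follows.  This is
a minimal illustration in Python, not the checklink source (which is
Perl); the function name masquerade and the file name page.html are
hypothetical, chosen only for the example.

```python
def masquerade(uri: str, pair: str) -> str:
    """Replace the first prefix of the pair with the second one.

    Hypothetical helper illustrating the "starts with" behavior
    described in this message; it is not checklink's actual code.
    """
    first, second = pair.split()  # the option value holds two strings
    if uri.startswith(first):
        return second + uri[len(first):]
    return uri  # no match: the link is left untouched


link = "http://www.example.org/dirx/page.html"

# "remote local": the link is redirected to the local copy.
print(masquerade(link, "http://www.example.org/dirx /temp/dirx"))
# prints /temp/dirx/page.html

# "local remote": the link does not start with /temp/dirx, so nothing happens.
print(masquerade(link, "/temp/dirx http://www.example.org/dirx"))
# prints http://www.example.org/dirx/page.html
```

Under these assumptions the second order is a silent no-op, which is
why it looks the same as not passing --masquerade at all.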


>
>
> I've applied a slightly modified version of your (B) alternative in
> CVS (reversed order of real-prefix and surrogate-prefix, see above)
> because it's more accurate with the current link checker behavior.

I respectfully urge you to run the relatively simple test I
submitted earlier, to demonstrate that the order

   --masquerade "real surrogate"

is correct / matches the current link checker behavior, and

   --masquerade "surrogate real"

does not match current behavior.  (Or, perhaps, to demonstrate
yet again that I cannot create a test that actually tests what
it's supposed to, or cannot interpret the result reliably.)

I append a log showing the result of running the test on my system.
It seems to me to show that the order "real surrogate" is the one
that works, not vice versa.



> More changes will be needed if we change from simple "starts with"
> string replacement to URI canonicalization/resolution (see beginning
> of my reply).  BTW checklink --help's brief --masquerade entry
> already uses the base URI term.
>
> Thanks!

Thank you!

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************


...................
console log, with ### comments added
...................

### setting the context.  We are in directory /tmp/checklink, which
### contains a 'testdoc.html' and no 'Activity.html'
###

[/tmp/checklink] $ pwd
/tmp/checklink
[/tmp/checklink] $ ls -l
total 8
-rw-r--r--  1 cmsmcq  admin  406 May  1 13:38 testdoc.html

###
### The test document links to 'Activity.html' and 'testdoc.html' in
### the remote URI 'directory' http://www.w3.org/XML/
### If checklink checks those links by looking at www.w3.org/XML,
### the link to Activity.html will succeed, and the link to
### testdoc.html will show up as broken.
### If checklink checks those links in /tmp/checklink, then
### Activity.html should be a broken link, and testdoc.html should
### be found and accepted as OK.
###

[/tmp/checklink] $ cat testdoc.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head><title>Test document</title></head>
<body>
<p>This document links to</p>
<ul>
<li><a href="http://www.w3.org/XML/Activity.html">the XML Activity page</a></li>
<li><a href="http://www.w3.org/XML/testdoc.html">this page, non-existent on the server</a></li>
</ul>
</body>
</html>

###
### Try the order --masquerade "remote local" or "real surrogate"
### Activity.html is not found (because checklink went to /tmp/checklink,
### not w3.org), i.e. --masquerade worked.
###

[/tmp/checklink] $ checklink --summary --masquerade "http://www.w3.org/XML/ file:///tmp/checklink/" testdoc.html

Processing	file:///tmp/checklink/testdoc.html

This may take some time if the document has many links to check.
processing http://www.w3.org/XML/testdoc.html in base file:///tmp/checklink/
processing http://www.w3.org/XML/Activity.html in base file:///tmp/checklink/


List of broken links and other issues:

file:///tmp/checklink/Activity.html	
   Line: 8
   Code: 404 File `/tmp/checklink/Activity.html' does not exist
  To do: The link is broken. Double-check that you have not made any typo,
	or mistake in copy-pasting. If the link points to a resource that
	no longer exists, you may want to remove or fix the link.
Anchors

Found 0 anchors.

###
### Now try the order --masquerade "local remote" or "surrogate real"
### testdoc.html is not found (because checklink went to w3.org to
### check the links, not to /tmp/checklink).  That is, the
### --masquerade option had no effect; we get the same result as if
### we had not used it at all.
###

[/tmp/checklink] $ checklink --summary --masquerade "file:///tmp/checklink/ http://www.w3.org/XML/" testdoc.html

Processing	file:///tmp/checklink/testdoc.html

This may take some time if the document has many links to check.


List of broken links and other issues:

http://www.w3.org/XML/testdoc.html	
   Line: 9
   Code: 404 Not Found
  To do: The link is broken. Double-check that you have not made any typo,
	or mistake in copy-pasting. If the link points to a resource that
	no longer exists, you may want to remove or fix the link.
Anchors

Found 0 anchors.
[/tmp/checklink] $
Received on Friday, 1 May 2009 19:49:44 GMT
