Re: Link Checker Not Following Redirect (or some other problem?) from Ville Skyttä on 2002-12-06 (www-validator@w3.org from December 2002)

From: Ville Skyttä <ville.skytta@iki.fi>
Date: 07 Dec 2002 00:52:47 +0200
To: Joseph Reagle <reagle@w3.org>
Cc: www-validator@w3.org
Message-Id: <1039215167.27607.201.camel@localhost.localdomain>

On Fri, 2002-12-06 at 18:45, Joseph Reagle wrote:

> > Yes, when invoked from the web, checklink limits the recursion scope so
> > that when recursively checking <http://foo.bar.org/quux/something>, only
> > documents below <http://foo.bar.org/quux/> are checked.  When invoked
> > from command line, the -l option can be used for specifying the scope.
> 
> Is there an easy way to have it now recurse when it's the same resource when 
> an index.html/Overview.html is used by the Web server?

That's not relevant.  What matters is the "base" of the URI: checking
<http://foo.bar.org/quux/something> limits the scope such that *only*
URIs that begin with <http://foo.bar.org/quux/> will be checked.  There
have been reports of checklink wandering off even to other sites,
though, so bugs may be lurking here.

And regarding your test doc at
<http://www.w3.org/Encryption/2001/Drafts/xmlenc-core/Overview.html>,
the only URI in that document which is in the recursion scope is
<http://www.w3.org/Encryption/2001/Drafts/xmlenc-core/> (which happens
to be the same resource), hence there's little recursion :)
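
To make the scope rule concrete, here is a minimal sketch in Python.  It
is not checklink's actual code (checklink is written in Perl); the names
recursion_base and in_scope are made up for illustration, and the sketch
assumes the "base" is simply everything up to and including the last "/"
of the path.

```python
from urllib.parse import urlsplit, urlunsplit


def recursion_base(start_uri: str) -> str:
    """Return the start URI cut back to the last '/' of its path.

    Hypothetical helper: it only mirrors the prefix rule described
    in this message, not checklink's real implementation.
    """
    parts = urlsplit(start_uri)
    path = parts.path or "/"
    base_path = path[: path.rfind("/") + 1]
    # Query and fragment are dropped; only scheme, host and directory remain.
    return urlunsplit((parts.scheme, parts.netloc, base_path, "", ""))


def in_scope(candidate: str, start_uri: str) -> bool:
    """True if candidate begins with the base of start_uri."""
    return candidate.startswith(recursion_base(start_uri))


if __name__ == "__main__":
    start = "http://www.w3.org/Encryption/2001/Drafts/xmlenc-core/Overview.html"
    print(recursion_base(start))
    print(in_scope("http://www.w3.org/Encryption/2001/Drafts/xmlenc-core/", start))
    print(in_scope("http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/", start))
```

Under that assumption the directory URI of the test doc is in scope,
while anything under /TR/ on the same host is not.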

> Does checklink ask for text/html and application/xhtml+xml docs?

Currently, no Accept headers are sent at all.  This will be changed in
CVS soon.
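
For illustration only, this is roughly what a link-checking HEAD request
carrying such an Accept header could look like, sketched with Python's
standard library rather than checklink's Perl.  The helper name
build_head_request is hypothetical, and the media types are just the two
mentioned in the question; what checklink will actually send is not
decided here.

```python
from urllib.request import Request

# Assumption: the two media types named in the question above.
ACCEPT = "text/html, application/xhtml+xml"


def build_head_request(uri: str) -> Request:
    """Build (but do not send) a HEAD request that states its preferences."""
    return Request(uri, method="HEAD", headers={"Accept": ACCEPT})


if __name__ == "__main__":
    req = build_head_request("http://www.w3.org/2000/09/xmldsig")
    print(req.get_method(), req.full_url)
    print("Accept:", req.get_header("Accept"))
```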

> I content 
> negotiate on the namespace redirect [1] so if you ask for html, that's what 
> you get, if you ask for application/xml you'll get the schema and such. The 
> default is the html document though, so if you're not sending any accept 
> header, maybe I have a bug?

This logic seems to work as such.  But note that the rewrite rules snip
the fragment from the redirect URIs [1], [2].  I don't know if this is
ok.  And if the fragments were in the redirect URIs, would
<http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/xmldsig-core-schema.xsd#sha1>
make any sense?
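
To show what "fragments in the redirect URIs" would mean in practice,
here is a small Python sketch.  The function carry_fragment is a
hypothetical helper, not something the server's rewrite rules or
checklink do; it assumes the simple policy of reusing the requested
fragment whenever the Location value has none of its own.

```python
from urllib.parse import urldefrag, urljoin


def carry_fragment(requested: str, location: str) -> str:
    """Return the redirect target with the requested fragment re-attached.

    Assumed policy: a fragment already present in Location wins;
    otherwise the fragment of the requested URI is appended.
    """
    _, fragment = urldefrag(requested)
    # Location may be relative, so resolve it against the requested URI.
    target = urljoin(requested, location)
    target_base, target_fragment = urldefrag(target)
    if target_fragment or not fragment:
        return target
    return f"{target_base}#{fragment}"


if __name__ == "__main__":
    requested = "http://www.w3.org/2000/09/xmldsig#sha1"
    schema = ("http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/"
              "xmldsig-core-schema.xsd")
    print(carry_fragment(requested, schema))
```

With the schema redirect this produces exactly the ...xsd#sha1 URI
questioned above, which is why carrying the fragment over blindly looks
doubtful to me.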

To be safe, I would probably go for proxying instead of redirecting, if
possible.

[1]
HEAD /2000/09/xmldsig#sha1 HTTP/1.0
Accept: text/html
--> Location: http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/Overview.html

[2]
HEAD /2000/09/xmldsig#sha1 HTTP/1.0
Accept: text/xml
--> Location: http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/xmldsig-core-schema.xsd

-- 
\/ille Skyttä
ville.skytta at iki.fi

Received on Friday, 6 December 2002 17:51:39 UTC