Re: checklink: base href not taken into account

14.9.2011 23:55, Charles Greathouse wrote:

> Checklink appears not to take a document's base URL into account.

I can confirm that there indeed is a bug here. Checklink resolves 
relative URLs using the page URL as the base, irrespective of the use of 
a <base href=...> element. Simple demo:
http://www.cs.tut.fi/~jkorpela/test/base.html
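
To illustrate the expected behaviour, here is a minimal sketch in Python. 
It is not checklink's code (checklink is written in Perl); the names 
BaseFinder and resolve_link and the example.org/example.com URLs are 
made up for this example. It assumes that the first <base> element with 
an href attribute sets the base, and that this href may itself be relative 
to the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseFinder(HTMLParser):
    """Records the href of the first <base> element that has one."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "base" and self.base_href is None:
            href = dict(attrs).get("href")
            if href:
                self.base_href = href


def resolve_link(page_url, html, link):
    """Resolve `link` against the document's base URL (hypothetical helper)."""
    finder = BaseFinder()
    finder.feed(html)
    # Fall back to the page URL when the document has no <base href>.
    if finder.base_href:
        base = urljoin(page_url, finder.base_href)
    else:
        base = page_url
    return urljoin(base, link)


PAGE_URL = "http://example.org/test/base.html"
HTML = (
    '<html><head><base href="http://example.com/docs/index.html"></head>'
    '<body><a href="foo.html">x</a></body></html>'
)

# Resolution that ignores <base>, which is the behaviour reported as a bug:
ignoring_base = urljoin(PAGE_URL, "foo.html")
# Resolution that honours <base href>:
honouring_base = resolve_link(PAGE_URL, HTML, "foo.html")
```

Here ignoring_base ends up under example.org/test/, whereas 
honouring_base ends up under example.com/docs/, which is where a 
browser would look for the linked resource.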

> I looked through the source and I couldn't find an attempt to handle
> this, so I guess this is a feature recommendation rather than a
> bugfix.  The code has a comment
>
> # base/@href intentionally not checked
>
> though this seems to refer to checking the link in <base>  rather than
> using base to get the document's base location.

The comment may relate to the discussion
http://lists.w3.org/Archives/Public/www-validator/2009Jan/0030.html
As it says "The link checker does compute the base URI properly, and 
reports all other tests (including tests related to base URI) properly", 
I suspect that the bug has crept in recently.

Regarding the checking of the URL in <base href=...>, I think it should 
be done unless there is a compelling technical reason against it. 
Formally, the URL there is never to be used as such (only as a base 
when resolving relative URLs). But formally, it is not an error to have, 
say, a link that does not refer to any resource but causes a 4xx or 5xx 
response (at least part of the time). Link checking is mostly about 
practical issues, not formal ones. And practically, it is useful to use a 
base URL that works by itself too; one reason for that is that the URL has 
documentary value. (When I locally save a web page to study it in 
different modifications, I usually slap in a <base href> that refers to 
the original address, even though I know that anything past the last "/" 
is ignored.)
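
The point about the last "/" can be checked with Python's standard 
urljoin; this is only a small sketch, and the example.org URLs are 
placeholders rather than real addresses.

```python
from urllib.parse import urljoin

# The final path segment of the base ("page.html") is dropped before
# the relative reference is appended.
with_file = urljoin("http://example.org/dir/page.html", "img.png")

# A base that already ends in "/" gives the same result.
with_slash = urljoin("http://example.org/dir/", "img.png")

# Without the trailing "/", "dir" is itself the last segment and is dropped.
without_slash = urljoin("http://example.org/dir", "img.png")
```

So a <base href> that points to the original page address resolves 
relative URLs the same way as one that points to its directory.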

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/

Received on Thursday, 15 September 2011 09:07:01 UTC