
Re: checklink: [SEC=UNOFFICIAL]

From: David Dorward <david@dorward.me.uk>
Date: Wed, 12 Oct 2011 14:56:09 +0100
Cc: "www-validator@w3.org" <www-validator@w3.org>
Message-Id: <4BE2FA54-27BF-4CB0-B596-ACCC2DB0A924@dorward.me.uk>
To: "Chia, Dave" <Dave.Chia@dbcde.gov.au>

On 12 Oct 2011, at 01:56, Chia, Dave wrote:
> The link checker tends to use and check search functions, and comment functions when they are available on a website.

"Use"? Has the link checker acquired the ability to fill out forms while I wasn't looking? Or do you just mean "follow links that happen to go to search result pages".

> This adds to a great deal of unnecessary checks on irrelevant pages. Shouldn’t the link checker identify the ‘real’ pages and just check the links on those pages?


Determining which links are "relevant" is a very difficult problem. First you would have to decide what constitutes a relevant link (opinions WILL differ), then come up with some kind of heuristic algorithm to determine which links go somewhere relevant and which do not.

The program does have the --exclude-docs switch, which lets you specify a regular expression that matches URLs you don't want to check, so authors testing their sites can exclude comment and search pages so long as they have a semi-sane URI structure.
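
To illustrate the kind of regular expression I mean, here is a rough Python sketch of URL filtering. It is not checklink's own code: the pattern, the example.org URLs and the should_check helper are all made up for illustration, and it assumes search and comment pages live under /search or /comments or carry a q= query parameter. Python's regex dialect is also not identical to Perl's, so treat the pattern as a starting point.

```python
import re

# Hypothetical exclusion pattern (an assumption, not taken from checklink):
# matches a /search or /comments path segment, or a "q=" query parameter.
EXCLUDE = re.compile(r"/(search|comments)(/|\?|$)|[?&]q=")


def should_check(url: str) -> bool:
    """Return True if the document at this URL should have its links checked."""
    return EXCLUDE.search(url) is None


# Made-up URLs standing in for pages discovered during a recursive check.
urls = [
    "http://example.org/articles/link-checking",
    "http://example.org/search?q=links",
    "http://example.org/articles/link-checking/comments/",
]

# Keep only the documents that the pattern does not exclude.
to_check = [u for u in urls if should_check(u)]
```

A pattern along these lines only works if the site's search and comment URLs are predictable, which is the "semi-sane URI structure" caveat above.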

-- 
David Dorward
http://dorward.me.uk
Received on Wednesday, 12 October 2011 13:56:40 GMT
