
RE: checklink: [SEC=UNOFFICIAL]

From: Chia, Dave <Dave.Chia@dbcde.gov.au>
Date: Thu, 13 Oct 2011 08:43:55 +1100
To: "David Dorward" <david@dorward.me.uk>
cc: "www-validator@w3.org" <www-validator@w3.org>
Message-ID: <41A048A5CD22D146893047779A6A0F663D6927D677@EMB01.dept.gov.au>
Hi David,

The system seems to follow links that go to search results pages (see staysmartonline link checker.pdf), and it also tries to comment on an article (see digitalbusiness.pdf).

Shouldn't it just check the pages which are linked to the website with the proper address instead of trying to follow search results and comments? I thought that would help determine which pages are relevant.

Best regards,
Dave

From: David Dorward [mailto:dorward@gmail.com] On Behalf Of David Dorward
Sent: Thursday, 13 October 2011 12:56 AM
To: Chia, Dave
Cc: www-validator@w3.org
Subject: Re: checklink: [SEC=UNOFFICIAL]

On 12 Oct 2011, at 01:56, Chia, Dave wrote:
The link checker tends to use and check search functions, and comment functions when they are available on a website.

"Use"? Has the link checker acquired the ability to fill out forms while I wasn't looking? Or do you just mean "follow links that happen to go to search result pages"?


This adds a great deal of unnecessary checks on irrelevant pages. Shouldn't the link checker identify the 'real' pages and just check the links on those pages?

Determining what links are "relevant" is a very difficult problem. First you would have to decide what constituted a relevant link (opinions WILL differ), then come up with some kind of heuristic algorithm to determine which links went somewhere relevant and which did not.

The program does have the --exclude-docs switch, which lets you specify a regular expression that matches URLs you don't want to check, so authors testing their sites can exclude comment and search pages so long as they have a semi-sane URI structure.
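
As a rough illustration of the kind of regular expression meant here, the Python sketch below filters a list of URLs against an exclusion pattern. It is not checklink's own code. The pattern, the example.org URLs, and the function name urls_to_check are all hypothetical, and they assume a site whose search pages live under /search (or take a q= parameter) and whose comment forms sit under /comment or /comments.

```python
import re

# Hypothetical exclusion pattern. It assumes search pages are under /search
# or use a "q=" query parameter, and comment pages end in /comment(s).
EXCLUDE = re.compile(r"/search(?:[/?]|$)|[?&]q=|/comments?(?:[/?#]|$)")


def urls_to_check(urls, exclude=EXCLUDE):
    """Return only the URLs that do NOT match the exclusion pattern."""
    return [u for u in urls if not exclude.search(u)]


# Made-up sample URLs, for illustration only.
sample = [
    "http://example.org/",
    "http://example.org/search?q=links",
    "http://example.org/news/article-1",
    "http://example.org/news/article-1/comment",
    "http://example.org/about?page=2",
]

kept = urls_to_check(sample)
for url in kept:
    print(url)
```

An expression of this shape is the sort of thing one would hand to --exclude-docs, but the exact matching rules should be confirmed against the checklink documentation, and the pattern has to be adapted to the site's actual URI structure.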

--
David Dorward
http://dorward.me.uk








Received on Wednesday, 12 October 2011 21:59:41 GMT
