
Re: checklink: false positives for broken links

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Wed, 05 Sep 2012 08:28:15 +0300
Message-ID: <5046E2EF.8020108@cs.tut.fi>
To: Star Ostgard <ostarella@gmail.com>
CC: www-validator@w3.org
2012-09-02 22:30, Star Ostgard wrote:

> Did linkchecker for http://shadowwalker.info/archives_a.html and it
> listed these four supposedly broken links:
> Line: 272 http://happyholly.co.uk/FanFiction/2011/10/30/a-matter-of-opinion/
>     *Status*: 404 Not Found

Something odd is happening here. When I run the link checker directly
on the happyholly.co.uk page, I get a 404 Not Found result, too. Yet the
page is available in most browsers, and when I use Tamper Data on
Firefox to remove all HTTP headers except Host, I still get the page
(so this can't be the effect of User-Agent sniffing or anything like
that).
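
The Host-only test can also be reproduced outside a browser. What follows
is a minimal Python sketch, not part of the original exchange: the function
names build_minimal_request and fetch_status_line are invented for
illustration, and the code assumes a plain http:// URL (no TLS, no
redirects). It sends a GET request whose only header is Host and returns
the status line the server answers with.

```python
import socket
from urllib.parse import urlsplit


def build_minimal_request(url: str) -> bytes:
    """Build a GET request whose only header is Host."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host = parts.hostname or ""
    if parts.port:
        # Keep a non-default port in the Host header.
        host += f":{parts.port}"
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def fetch_status_line(url: str, timeout: float = 10.0) -> str:
    """Send the minimal request and return only the response status line."""
    parts = urlsplit(url)
    address = (parts.hostname, parts.port or 80)
    with socket.create_connection(address, timeout=timeout) as sock:
        sock.sendall(build_minimal_request(url))
        data = b""
        # Read just far enough to see the end of the first line.
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\r\n", 1)[0].decode("iso-8859-1")
```

Calling fetch_status_line() on the URL from the report would show whether
a bare request gets 200 or 404; adding headers to the request one at a
time is then a way to look for the one that changes the answer.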

But using Lynx 2.8.7rel.1, I get this:

% lynx -mime_header 
HTTP/1.1 404 Not Found
X-Powered-By: PHP/5.3.16
X-Pingback: http://happyholly.co.uk/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Set-Cookie: PHPSESSID=701fcd2ad965a64219f9fde57d29be2a; path=/
Last-Modified: Wed, 05 Sep 2012 05:22:08 GMT
Content-Type: text/html; charset=UTF-8
Server:  - Web acceleration by http://www.unixy.net/varnish
X-Cacheable: YES
Content-Length: 6160
Accept-Ranges: bytes
Date: Wed, 05 Sep 2012 05:22:08 GMT
X-Varnish: 1570843335
Via: 1.1 varnish
Connection: close
age: 0
X-Cache: MISS

followed by an HTML document containing the explanation "Sorry, no posts
matched your criteria". And in Lynx, when I open
http://shadowwalker.info/archives_a.html and follow the link to that
URL, I get a 404 error too.

So I'm puzzled. What is it that Lynx and the Link Checker do the same way,
and differently from common browsers, that causes this? I can see that 
the X-Pingback address is different from what I get otherwise, namely
but I don't see how that would relate.
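
To compare two such header dumps systematically rather than by eye, a
small script can parse each one and list the headers that differ. This is
a rough Python sketch under stated assumptions: the names
parse_response_head and differing_headers are hypothetical, the input is
text in the shape printed by lynx -mime_header above, and repeated header
names are not handled (the last value wins).

```python
def parse_response_head(raw: str):
    """Split a raw response dump into its status line and a header dict."""
    lines = raw.strip().splitlines()
    status = lines[0].strip()
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            # A blank line ends the header block; the body follows.
            break
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise to lower case.
        headers[name.strip().lower()] = value.strip()
    return status, headers


def differing_headers(first: dict, second: dict) -> dict:
    """Map each header that differs to its (first, second) values."""
    names = sorted(set(first) | set(second))
    return {
        name: (first.get(name), second.get(name))
        for name in names
        if first.get(name) != second.get(name)
    }
```

Feeding it the Lynx dump and a dump captured from a browser request would
list X-Pingback along with any other header that differs. Headers such as
Date, X-Varnish and Set-Cookie are expected to vary between any two
requests and can be ignored.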

Received on Wednesday, 5 September 2012 05:28:45 UTC
