
Re: Web SUBpages rejected with "Bad hostname"

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Tue, 16 Aug 2011 22:42:41 +0300
Message-ID: <4E4AC831.4000107@cs.tut.fi>
To: www-validator@w3.org, erlkonig@talisman.org
16.8.2011 16:40, I (Jukka K. Korpela) wrote:

> 16.8.2011 12:08, C. Alex. North-Keys wrote:
[...]
>>> 1. I got the following unexpected response when trying to retrieve
>>> <http://www.talisman.org/~erlkonig/img/>:
>>> 500 Can't connect to dont-waste-bandwidth-running-validator-here:80
> [...]
>> Of course, the validator was perfectly happy with other pages under the
>> same http://www.talisman.org/~erlkonig/
>
> I suppose the issue is related to http://www.talisman.org/robots.txt

Sorry, it seems that I was wrong about that - though I don't know
whether the validator actually requests robots.txt. The contents of
robots.txt may reflect the site administration's intentions, but there is a
more specific mechanism in action.

It seems that the server www.talisman.org treats requests from the
W3C Validator specially. Testing with the HTTP request
and response analyzer
http://www.rexswain.com/httpview.html
using User-Agent: W3C_Validator
I get a response that consists of a 302 redirection to
http://dont-waste-bandwidth-running-validator-here/
(That's of course a rather questionable way of excluding things. A 
reasonable response would consist of some error code - not redirection - 
and an accompanying error page.)
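
For anyone who wants to repeat the test without the web-based analyzer, here is a rough Python sketch of the same idea: send one GET request with a chosen User-Agent and report the status and Location header without following the redirect. The function names are made up for illustration, and the bare string "W3C_Validator" is just what I used in the analyzer; the validator's real User-Agent header may well contain more than that.

```python
import http.client
from urllib.parse import urlsplit


def fetch_no_redirect(url, user_agent):
    """Send one plain-HTTP GET and return (status, Location header).

    http.client never follows redirects on its own, so a 302 is
    reported as-is instead of being chased to its target.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe(status, location):
    """Summarize a response as a redirect or a non-redirect."""
    if 300 <= status < 400 and location:
        return f"{status} redirect to {location}"
    return f"{status} (no redirect)"
```

Calling fetch_no_redirect("http://www.talisman.org/~erlkonig/img/", "W3C_Validator") and then again with a browser-like User-Agent, and passing each result to describe(), should show whether the server answers the two differently.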

So you need to contact the www.talisman.org server admin or avoid the
issue by putting your HTML documents in a folder where they won't get
treated that way. I guess the "/img/" part in the URL is the key; the server
admin may think that such folders contain images only (the robots.txt
contents are a hint of this).

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/
Received on Tuesday, 16 August 2011 19:42:55 GMT
