Re: Web SUBpages rejected with "Bad hostname" from C. Alex. North-Keys on 2011-08-19 (www-validator@w3.org from August 2011)

From: C. Alex. North-Keys <erlkonig@talisman.org>
Date: Thu, 18 Aug 2011 20:34:13 -0500
To: debbiem@companyv.com
CC: www-validator@w3.org, jkorpela@cs.tut.fi
Message-ID: <4E4DBD95.7060706@talisman.org>
This issue turned out to be caused by an .htaccess control that 
specifically blocked the Validator from accessing files in a certain 
subdirectory, using a RewriteRule.  I have absolutely no idea why it was 
present.  Weird.  Thanks for the useful tips that pointed the way to 
investigating the client side of the issue.
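
For anyone hitting the same symptom, the check Jukka describes below 
(same URL, different User-Agent, redirects not followed) can be sketched 
in Python.  This is only a rough sketch under assumptions: the helper 
names fetch_status and looks_agent_specific are made up for illustration, 
it handles plain http:// URLs only, and "Mozilla/5.0" is just a stand-in 
for an ordinary browser string.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, user_agent, timeout=10.0):
    """Send one GET for a plain http:// URL without following redirects.

    Returns (status_code, Location header or None).
    Hypothetical helper, not part of the Validator or any library.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/",
                     headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def looks_agent_specific(validator_result, browser_result):
    """True when the server answered the two User-Agents differently."""
    return validator_result != browser_result
```

Calling fetch_status() on the failing URL once with "W3C_Validator" and 
once with "Mozilla/5.0", then passing both results to 
looks_agent_specific(), should show whether the server singles out the 
Validator; a 302 for one and a 200 for the other points at a rule keyed 
on the User-Agent.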

On 08/16/2011 02:50 PM, Debbie Mitchell wrote:
>
> When I validate: http://www.talisman.org/~erlkonig/img/
>
> I get:
>
> Validation Output: 1 Error
>     Line 2, Column 57: no system id specified
>       <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">
>
>
> Your document includes a DOCTYPE declaration with a public identifier (e.g. "-//W3C//DTD XHTML 1.0 Strict//EN") but no
> system identifier (e.g. "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"). This is authorized in HTML (based on
> SGML), but not in XML-based languages.
>
> If you are using a standard XHTML document type, it is recommended to use exactly one of the DOCTYPE declarations from
> the recommended list on the W3C QA Website.
>
>      Line 10, Column 1: Missing xmlns attribute for element html. The value should be: http://www.w3.org/1999/xhtml
>        <html>
>
>
> Many Document Types based on XML need a mandatory xmlns attribute on the root element. For example, the root element for
> XHTML might look like:
>   <html xmlns="http://www.w3.org/1999/xhtml">
>
> ---------- Original Message -----------
>   From: "Jukka K. Korpela"<jkorpela@cs.tut.fi>
>   To: www-validator@w3.org, erlkonig@talisman.org
>   Sent: Tue, 16 Aug 2011 22:42:41 +0300
>   Subject: Re: Web SUBpages rejected with "Bad hostname"
>
>> 16.8.2011 16:40, I (Jukka K. Korpela) wrote:
>>
>>> 16.8.2011 12:08, C. Alex. North-Keys wrote:
>> [...]
>>>>> 1. I got the following unexpected response when trying to retrieve
>>>>> <http://www.talisman.org/~erlkonig/img/>:
>>>>> 500 Can't connect to dont-waste-bandwidth-running-validator-here:80
>>> [...]
>>>> Of course, the validator was perfectly happy with other pages under the
>>>> same http://www.talisman.org/~erlkonig/
>>> I suppose the issue is related to http://www.talisman.org/robots.txt
>> Sorry, it seems that I was wrong about that - though I don't know
>> whether the validator actually requests robots.txt. The contents of
>> robots.txt may reflect the site administration's intentions, but there is
>> a more specific mechanism in action.
>>
>> It seems that the server www.talisman.org specifically handles a request
>> from the W3C Validator in a specific way. Testing with the HTTP request
>> and response analyzer
>> http://www.rexswain.com/httpview.html
>> using User-Agent: W3C_Validator
>> I get a response that consists of a 302 redirection to
>> http://dont-waste-bandwidth-running-validator-here/
>> (That's of course a rather questionable way of excluding things. A
>> reasonable response would consist of some error code - not redirection -
>> and an accompanying error page.)
>>
>> So you need to contact the www.talisman.org server admin or to avoid the
>> issue by putting your HTML documents in a folder where they won't get
>> treated that way. I guess the "/img/" part in URL is the key; the server
>> admin may think that such folders contain images only (the robots.txt
>> contents are a hint of this).
>>
>> -- 
>> Yucca, http://www.cs.tut.fi/~jkorpela/
> ------- End of Original Message -------
>
Received on Friday, 19 August 2011 02:43:52 UTC