
RE: IO Error whenever page is named contact.html

From: Leonid Batkhan <leonid.batkhan@lenetek.com>
Date: Wed, 6 May 2020 15:14:19 -0400
To: "'Michael[tm] Smith'" <mike@w3.org>
Cc: <www-validator@w3.org>
Message-ID: <006e01d623da$8b7ee440$a27cacc0$@lenetek.com>
Thank you, Michael!


It seems you are on to something regarding the word "contact". The "PHP backend" part is somewhat more questionable. These are some websites hosted in my account with Hostgator:


1. Neither of the following pages has PHP:

https://www.wild-west-tours.com/contact.html gives that 409 Error, but

https://www.wild-west-tours.com/contact_en.html does not.


2. This page does have a PHP reference:

https://www.lenetek.com/contact-us.html gives 409 Error (has "contact-us")


3. Both of these pages do have a PHP reference:

https://www.usa-travel.us/contact.html gives 409 Error, but

https://www.usa-travel.us/contact_en.html does not.


It is possible that Hostgator, the web hosting provider, has some IPs blacklisted or filtered based on the name of the retrieved page (containing “contact” or “contact-us”, as shown in the examples above), but they will not tell me that, although I did ask them. I can’t make them whitelist any servers; it’s their internal policy, and who am I to influence that?!


However, given that all the above pages are accessible and retrievable via browsers and function just fine, why can’t the validator just ignore that 409 server response and still run a validation report on the page? The validator should validate an accessible and retrievable page based on its contents only and disregard that server code. That is my take on it.


Wouldn’t you agree?


Thank you.

Leonid Batkhan


-----Original Message-----
From: Michael[tm] Smith [mailto:mike@w3.org] 
Sent: Tuesday, May 5, 2020 7:26 AM
To: Leonid Batkhan <leonid.batkhan@lenetek.com>
Cc: www-validator@w3.org
Subject: Re: IO Error whenever page is named contact.html


OK, I looked into this and the short answer is that it’s a hosting issue with the https://www.usa-travel/ site and with a number of other sites. And there is nothing we can do from the W3C side to fix it.


If you think this problem is affecting a site you run, what you can do is:

Tell your hosting provider to whitelist the subnet. That is the IP range for the W3C validator service.


The longer answer is that there appear to be a number of hosting providers or sites that are running some kind of blocking mechanism which checks the IP address of each request, and if (1) they find that the IP address is in some blocklist they use, and (2) the request URL has “contact” or “register” in the path, then the mechanism causes the server to respond with a 409 error.
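
To make the suspected logic concrete, here is a minimal Python sketch of such a filter. It only illustrates the behavior described above and is not any real product's or hosting provider's code. The blocklist range (a documentation-only range), the function name, and the whole-word matching rule are all assumptions. Whole-word matching is chosen only because it would be consistent with the reports in this thread that contact.html and contact-us.html get a 409 while contact_en.html does not.

```python
import ipaddress
import re

# Hypothetical blocklist: 192.0.2.0/24 is a documentation-only range (RFC 5737),
# standing in for whatever ranges the real blocklist contains.
BLOCKLIST = [ipaddress.ip_network("192.0.2.0/24")]

# Assumption: the filter matches "contact" or "register" as whole words.
# "_" counts as a word character, so "contact_en" does not match,
# while "contact.html" and "contact-us.html" do.
SENSITIVE_PATH = re.compile(r"\b(contact|register)\b", re.IGNORECASE)


def status_for_request(client_ip: str, path: str, blocklist=BLOCKLIST) -> int:
    """Return the HTTP status the suspected filter would produce."""
    ip = ipaddress.ip_address(client_ip)
    ip_blocked = any(ip in net for net in blocklist)
    path_sensitive = SENSITIVE_PATH.search(path) is not None
    # Both conditions must hold: a blocklisted IP alone, or a sensitive
    # path alone, is served normally.
    return 409 if ip_blocked and path_sensitive else 200


if __name__ == "__main__":
    for ip, path in [
        ("192.0.2.10", "/contact.html"),
        ("192.0.2.10", "/contact_en.html"),
        ("198.51.100.7", "/contact.html"),
    ]:
        print(ip, path, status_for_request(ip, path))
```

Under these assumptions, a browser on a non-blocklisted address sees the page normally, which would explain why the site owner never notices the problem.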


The sites with this issue all seem to be sites that are running Wordpress, or in some cases maybe not running Wordpress but just running a PHP backend.


And it’s possible that the mechanism behind this issue is the software system called “Wordfence”.


Regardless, whatever the system is that’s doing this, it appears to rely on checking some kind of distributed blocklist of IP addresses, and the W3C validator IP address range ended up in that blocklist.


So, as I mentioned above, if you think your site is affected by this, then ask your hosting provider to un-block the subnet, or ask them to get the subnet removed from whatever distributed blocklists they’re using, or else ask them to stop using altogether whatever blocklist they find the subnet IP addresses in.


Whatever blocklists exist that have these IP addresses in them are bad, broken, poorly-administered blocklists that nobody should be relying on. There is nothing originating from those (W3C) addresses that even remotely could be considered abuse, nothing that would merit those IP addresses ending up in the blocklist.


And if W3C server IP addresses are in a blocklist mistakenly, it is very likely that quite a few other legitimate IP addresses are mistakenly in that same blocklist. And the effect of that would be that you have users/customers who aren’t able to access any pages at your site which have “contact” or “register” in the page filenames/paths.


Leonid Batkhan <leonid.batkhan@lenetek.com>, 2020-05-04 18:27 -0400:

> Archived-At: <https://www.w3.org/mid/002301d62263$46945040$d3bcf0c0$@lenetek.com>


> I tried validating several websites, and noticed that whenever page is
> called contact.html I am getting the following


> 1. IO Error: HTTP resource not retrievable. The HTTP status from the
> remote server was: 409.


> https://www.usa-travel.us/contact.html


> Here is a screenshot:




> I checked that on several websites with consistent results which only
> affects pages named contact.html.  Even when I renamed perfectly
> validated page to be named contact.html that page stops being validated.


> Could you please let me know what is going on?


> Thank you in advance.


> Leonid Batkhan

Michael[tm] Smith  https://people.w3.org/mike

Received on Wednesday, 6 May 2020 19:14:36 UTC
