Re: Validator availability

Hi Jens,

Jens Oliver Meiert <jens@meiert.com>, 2015-04-05 20:51 +0200:
> Archived-At: <http://www.w3.org/mid/CAJ0g8QROi-9-M8n10X_HdfCwev8aMDwvb+R05p2X07n2tCFZ7g@mail.gmail.com>
> 
> I repeatedly run into issues with validator.w3.org either returning
> “excessive traffic pattern blocked” (via a commercial VPN) or nothing
> at all (via Tor).
> 
> Is this happening elsewhere, or can the W3C team comment?

As for the validator repeatedly not returning anything at all, I can’t
recall ever seeing any other reports of that, nor any other reports of
problems accessing it through Tor. But that could just be because not many
people have tried using it through Tor.

Does it happen every time you try to make a validation request to
validator.w3.org through Tor? Or only sometimes? Does it happen if you use
http://validator.w3.org/nu/ instead? Or if you use the CSS validator?

We may be able to troubleshoot it by having you make a validation request for
a particular URL and checking the validator logs to see if anything unexpected
is getting logged at the times when you’re seeing nothing returned.

As for the “excessive traffic pattern blocked” problem, the W3C systems
team does get reports of that regularly, and every single time they
investigate one they find that a particular IP address or range really is
sending excessive traffic to the validator.

The short answer to how to solve that problem is: Run a local copy of the
Nu HTML Checker instead—either just using the vnu.jar executable directly
from the command line, or by using it to run your own Web-based persistent
instance of the checker—

  http://validator.github.io/validator/
  http://validator.github.io/validator/#web-based-checking
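
For the Web-based option, here is a minimal Python sketch of sending one
document to your own checker instance. The base URL (localhost, port 8888),
the out=json parameter, and the function names are assumptions made for
illustration; check them against the documentation at the URLs above for
the release you are running.

```python
import json
import urllib.request


def build_check_request(html_bytes, base_url="http://localhost:8888/"):
    """Build a POST request carrying the document as the request body.

    The base URL and the out=json parameter are assumptions for this sketch.
    """
    url = base_url.rstrip("/") + "/?out=json"
    return urllib.request.Request(
        url,
        data=html_bytes,
        headers={"Content-Type": "text/html; charset=utf-8"},
        method="POST",
    )


def check_document(html_bytes, base_url="http://localhost:8888/"):
    """Send the document to the local instance and return its message list."""
    with urllib.request.urlopen(build_check_request(html_bytes, base_url)) as resp:
        return json.load(resp).get("messages", [])
```

With a local instance running, check_document(open("index.html", "rb").read())
should return the list of messages for that file (the file name is just an
example). Because the requests never leave your machine, no W3C rate limit
applies to them.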

The longer answer is:

Any traffic from a particular IP address or range that’s more than a
certain maximum number of allowed requests per minute is considered
excessive. The maximum is set high enough that it’s not something you’re
ever going to hit if you’re just checking documents using the Web-based
form frontend and manually entering a URL for a document to check, or a
file to upload for checking.
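
To make the idea of a per-address limit concrete, here is a minimal Python
sketch of a sliding-window counter. It is not the W3C systems team’s actual
implementation, which I haven’t described here; the class name, the
60-second window, and the limit of 3 in the usage note are all made up for
illustration.

```python
import time
from collections import defaultdict, deque


class PerMinuteLimiter:
    """Allow at most max_per_minute requests per client within a sliding window."""

    def __init__(self, max_per_minute, window_seconds=60.0):
        self.max_per_minute = max_per_minute
        self.window_seconds = window_seconds
        # Maps a client key (e.g. an IP address) to timestamps of accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, client, now=None):
        """Return True if the request is within the limit, False if excessive."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[client]
        # Forget accepted requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_per_minute:
            return False
        hits.append(now)
        return True
```

With PerMinuteLimiter(max_per_minute=3), a fourth request from the same
address inside one minute is refused, while a request from a different
address is still allowed. A person typing URLs into a form stays far below
any realistic limit of this kind; a script looping over files does not.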

But the common case where people have run into it in the past has been when
they’ve installed a browser plugin that automatically sends a request to
the validator for every single page they visit—or when somebody else in
their local network has such a browser plugin installed.

The only other case where I think you’d ever hit the limit is if
you’re running a script or some other custom application that’s capable of
sending a large number of requests to the validator in a short amount of
time. I’ve gotten blocked myself just when running a simple shell script
that recursively finds all HTML files in a particular directory and then
uses curl to make a validator request for each HTML file it finds.

If you’re running a script or app like that and it’s processing a lot of
files at once, you’re going to get blocked and there’s no way to avoid it—
because it is actually generating an excessive amount of traffic as far as
the W3C rate-limiting metrics are concerned.

But the solution to that problem is, as I mentioned above, pretty
straightforward: just grab the latest vnu.jar release and run that locally.
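
As a rough sketch of what that can look like for the recursive case, here
is a small Python script that collects the HTML files under a directory and
hands them to a local vnu.jar in a single invocation. The function names,
the choice of .html and .htm as the extensions to look for, and the jar
path are assumptions for illustration; it also assumes java is on your
PATH.

```python
import subprocess
from pathlib import Path


def find_html_files(root):
    """Recursively collect .html/.htm files under root, sorted for stable output."""
    wanted = {".html", ".htm"}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


def build_command(files, jar="vnu.jar"):
    """Build the argument list for one local checker run over all the files."""
    return ["java", "-jar", str(jar), *(str(f) for f in files)]


def check_directory(root, jar="vnu.jar"):
    """Run the local checker on every HTML file under root; return its exit status."""
    files = find_html_files(root)
    if not files:
        return 0  # nothing to check
    # Assumption: a nonzero status means problems were reported or the run failed.
    return subprocess.run(build_command(files, jar)).returncode
```

Calling check_directory("site/") (a hypothetical path) checks everything in
one local process, so there is no per-file network request and nothing for
the W3C rate limiting to block.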

  —Mike

-- 
Michael[tm] Smith https://people.w3.org/mike

Received on Monday, 6 April 2015 03:06:56 UTC