Re: Suggestion for Validator

From: olivier Thereaux <ot@w3.org>
Date: Tue, 19 Jul 2005 12:28:17 +0900
Message-Id: <295f182ffdc805aab5639df960756967@w3.org>
Cc: <www-validator@w3.org>
To: David Casey <david@studiocasey.com>

Hello David.

On Jul 16, 2005, at 10:24, David Casey wrote:
> I just wanted to give you a suggestion for the validator. I would like 
> to validate my entire site. Is there a way you could make it so I 
> could put in the domain and the validator would crawl through the 
> site?

Thanks for your suggestion. It is actually a recurring request from our 
users (note to self: should add an item to the FAQ). At the moment, as 
you noticed, the W3C Markup Validator does not provide such a feature, 
mostly because the task is a bit more complicated than it might appear.

A batch validation job looks rather simple from a very high-level
perspective: crawl a whole site, then sequentially validate all the
documents discovered.

Except, of course, that we are talking about a public service where
theoretically anyone could ask for batch validation of any site. If the
crawling is done without any delay, requests will reach the validated
site at a pace that is probably not dangerous enough to constitute a
"denial of service", but is certainly enough to make some people
unhappy if they did not initiate the request. Some webmasters get very
angry when they see several requests per second to their site from a
given host, regardless of whether it's the W3C. Fast batch validation
also means that the validator has to process a lot of resources very
quickly, which is not very nice given that it's a free service shared
by a great many users.

We had the exact same problem with the link checker, so we made it 
slower, waiting 1 second between requests, and following the robots 
exclusion protocol. And then comes the next hurdle: if you have more 
than a handful of documents to test on your website (and most people
who want batch validation have many more, or they would just validate
their handful of documents by hand), it means that validating them all
will probably take a few minutes, if not more. And that is not
compatible with a Web-based service, as many browsers just "give up" 
when a page has not finished loading after a given time, usually 30 or 
60 seconds.
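
To make the timing problem concrete, here is a minimal sketch in
Python (not the link checker's actual code). It filters a list of URLs
through robots.txt rules and estimates how long a crawl would spend
waiting with a fixed delay between requests. The function name, the
user agent string and the example URLs are all assumptions made for
illustration; only the 1 second delay comes from the message above.

```python
from urllib.robotparser import RobotFileParser


def plan_polite_crawl(urls, robots_lines,
                      user_agent="ExampleBatchValidator",
                      delay_seconds=1.0):
    """Return (allowed_urls, estimated_wait_seconds).

    robots_lines is the content of the site's robots.txt, already split
    into lines, so this sketch needs no network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)

    # Keep only the URLs that the robots exclusion rules let us fetch.
    allowed = [url for url in urls if parser.can_fetch(user_agent, url)]

    # One pause between consecutive requests: n requests, n - 1 pauses.
    estimated_wait = max(len(allowed) - 1, 0) * delay_seconds
    return allowed, estimated_wait


if __name__ == "__main__":
    # Hypothetical site with 300 documents and no robots.txt rules.
    pages = [f"http://example.org/page{i}.html" for i in range(300)]
    allowed, wait = plan_polite_crawl(pages, [])
    print(f"{len(allowed)} documents, about {wait:.0f} seconds of waiting")
```

With 300 documents the waiting alone comes to 299 seconds, roughly
five minutes, before any validation time is counted. That is well past
the 30 or 60 seconds after which many browsers give up.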

So what are the solutions?

A first idea is to not force a delay between requests, and 
batch-validate quickly, but limit the number of requests. That's, for 
instance, what the WDG validator does (it limits itself to 100 URIs):
http://www.htmlhelp.com/tools/validator/
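
As an illustration of the capped approach, a breadth-first crawl that
stops after a fixed number of URIs could look like the sketch below.
This is not the WDG validator's code. The get_links callable is a
hypothetical stand-in for "fetch the page and extract its links", and
only the limit of 100 comes from the description above.

```python
from collections import deque


def crawl_with_cap(start_url, get_links, max_uris=100):
    """Visit pages breadth-first, stopping once max_uris were visited.

    get_links(url) is a caller-supplied (hypothetical) function that
    returns the URLs linked from the given page.
    """
    seen = {start_url}          # URLs already queued, to avoid repeats
    queue = deque([start_url])  # URLs waiting to be visited
    visited = []                # URLs visited, in order

    while queue and len(visited) < max_uris:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The cap bounds the load on both servers, but any document beyond the
limit is simply never checked.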

A second idea is to not use a web-based tool, and take however long it 
takes to crawl the site and validate everything. This is the approach 
used by the Log Validator, a W3C companion tool to the Markup 
Validator.
http://www.w3.org/QA/Tools/LogValidator/

I do not think the first approach would be acceptable for the W3C
Markup Validator. Even with a limit (which in itself is an annoyance),
the sheer strain on a shared service, combined with fast requests to a
Web server with no guarantee that its owner actually asked for them, is
too thorny. But I realize that not everyone wants to install the Log
Validator, even though, quite frankly, if you know how to run a web
server, installing and running the Log Validator should be easy.

Somewhere down my someday pile, there is the idea of building a
Web-based request service on top of the Log Validator, with a queuing
system, where people could add their site to the queue and would later
receive mail with their validation results. I haven't found the time to
get around to it yet, but if anyone with a bit of Perl knowledge wants
to work on it with me, I would be very happy to develop this.
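
The queuing idea might be sketched roughly as follows. No such service
exists in the message above, so every name here (ValidationJob,
ValidationQueue, validate_site, send_mail) is hypothetical, and the
sketch is in Python rather than Perl purely for illustration. The
validation and mail steps are passed in as callables, so nothing is
assumed about how the Log Validator would actually be invoked.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ValidationJob:
    site: str    # site the user asked to have validated
    email: str   # where the results should be mailed


class ValidationQueue:
    """In-memory first-in, first-out queue of batch validation jobs."""

    def __init__(self):
        self._jobs = deque()

    def submit(self, site, email):
        """Add a job and return its position in the queue (1 = next)."""
        self._jobs.append(ValidationJob(site, email))
        return len(self._jobs)

    def process_next(self, validate_site, send_mail):
        """Run the oldest job, mail the report; False if queue is empty."""
        if not self._jobs:
            return False
        job = self._jobs.popleft()
        report = validate_site(job.site)   # may take as long as needed
        send_mail(job.email, report)
        return True
```

Because the results go out by mail, the slow, polite crawl no longer
has to finish before a browser times out.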

regards,
-- 
olivier
Received on Tuesday, 19 July 2005 03:28:29 GMT