- From: olivier Thereaux <ot@w3.org>
- Date: Tue, 19 Jul 2005 12:28:17 +0900
- To: David Casey <david@studiocasey.com>
- Cc: <www-validator@w3.org>
Hello David.

On Jul 16, 2005, at 10:24, David Casey wrote:
> I just wanted to give you a suggestion for the validator. I would like
> to validate my entire site. Is there a way you could make it so I
> could put in the domain and the validator would crawl through the
> site?

Thanks for your suggestion. It is actually a recurring request from our users (note to self: should add an item to the FAQ). At the moment, as you noticed, the W3C Markup Validator does not provide such a feature, mostly because the task is a bit more complicated than it might appear.

A batch validation job looks rather simple from a very high-level perspective: crawl a whole site, then sequentially validate all the documents discovered. Except, of course, that we are talking about a public service where in theory anyone could ask for batch validation of any site. If the crawling is done without any delay, requests will hit the validated site at a pace that is probably not dangerous enough to constitute a "denial of service", but that will certainly be enough to make some people unhappy if they did not initiate the request. Some webmasters get very angry when they see several requests per second to their site from a given host, regardless of whether it's the W3C. Fast batch validation also means that the validator has to process a lot of resources very fast, which may not be very nice given that it's a free service shared by a lot of users. We had exactly the same problem with the link checker, so we made it slower, waiting 1 second between requests and following the robots exclusion protocol.

And then comes the next hurdle: if you have more than a handful of documents to test on your website (and most people who want batch validation have many more, or they would just validate their handful of documents by hand), validating them all will probably take a few minutes, if not more. That is not compatible with a Web-based service, as many browsers simply "give up" when a page has not finished loading after a given time, usually 30 or 60 seconds.

So what are the solutions?

One first idea is to not force a delay between requests and to batch-validate quickly, but to limit the number of requests. That is, for instance, what the WDG validator does (it limits itself to 100 URIs):
http://www.htmlhelp.com/tools/validator/

A second idea is to not use a Web-based tool at all, and to take however long it takes to crawl the site and validate everything. This is the approach used by the Log Validator, a W3C companion tool to the Markup Validator:
http://www.w3.org/QA/Tools/LogValidator/

I do not think the first approach would be acceptable for the W3C Markup Validator. Even with a limit (which is in itself an annoyance), the brute strain on a shared service, and fast requests to a Web server with no guarantee that its owner actually asked for them, are too thorny. But I realize that not everyone wants to install the Log Validator, even if, quite frankly, if you know how to run a web server, installing and running the Log Validator should be easy.

Somewhere down my someday pile, there is the idea of building a Web-based request service on top of the Log Validator, with a queuing system, where people could add their site to the queue and would later receive mail with their validation results. I haven't found the time to get around to it yet, but if anyone with a bit of perl knowledge wants to work on it with me, I would be very happy to develop this.

regards,
-- olivier
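As a rough illustration of the "slow, polite" crawl described in the message (one request per second, honouring the robots exclusion protocol), here is a minimal Perl sketch. It is not code from the validator, link checker or Log Validator; the starting site, agent name and contact address are placeholders, and handing each page to the Markup Validator through its familiar check?uri= entry point is only indicative.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::RobotUA;
    use HTML::LinkExtor;
    use URI;
    use URI::Escape;

    # Placeholder values: the site to crawl, the agent name and the
    # contact address are assumptions made for the sake of the example.
    my $start = URI->new('http://www.example.org/');
    my $ua    = LWP::RobotUA->new('polite-site-validator/0.1', 'webmaster@example.org');
    $ua->delay(1/60);    # delay is given in minutes: 1/60 = one second between requests

    my %seen;
    my @queue = ($start);
    while (my $uri = shift @queue) {
        next if $seen{$uri}++;
        my $res = $ua->get($uri);    # robots.txt rules are checked automatically
        next unless $res->is_success && $res->content_type eq 'text/html';

        # Hand the page over to the Markup Validator by URI (indicative only).
        print 'validate: http://validator.w3.org/check?uri=', uri_escape("$uri"), "\n";

        # Queue links that point back to the same host for further crawling.
        my $extor = HTML::LinkExtor->new(undef, $uri);
        $extor->parse($res->decoded_content);
        for my $link ($extor->links) {
            my ($tag, %attr) = @$link;
            for my $found (values %attr) {
                my $abs = URI->new($found)->canonical;
                next unless $abs->can('host') && $abs->host eq $start->host;
                $abs->fragment(undef);    # ignore #fragment parts
                push @queue, $abs;
            }
        }
    }

The point of using LWP::RobotUA rather than a plain user agent is that it enforces both the per-host delay and the robots exclusion protocol for you, which is exactly the behaviour the link checker adopted.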
Received on Tuesday, 19 July 2005 03:28:29 UTC